---
phase: 04-retrieval-nodes
plan: "03"
type: execute
wave: 3
depends_on: ["04-01", "04-02"]
files_modified:
- src/skill_retriever/nodes/retrieval/score_fusion.py
- src/skill_retriever/nodes/retrieval/context_assembler.py
- tests/test_score_fusion.py
- tests/test_context_assembler.py
- src/skill_retriever/nodes/retrieval/__init__.py
autonomous: true
must_haves:
  truths:
    - "RRF fuses vector and graph results without score normalization"
    - "Type filter is applied AFTER score fusion (not during retrieval)"
    - "Context assembler respects token budget and priority ordering by component type"
  artifacts:
    - path: "src/skill_retriever/nodes/retrieval/score_fusion.py"
      provides: "RRF score fusion and optional reranking"
      exports: ["reciprocal_rank_fusion", "fuse_retrieval_results"]
    - path: "src/skill_retriever/nodes/retrieval/context_assembler.py"
      provides: "Token-budgeted context assembly with type priority"
      exports: ["assemble_context", "RetrievalContext"]
  key_links:
    - from: "score_fusion.py"
      to: "models.py"
      via: "imports RankedComponent"
      pattern: "from.*models import.*RankedComponent"
    - from: "context_assembler.py"
      to: "graph_store.py"
      via: "get_node for component metadata"
      pattern: "graph_store\\.get_node"
    - from: "context_assembler.py"
      to: "ComponentType"
      via: "type priority ordering"
      pattern: "ComponentType"
---
<objective>
Create the score fusion and context assembler nodes for Phase 4.
Purpose: Combine vector and graph retrieval results using RRF, apply type filtering, and assemble token-budgeted context for downstream consumption.
Output:
- `score_fusion.py` with RRF implementation and result fusion
- `context_assembler.py` with token budgeting and priority ordering
- Test suites for both modules
</objective>
<execution_context>
@C:\Users\33641\.claude/get-shit-done/workflows/execute-plan.md
@C:\Users\33641\.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/04-retrieval-nodes/04-RESEARCH.md
@.planning/phases/04-retrieval-nodes/04-01-PLAN.md
@.planning/phases/04-retrieval-nodes/04-02-PLAN.md
@src/skill_retriever/memory/graph_store.py
@src/skill_retriever/entities/components.py
</context>
<tasks>
<task type="auto">
<name>Task 1: Create RRF score fusion</name>
<files>
src/skill_retriever/nodes/retrieval/score_fusion.py
tests/test_score_fusion.py
</files>
<action>
**score_fusion.py:**
Implement Reciprocal Rank Fusion (RRF) as specified in 04-RESEARCH.md:
**Constants:**
```python
RRF_K = 60 # Empirically validated default from Elasticsearch/Milvus
```
**Functions:**
`reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]`
- For each ranked list, compute 1/(k + rank) score for each item (rank starts at 1)
- Sum scores across all lists for each item
- Return sorted list of (item_id, rrf_score) tuples, descending by score (see the sketch below)
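A minimal sketch of the pure-RRF helper, using only the standard library (no project imports needed):
```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists by summing 1/(k + rank) per appearance; no score normalization."""
    scores: dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item_id in enumerate(ranked, start=1):  # rank starts at 1
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```
For example, with k=60 an item ranked 1st in one list and 3rd in the other scores 1/61 + 1/63 ≈ 0.0323, while an item appearing only once at rank 1 scores 1/61 ≈ 0.0164.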
`fuse_retrieval_results(vector_results: list[RankedComponent], graph_results: dict[str, float], graph_store: GraphStore, component_type: ComponentType | None = None, top_k: int = 10) -> list[RankedComponent]`
- Extract ranked list from vector_results (list of component_ids in rank order)
- Extract ranked list from graph_results (sort by score descending, extract IDs)
- Call reciprocal_rank_fusion with both lists
- If component_type is not None, filter results by type using graph_store.get_node
- Convert to RankedComponent list with source="fused"
- Return top_k results (see the sketch after the design decisions below)
**Key design decisions:**
- Type filter applied AFTER fusion (not during retrieval) per research pitfall #3
- RRF uses k=60 per research recommendation
- Fused results get source="fused" to distinguish from single-source results
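A hedged sketch of the fusion wrapper. The RankedComponent field names (component_id, score, source) and the node attribute used for the type check are assumptions inferred from this spec, not confirmed project APIs:
```python
def fuse_retrieval_results(
    vector_results: list[RankedComponent],
    graph_results: dict[str, float],
    graph_store: GraphStore,
    component_type: ComponentType | None = None,
    top_k: int = 10,
) -> list[RankedComponent]:
    # Vector results are already in rank order; graph results are sorted by score.
    vector_ids = [r.component_id for r in vector_results]  # assumed field name
    graph_ids = [
        cid for cid, _ in sorted(graph_results.items(), key=lambda kv: kv[1], reverse=True)
    ]
    fused = reciprocal_rank_fusion([vector_ids, graph_ids])

    if component_type is not None:
        # Type filter applied AFTER fusion, per research pitfall #3.
        def matches(cid: str) -> bool:
            node = graph_store.get_node(cid)
            return node is not None and node.component_type == component_type  # assumed attribute

        fused = [(cid, score) for cid, score in fused if matches(cid)]

    return [
        RankedComponent(component_id=cid, score=score, source="fused")  # assumed constructor
        for cid, score in fused[:top_k]
    ]
```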
**tests/test_score_fusion.py:**
- Test reciprocal_rank_fusion with two lists, verify top item appears first
- Test reciprocal_rank_fusion handles items appearing in only one list
- Test reciprocal_rank_fusion with k=1 vs k=60 shows different score distributions
- Test fuse_retrieval_results combines vector and graph results
- Test fuse_retrieval_results applies type filter after fusion
- Test fuse_retrieval_results returns RankedComponent with source="fused"
- Test fuse_retrieval_results respects top_k limit
- Test fuse_retrieval_results handles empty graph results (vector-only fallback)
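One of these cases as a hedged pytest sketch; the list contents are illustrative only:
```python
def test_rrf_top_item_appears_first() -> None:
    # "a" is ranked first in both lists, so it should win the fused ranking.
    fused = reciprocal_rank_fusion([["a", "b", "c"], ["a", "c", "b"]])
    assert fused[0][0] == "a"
    # Fused scores come back in descending order.
    scores = [score for _, score in fused]
    assert scores == sorted(scores, reverse=True)
```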
</action>
<verify>
```bash
uv run pytest tests/test_score_fusion.py -v
uv run pyright src/skill_retriever/nodes/retrieval/score_fusion.py
uv run ruff check src/skill_retriever/nodes/retrieval/
```
All tests pass, pyright 0 errors, ruff 0 errors.
</verify>
<done>
RRF fuses vector and graph results correctly, with type filtering applied post-fusion.
</done>
</task>
<task type="auto">
<name>Task 2: Create context assembler with token budgeting</name>
<files>
src/skill_retriever/nodes/retrieval/context_assembler.py
tests/test_context_assembler.py
src/skill_retriever/nodes/retrieval/__init__.py
</files>
<action>
**context_assembler.py:**
Port token budgeting from z-commands context-assembler.js:
**Constants:**
```python
# Approximate token estimation: 4 chars = 1 token (conservative)
CHARS_PER_TOKEN = 4
DEFAULT_TOKEN_BUDGET = 2000
# Priority ordering for component types (higher priority = include first)
TYPE_PRIORITY: dict[ComponentType, int] = {
ComponentType.AGENT: 1,
ComponentType.SKILL: 2,
ComponentType.COMMAND: 3,
ComponentType.MCP: 4,
ComponentType.HOOK: 5,
ComponentType.SETTING: 6,
ComponentType.SANDBOX: 7,
}
```
**Models:**
```python
@dataclass
class RetrievalContext:
components: list[RankedComponent]
total_tokens: int
truncated: bool
excluded_count: int
```
**Functions:**
`estimate_tokens(text: str) -> int`
- Return len(text) // CHARS_PER_TOKEN
`get_component_content(component_id: str, graph_store: GraphStore) -> str`
- Get node from graph_store
- Return node.label (name) as minimal content for now
- In future phases, this will pull full ComponentMetadata.raw_content
`assemble_context(ranked_components: list[RankedComponent], graph_store: GraphStore, token_budget: int = 2000) -> RetrievalContext`
- Sort components by: (1) type priority, (2) score descending within same type
- Iterate through sorted components
- For each: estimate tokens of component content
- If adding the next component would exceed the budget: set truncated=True and stop (the remaining components are excluded)
- Track included components and a running token total
- Return RetrievalContext with the included components, token count, truncation flag, and excluded count (see the sketch below)
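A minimal sketch of the assembler using the constants and helpers specified above. The RankedComponent field names (component_type, score, component_id) are assumptions; if the type lives on the graph node instead, look it up via graph_store.get_node:
```python
def assemble_context(
    ranked_components: list[RankedComponent],
    graph_store: GraphStore,
    token_budget: int = DEFAULT_TOKEN_BUDGET,
) -> RetrievalContext:
    # Type priority first, then score descending within the same type.
    ordered = sorted(
        ranked_components,
        key=lambda c: (TYPE_PRIORITY[c.component_type], -c.score),
    )
    included: list[RankedComponent] = []
    total_tokens = 0
    truncated = False
    for component in ordered:
        content = get_component_content(component.component_id, graph_store)
        cost = estimate_tokens(content)
        if total_tokens + cost > token_budget:
            truncated = True
            break  # everything after this point is excluded
        included.append(component)
        total_tokens += cost
    return RetrievalContext(
        components=included,
        total_tokens=total_tokens,
        truncated=truncated,
        excluded_count=len(ordered) - len(included),
    )
```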
**tests/test_context_assembler.py:**
- Test estimate_tokens returns approximately len/4
- Test assemble_context respects token budget
- Test assemble_context prioritizes agents over skills over commands
- Test assemble_context returns truncated=True when budget exceeded
- Test assemble_context tracks excluded_count correctly
- Test assemble_context includes all if within budget (truncated=False)
- Test assemble_context handles empty input
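One budget case as a hedged pytest sketch; make_ranked() and the graph_store fixture are hypothetical test helpers, not existing project fixtures:
```python
def test_assemble_context_respects_token_budget(graph_store: GraphStore) -> None:
    # make_ranked() is a hypothetical helper that builds RankedComponent test objects.
    components = [make_ranked(f"skill-{i}", score=1.0 - i * 0.01) for i in range(20)]
    context = assemble_context(components, graph_store, token_budget=50)
    assert context.total_tokens <= 50
    assert context.excluded_count == len(components) - len(context.components)
```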
**Update __init__.py:**
Add exports: reciprocal_rank_fusion, fuse_retrieval_results, assemble_context, RetrievalContext
Ensure all retrieval node exports are available from `skill_retriever.nodes.retrieval`.
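The additions to __init__.py might look like the following; the existing exports from plans 04-01 and 04-02 stay in place (elided here):
```python
from .context_assembler import RetrievalContext, assemble_context
from .score_fusion import fuse_retrieval_results, reciprocal_rank_fusion

__all__ = [
    # ...existing retrieval exports from plans 04-01 and 04-02...
    "RetrievalContext",
    "assemble_context",
    "fuse_retrieval_results",
    "reciprocal_rank_fusion",
]
```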
</action>
<verify>
```bash
uv run pytest tests/test_context_assembler.py -v
uv run pyright src/skill_retriever/nodes/retrieval/context_assembler.py
uv run ruff check src/skill_retriever/nodes/retrieval/
# Verify all exports
uv run python -c "from skill_retriever.nodes.retrieval import QueryComplexity, RetrievalPlan, RankedComponent, plan_retrieval, extract_query_entities, search_by_text, search_with_type_filter, run_ppr_retrieval, compute_adaptive_alpha, flow_based_pruning, RetrievalPath, reciprocal_rank_fusion, fuse_retrieval_results, assemble_context, RetrievalContext; print('All exports OK')"
```
All tests pass, pyright 0 errors, ruff 0 errors.
</verify>
<done>
Context assembler respects token budget, prioritizes by component type, and tracks truncation state.
</done>
</task>
</tasks>
<verification>
```bash
# Full test suite for Phase 4 retrieval nodes
uv run pytest tests/test_query_planner.py tests/test_vector_search.py tests/test_ppr_engine.py tests/test_flow_pruner.py tests/test_score_fusion.py tests/test_context_assembler.py -v
# Type checking entire retrieval module
uv run pyright src/skill_retriever/nodes/retrieval/
# Linting
uv run ruff check src/skill_retriever/nodes/retrieval/
# Full project verification
uv run pytest
uv run pyright
uv run ruff check
```
</verification>
<success_criteria>
1. RRF with k=60 produces fused rankings that differ from single-source rankings
2. Type filter applied after fusion preserves correct ranking order
3. Context assembler respects 2000-token default budget
4. Agents are prioritized over skills in context assembly
5. RetrievalContext tracks truncation and excluded count accurately
6. All pyright and ruff checks pass
7. Full test suite passes (all plans combined)
</success_criteria>
<output>
After completion, create `.planning/phases/04-retrieval-nodes/04-03-SUMMARY.md`
</output>