---
phase: 04-retrieval-nodes
plan: "02"
type: execute
wave: 2
depends_on: ["04-01"]
files_modified:
- src/skill_retriever/nodes/retrieval/ppr_engine.py
- src/skill_retriever/nodes/retrieval/flow_pruner.py
- tests/test_ppr_engine.py
- tests/test_flow_pruner.py
- src/skill_retriever/nodes/retrieval/__init__.py
autonomous: true
must_haves:
  truths:
    - "PPR engine adapts alpha based on query specificity (high alpha for specific, low for broad)"
    - "PPR returns empty dict when no seeds found (falls back to vector-only)"
    - "Flow pruning reduces subgraph size by at least 40% while retaining top-ranked components"
  artifacts:
    - path: "src/skill_retriever/nodes/retrieval/ppr_engine.py"
      provides: "Adaptive PPR with seed extraction"
      exports: ["run_ppr_retrieval", "compute_adaptive_alpha"]
    - path: "src/skill_retriever/nodes/retrieval/flow_pruner.py"
      provides: "PathRAG-style flow-based pruning"
      exports: ["flow_based_pruning", "RetrievalPath"]
  key_links:
    - from: "ppr_engine.py"
      to: "NetworkXGraphStore.personalized_pagerank"
      via: "graph_store.personalized_pagerank call"
      pattern: "personalized_pagerank"
    - from: "ppr_engine.py"
      to: "query_planner.py"
      via: "extract_query_entities for seed extraction"
      pattern: "extract_query_entities"
    - from: "flow_pruner.py"
      to: "networkx"
      via: "BFS path finding"
      pattern: "nx\\."
---
<objective>
Create the PPR engine with adaptive alpha and the flow-based pruning node for Phase 4.
Purpose: Enable graph-based retrieval that adapts traversal depth based on query specificity and prunes low-reliability paths to reduce noise.
Output:
- `ppr_engine.py` with adaptive alpha computation and seed-based PPR execution
- `flow_pruner.py` with PathRAG-style flow propagation and path extraction
- Test suites for both modules
</objective>
<execution_context>
@C:\Users\33641\.claude/get-shit-done/workflows/execute-plan.md
@C:\Users\33641\.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/04-retrieval-nodes/04-RESEARCH.md
@.planning/phases/04-retrieval-nodes/04-01-PLAN.md
@src/skill_retriever/memory/graph_store.py
@src/skill_retriever/nodes/retrieval/query_planner.py
</context>
<tasks>
<task type="auto">
<name>Task 1: Create PPR engine with adaptive alpha</name>
<files>
src/skill_retriever/nodes/retrieval/ppr_engine.py
tests/test_ppr_engine.py
</files>
<action>
**ppr_engine.py:**
Port from z-commands ppr.js and implement:
**Constants (from research):**
```python
PPR_CONFIG = {
    "default_alpha": 0.85,
    "specific_alpha": 0.9,  # Named entity, narrow query
    "broad_alpha": 0.6,  # Many entities, exploratory
    "default_top_k": 50,
    "min_score": 0.001,
}
```
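The alpha values only make sense under a particular convention. The sketch below assumes, as in `networkx.pagerank`, that alpha is the damping factor: the probability that the walk follows an out-edge rather than restarting at a seed. Whether `graph_store.personalized_pagerank` uses the same convention is an assumption worth confirming. Under it, the expected number of steps before a restart is alpha / (1 - alpha), about 9 steps at 0.9 versus 1.5 at 0.6, so a higher alpha pushes score further from the seeds. The `personalized_pagerank` helper here is a hypothetical plain power iteration over an adjacency dict, for illustration only; it is not the project's graph store method.

```python
def personalized_pagerank(
    adjacency: dict[str, list[str]],
    seeds: set[str],
    alpha: float,
    iterations: int = 100,
) -> dict[str, float]:
    """Power-iteration PPR; alpha is the probability of following an out-edge."""
    nodes = list(adjacency)
    # Restart distribution: uniform over the seed nodes.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        dangling_mass = 0.0
        for node, successors in adjacency.items():
            if successors:
                share = alpha * scores[node] / len(successors)
                for succ in successors:
                    nxt[succ] += share
            else:
                # Dead end: the walker restarts at the seeds instead.
                dangling_mass += alpha * scores[node]
        # Scores sum to 1, so (1 - alpha) of the mass restarts each step.
        for n in nodes:
            nxt[n] += (1.0 - alpha + dangling_mass) * restart[n]
        scores = nxt
    return scores


chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
for alpha in (0.6, 0.9):
    ppr = personalized_pagerank(chain, {"a"}, alpha)
    print(alpha, round(ppr["a"], 3), round(ppr["d"], 3))
```

On the chain a → b → c → d seeded at `a`, the 0.9 run leaves more score on `d` and less on `a` than the 0.6 run.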
**Functions:**
`compute_adaptive_alpha(query: str, seed_count: int) -> float`
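- If the query contains a named-entity pattern (a component name such as "Agent" or "GSD") AND seed_count <= 3: return 0.9 (specific query)
- If seed_count > 5: return 0.6 (broad query; assuming alpha is the PageRank damping factor, a lower value restarts at the seeds more often, so the walk stays close to the many seeds rather than drifting further)
- Otherwise: return 0.85 (default)
- Detect named entities with a regex that covers PascalCase (`Agent`), all-caps acronyms (`GSD`) and camelCase (`queryPlanner`), e.g. `r'\b(?:[A-Z]{2,}\w*|[A-Z][a-z]+\w*|[a-z]+[A-Z]\w*)\b'`. The narrower `r'\b[A-Z][a-z]+[A-Z]?\w*\b'` misses both acronyms and camelCase; either pattern also matches sentence-initial words such as "Find".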
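A minimal sketch of `compute_adaptive_alpha` following the three rules above. It assumes a broadened entity regex (`NAMED_ENTITY_RE`, a hypothetical name) so that acronyms like `GSD` and camelCase names like `queryPlanner` count as named entities; adjust it to whatever pattern the implementation settles on.

```python
import re

# Assumption: broadened pattern covering PascalCase ("Agent"), acronyms ("GSD")
# and camelCase ("queryPlanner").
NAMED_ENTITY_RE = re.compile(r"\b(?:[A-Z]{2,}\w*|[A-Z][a-z]+\w*|[a-z]+[A-Z]\w*)\b")

PPR_CONFIG = {
    "default_alpha": 0.85,
    "specific_alpha": 0.9,
    "broad_alpha": 0.6,
}


def compute_adaptive_alpha(query: str, seed_count: int) -> float:
    """Pick a PPR alpha from query specificity."""
    has_named_entity = NAMED_ENTITY_RE.search(query) is not None
    if has_named_entity and seed_count <= 3:
        return PPR_CONFIG["specific_alpha"]  # narrow query: named component, few seeds
    if seed_count > 5:
        return PPR_CONFIG["broad_alpha"]  # broad query: many seeds
    return PPR_CONFIG["default_alpha"]


print(compute_adaptive_alpha("Find GSD command", 2))  # 0.9
```

The named-entity rule is checked first, so a query with a named entity but more than five seeds still falls through to the broad value.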
`run_ppr_retrieval(query: str, graph_store: GraphStore, alpha: float | None = None, top_k: int = 50) -> dict[str, float]`
- Call `extract_query_entities(query, graph_store)` to get seed IDs
- If no seeds found: return empty dict (caller should fall back to vector-only)
- If alpha is None: compute via `compute_adaptive_alpha(query, len(seeds))`
- Call `graph_store.personalized_pagerank(seed_ids=list(seeds), alpha=alpha, top_k=top_k)`
- Drop results whose score is below min_score (0.001)
- Return dict mapping node_id -> score
**tests/test_ppr_engine.py:**
- Test compute_adaptive_alpha returns 0.9 for "Find GSD command" with 2 seeds
- Test compute_adaptive_alpha returns 0.6 for broad query with 7 seeds
- Test compute_adaptive_alpha returns 0.85 for medium query with 4 seeds
- Test run_ppr_retrieval returns empty dict when no seeds found
- Test run_ppr_retrieval returns scores for valid seeds (use graph fixture)
- Test run_ppr_retrieval filters out scores below min_score threshold
- Test run_ppr_retrieval uses provided alpha when not None
Create graph fixture with 10+ nodes and edges for testing PPR behavior.
</action>
<verify>
```bash
uv run pytest tests/test_ppr_engine.py -v
uv run pyright src/skill_retriever/nodes/retrieval/ppr_engine.py
uv run ruff check src/skill_retriever/nodes/retrieval/
```
All tests pass, pyright 0 errors, ruff 0 errors.
</verify>
<done>
PPR engine adapts alpha based on query characteristics and returns scored node dict from graph traversal.
</done>
</task>
<task type="auto">
<name>Task 2: Create flow-based pruner (PathRAG port)</name>
<files>
src/skill_retriever/nodes/retrieval/flow_pruner.py
tests/test_flow_pruner.py
src/skill_retriever/nodes/retrieval/__init__.py
</files>
<action>
**flow_pruner.py:**
Port flow-based pruning from z-commands flow-pruner.js and PathRAG paper:
**Constants (from research):**
```python
FLOW_CONFIG = {
    "alpha": 0.85,
    "threshold": 0.01,
    "max_path_length": 4,
    "max_paths": 10,
    "max_endpoints": 8,
}
```
**Models:**
```python
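from dataclasses import dataclass


@dataclass
class RetrievalPath:
    nodes: list[str]  # ordered node IDs, source first
    flow: float  # product of edge weights along the path
    reliability: float  # average PPR score of the path's nodes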
```
**Functions:**
`compute_path_reliability(path: list[str], ppr_scores: dict[str, float]) -> float`
- Calculate average PPR score of nodes in path
- Return 0.0 for empty paths
`find_paths_between(source: str, target: str, graph: nx.DiGraph, max_length: int = 4) -> list[list[str]]`
- Use BFS to enumerate simple paths from source to target of length up to max_length, so shorter paths are found first
- Stop after the first 5 paths found (prevents combinatorial explosion)
- Return list of node ID lists
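A sketch of the two helpers, under stated assumptions: the graph is any mapping from node ID to an iterable of successor IDs (an `nx.DiGraph`'s `G.adj` view has that shape, as does a plain dict of lists), `max_length` counts edges, and nodes missing from `ppr_scores` contribute 0.0 to reliability. `MAX_PATHS_PER_PAIR` is a hypothetical name for the cap of 5. The search keeps an explicit BFS queue of partial paths; `nx.all_simple_paths(G, source, target, cutoff=max_length)` also enumerates length-bounded simple paths, but it explores depth-first, so its first five results are not necessarily the shortest.

```python
from collections import deque
from collections.abc import Iterable, Mapping

MAX_PATHS_PER_PAIR = 5  # hypothetical name for "limit to first 5 paths found"


def compute_path_reliability(path: list[str], ppr_scores: Mapping[str, float]) -> float:
    """Mean PPR score over the path's nodes; unscored nodes count as 0.0 (assumption)."""
    if not path:
        return 0.0
    return sum(ppr_scores.get(node, 0.0) for node in path) / len(path)


def find_paths_between(
    source: str,
    target: str,
    graph: Mapping[str, Iterable[str]],
    max_length: int = 4,
) -> list[list[str]]:
    """BFS over partial paths, so shorter paths are found first.

    max_length is assumed to count edges.
    """
    found: list[list[str]] = []
    queue: deque[list[str]] = deque([[source]])
    while queue and len(found) < MAX_PATHS_PER_PAIR:
        path = queue.popleft()
        if len(path) - 1 >= max_length:
            continue  # extending would exceed max_length edges
        for succ in graph.get(path[-1], ()):
            if succ in path:
                continue  # keep paths simple (no repeated nodes)
            new_path = [*path, succ]
            if succ == target:
                found.append(new_path)
                if len(found) == MAX_PATHS_PER_PAIR:
                    break
            else:
                queue.append(new_path)
    return found


g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(find_paths_between("a", "e", g))
```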
`flow_based_pruning(ppr_scores: dict[str, float], graph_store: GraphStore, threshold: float = 0.01, max_paths: int = 10, max_endpoints: int = 8) -> list[RetrievalPath]`
- Take the top max_endpoints nodes from ppr_scores, sorted by score descending
- For each pair of endpoints (i, j where i < j):
  - Find paths between them using find_paths_between
  - For each path, compute its reliability score and its flow (product of edge weights along the path; default 1.0 if an edge has no weight)
  - Discard paths whose reliability is below threshold
- Collect all remaining paths and sort them by reliability, descending
- Return the top max_paths paths as RetrievalPath objects
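A compact, self-contained sketch of the pruning loop. Assumptions: the plan does not say how the `networkx` graph is obtained from `graph_store`, so the sketch takes a hypothetical `graph` argument shaped like `nx.DiGraph.adj` (node → successor → edge-attribute dict) instead; `max_path_length` is an extra parameter mirroring `FLOW_CONFIG`; and `_bfs_paths` is a hypothetical BFS helper capped at 5 paths per endpoint pair, with `max_path_length` counted in edges. Endpoint pairs come from `itertools.combinations`, which yields exactly the (i, j), i < j pairs over the score-ranked endpoints; in a directed graph this only searches from the higher-ranked endpoint toward the lower-ranked one.

```python
from collections import deque
from collections.abc import Mapping
from dataclasses import dataclass
from itertools import combinations
from typing import Any

# Assumption: shaped like nx.DiGraph.adj, i.e. node -> successor -> edge attributes.
Adjacency = Mapping[str, Mapping[str, Mapping[str, Any]]]


@dataclass
class RetrievalPath:
    nodes: list[str]
    flow: float
    reliability: float


def _bfs_paths(graph: Adjacency, source: str, target: str,
               max_length: int, limit: int = 5) -> list[list[str]]:
    """Shortest-first simple paths with at most max_length edges, capped at limit."""
    found: list[list[str]] = []
    queue = deque([[source]])
    while queue and len(found) < limit:
        path = queue.popleft()
        if len(path) - 1 >= max_length:
            continue
        for succ in graph.get(path[-1], {}):
            if succ == target:
                found.append([*path, succ])
            elif succ not in path:
                queue.append([*path, succ])
    return found[:limit]


def flow_based_pruning(
    ppr_scores: Mapping[str, float],
    graph: Adjacency,
    threshold: float = 0.01,
    max_paths: int = 10,
    max_endpoints: int = 8,
    max_path_length: int = 4,
) -> list[RetrievalPath]:
    # Highest-scoring nodes become path endpoints.
    endpoints = sorted(ppr_scores, key=lambda n: ppr_scores[n], reverse=True)
    endpoints = endpoints[:max_endpoints]
    kept: list[RetrievalPath] = []
    for source, target in combinations(endpoints, 2):  # pairs (i, j) with i < j
        for nodes in _bfs_paths(graph, source, target, max_path_length):
            # Reliability: mean PPR score; unscored intermediate nodes count as 0.0.
            reliability = sum(ppr_scores.get(n, 0.0) for n in nodes) / len(nodes)
            if reliability < threshold:
                continue
            # Flow: product of edge weights, defaulting to 1.0 when unweighted.
            flow = 1.0
            for u, v in zip(nodes, nodes[1:]):
                flow *= float(graph[u][v].get("weight", 1.0))
            kept.append(RetrievalPath(nodes, flow, reliability))
    kept.sort(key=lambda p: p.reliability, reverse=True)
    return kept[:max_paths]


graph = {"a": {"b": {"weight": 0.5}, "c": {}}, "b": {"d": {"weight": 0.8}},
         "c": {"d": {}}, "d": {}}
for p in flow_based_pruning({"a": 0.4, "d": 0.3, "b": 0.2, "c": 0.05}, graph):
    print(p.nodes, round(p.flow, 3), round(p.reliability, 3))
```

Because Python's sort is stable, paths with equal reliability keep their discovery order.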
**tests/test_flow_pruner.py:**
- Test compute_path_reliability returns average PPR score
- Test compute_path_reliability returns 0.0 for empty path
- Test find_paths_between finds simple paths up to max_length
- Test find_paths_between respects path length limit
- Test flow_based_pruning returns paths sorted by reliability
- Test flow_based_pruning respects max_paths limit
- Test flow_based_pruning filters low-reliability paths
- Test flow_based_pruning achieves 40%+ reduction vs raw PPR top-k (key success criterion)
Create graph fixture with multiple paths between nodes to test pruning behavior.
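The 40% criterion needs a concrete size measure, and the plan does not fix one. One reasonable choice, assumed here, is distinct node IDs: the hypothetical `pruning_reduction` helper computes 1 - |nodes in kept paths| / |nodes in the PPR output|, so the reduction test can assert a value of at least 0.4. `RetrievalPath` is re-declared to keep the block self-contained.

```python
from dataclasses import dataclass


@dataclass
class RetrievalPath:
    nodes: list[str]
    flow: float
    reliability: float


def pruning_reduction(ppr_scores: dict[str, float], paths: list[RetrievalPath]) -> float:
    """1 - (distinct nodes in kept paths) / (nodes in the PPR output).

    Hypothetical helper; measuring size in distinct node IDs is an assumption.
    """
    if not ppr_scores:
        return 0.0  # nothing to reduce
    kept_nodes = {node for path in paths for node in path.nodes}
    return 1.0 - len(kept_nodes) / len(ppr_scores)


ppr = {f"n{i}": 1.0 / (i + 1) for i in range(16)}
paths = [
    RetrievalPath(["n0", "n1", "n2"], flow=1.0, reliability=0.6),
    RetrievalPath(["n0", "n3"], flow=1.0, reliability=0.6),
]
print(pruning_reduction(ppr, paths))  # 0.75
assert pruning_reduction(ppr, paths) >= 0.4  # the plan's 40% criterion
```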
**Update __init__.py:**
Add exports: run_ppr_retrieval, compute_adaptive_alpha, flow_based_pruning, RetrievalPath
</action>
<verify>
```bash
uv run pytest tests/test_flow_pruner.py -v
uv run pyright src/skill_retriever/nodes/retrieval/flow_pruner.py
uv run ruff check src/skill_retriever/nodes/retrieval/
# Verify 40% reduction criterion
uv run pytest tests/test_flow_pruner.py -v -k "reduction"
```
All tests pass including 40% reduction test, pyright 0 errors, ruff 0 errors.
</verify>
<done>
Flow pruner extracts high-reliability paths from PPR output, achieving at least 40% size reduction while retaining top-ranked components.
</done>
</task>
</tasks>
<verification>
```bash
# Full test suite for PPR and flow pruning
uv run pytest tests/test_ppr_engine.py tests/test_flow_pruner.py -v
# Type checking
uv run pyright src/skill_retriever/nodes/retrieval/
# Linting
uv run ruff check src/skill_retriever/nodes/retrieval/
# Verify exports work
uv run python -c "from skill_retriever.nodes.retrieval import run_ppr_retrieval, flow_based_pruning, RetrievalPath; print('Imports OK')"
```
</verification>
<success_criteria>
1. Adaptive alpha returns 0.9 for specific queries (named entity, few seeds)
2. Adaptive alpha returns 0.6 for broad queries (many seeds)
3. PPR returns empty dict when no seeds found (enables vector-only fallback)
4. Flow pruning produces RetrievalPath objects with reliability scores
5. Flow pruning achieves 40%+ reduction in result count vs unpruned PPR output
6. All pyright and ruff checks pass
</success_criteria>
<output>
After completion, create `.planning/phases/04-retrieval-nodes/04-02-SUMMARY.md`
</output>