---
phase: 03-memory-layer
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - src/skill_retriever/memory/__init__.py
  - src/skill_retriever/memory/graph_store.py
  - tests/test_graph_store.py
  - pyproject.toml
autonomous: true
must_haves:
  truths:
    - "Components and relationships are stored as nodes and directed edges in the graph store"
    - "PPR computation from a seed node returns ranked nodes within 200ms for a 1300-node graph"
    - "Graph store abstraction layer allows swapping NetworkX backend without changing calling code"
  artifacts:
    - path: "src/skill_retriever/memory/graph_store.py"
      provides: "GraphStore Protocol + NetworkXGraphStore implementation"
      exports: ["GraphStore", "NetworkXGraphStore"]
      min_lines: 120
    - path: "tests/test_graph_store.py"
      provides: "Graph store unit tests"
      min_lines: 80
    - path: "pyproject.toml"
      provides: "networkx dependency added"
      contains: "networkx"
  key_links:
    - from: "src/skill_retriever/memory/graph_store.py"
      to: "src/skill_retriever/entities/graph.py"
      via: "imports GraphNode, GraphEdge, EdgeType"
      pattern: "from skill_retriever\\.entities\\.graph import"
    - from: "src/skill_retriever/memory/graph_store.py"
      to: "networkx"
      via: "nx.DiGraph and nx.pagerank"
      pattern: "nx\\.pagerank"
---
<objective>
Build the graph store subsystem: a Protocol-based abstraction (`GraphStore`) with a NetworkX DiGraph implementation (`NetworkXGraphStore`) that stores component nodes and relationship edges, computes Personalized PageRank, and persists to JSON.
Purpose: The graph store is the backbone of the retrieval system. Components and their dependency/enhancement/conflict relationships form a directed graph that PPR traverses to find relevant component sets. The Protocol abstraction ensures the NetworkX backend can be swapped for FalkorDB/KuzuDB later without changing any calling code.
Output: `graph_store.py` containing the Protocol and its NetworkX implementation, `tests/test_graph_store.py`, and the networkx dependency added to pyproject.toml.
</objective>
<execution_context>
@C:\Users\33641\.claude/get-shit-done/workflows/execute-plan.md
@C:\Users\33641\.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/03-memory-layer/03-RESEARCH.md
@src/skill_retriever/entities/graph.py
@src/skill_retriever/entities/components.py
@src/skill_retriever/config.py
</context>
<tasks>
<task type="auto">
<name>Task 1: Add networkx dependency and create GraphStore Protocol + NetworkXGraphStore</name>
<files>
pyproject.toml
src/skill_retriever/memory/__init__.py
src/skill_retriever/memory/graph_store.py
</files>
<action>
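1. Add `networkx>=3.6.1` to pyproject.toml dependencies array. In NetworkX 3.x `nx.pagerank` is implemented on top of NumPy and SciPy, which NetworkX does not install by default, so also add `scipy` to the dependencies if it is not already present. Run `uv sync` to install.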
2. Create `src/skill_retriever/memory/graph_store.py` with:
a) `GraphStore` Protocol (runtime_checkable):
- `add_node(node: GraphNode) -> None`
- `add_edge(edge: GraphEdge) -> None`
- `get_node(node_id: str) -> GraphNode | None`
- `get_neighbors(node_id: str) -> list[GraphNode]`
- `get_edges(node_id: str) -> list[GraphEdge]`
- `personalized_pagerank(seed_ids: list[str], alpha: float = 0.85, top_k: int = 10) -> list[tuple[str, float]]`
- `node_count() -> int`
- `edge_count() -> int`
- `save(path: str) -> None`
- `load(path: str) -> None`
b) `NetworkXGraphStore` class implementing the Protocol:
- Internal `nx.DiGraph()` stores nodes with attrs: component_type (str value), label, embedding_id
- Internal graph stores edges with attrs: edge_type (str value), weight, plus any metadata dict items
- `get_node()` reconstructs `GraphNode` Pydantic model on-demand from NX node data (do NOT store Pydantic objects in NX)
- `get_edges()` reconstructs `GraphEdge` Pydantic models on-demand from NX edge data
- `get_neighbors()` uses `nx.DiGraph.successors()` and `nx.DiGraph.predecessors()` combined to get both in/out neighbors
- `personalized_pagerank()` uses `nx.pagerank(self._graph, alpha=alpha, personalization=personalization, weight="weight")`. The personalization dict MUST cover ALL nodes (initialize all to 0.0, then set seed nodes to 1/len(seeds)). Exclude seed nodes from results. Return top_k sorted descending by score. Handle empty graph and empty seed_ids by returning [].
- `save()` uses `nx.node_link_data(self._graph)` + `json.dumps()` to write JSON
- `load()` uses `json.loads()` + `nx.node_link_graph(data, directed=True)` -- MUST pass `directed=True` explicitly (see research pitfall 6)
Use `from __future__ import annotations` at top. Import entities with `# noqa: TC001` where needed for Pydantic runtime compatibility, following project conventions from Phase 2.
3. Update `src/skill_retriever/memory/__init__.py` to export `GraphStore` and `NetworkXGraphStore`.
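The interface from step 2a can be sketched as follows. This is a minimal illustration rather than the final module: the `GraphNode` and `GraphEdge` dataclasses are stand-ins with assumed fields for the Pydantic models in `skill_retriever.entities.graph`, and `PartialStore` is a hypothetical class that exists only to show how `isinstance` behaves with a `runtime_checkable` Protocol.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Stand-ins so the sketch runs on its own (assumed fields); the real module
# would import GraphNode and GraphEdge from skill_retriever.entities.graph.
@dataclass
class GraphNode:
    id: str
    label: str = ""


@dataclass
class GraphEdge:
    source_id: str
    target_id: str
    weight: float = 1.0


@runtime_checkable
class GraphStore(Protocol):
    """Backend-agnostic interface; the ten methods listed in step 2a."""

    def add_node(self, node: GraphNode) -> None: ...
    def add_edge(self, edge: GraphEdge) -> None: ...
    def get_node(self, node_id: str) -> GraphNode | None: ...
    def get_neighbors(self, node_id: str) -> list[GraphNode]: ...
    def get_edges(self, node_id: str) -> list[GraphEdge]: ...
    def personalized_pagerank(
        self, seed_ids: list[str], alpha: float = 0.85, top_k: int = 10
    ) -> list[tuple[str, float]]: ...
    def node_count(self) -> int: ...
    def edge_count(self) -> int: ...
    def save(self, path: str) -> None: ...
    def load(self, path: str) -> None: ...


class PartialStore:
    """Hypothetical, deliberately incomplete: only two of the ten methods."""

    def add_node(self, node: GraphNode) -> None: ...

    def node_count(self) -> int:
        return 0
```

`isinstance(PartialStore(), GraphStore)` is false because members are missing. The runtime check only looks at whether the member names exist; it does not compare signatures or return types, so pyright remains the real conformance check.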
AVOID:
- Do NOT store Pydantic model instances in NetworkX nodes/edges -- store primitive dicts, reconstruct on read (research recommendation)
- Do NOT use pickle for persistence (security, not debuggable)
- Do NOT use `nx.pagerank_scipy` or `nx.pagerank_numpy` -- they were removed in NX 3.0
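- Do NOT hand-convert the graph to a scipy sparse matrix for PPR at this scale -- call `nx.pagerank` directly and let it handle any matrix conversion internally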
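The PPR bullet in step 2b could look roughly like the sketch below, written against a bare `nx.DiGraph` instead of the store class. The free-function form and the decision to silently drop seeds that are not in the graph are assumptions, not requirements from this plan. It needs SciPy installed, since `nx.pagerank` relies on it in NetworkX 3.x.

```python
from __future__ import annotations

import networkx as nx


def personalized_pagerank(
    graph: nx.DiGraph,
    seed_ids: list[str],
    alpha: float = 0.85,
    top_k: int = 10,
) -> list[tuple[str, float]]:
    """Rank non-seed nodes by Personalized PageRank from the given seeds."""
    # Assumption: unknown seeds are ignored and duplicates are collapsed.
    seeds = list(dict.fromkeys(s for s in seed_ids if s in graph))
    if graph.number_of_nodes() == 0 or not seeds:
        return []

    # Cover every node: 0.0 everywhere, then 1/len(seeds) on each seed.
    personalization = {node: 0.0 for node in graph.nodes}
    for seed in seeds:
        personalization[seed] = 1.0 / len(seeds)

    scores = nx.pagerank(
        graph, alpha=alpha, personalization=personalization, weight="weight"
    )

    # Drop the seeds themselves, then keep the top_k highest scores.
    seed_set = set(seeds)
    ranked = sorted(
        ((node, score) for node, score in scores.items() if node not in seed_set),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]
```

Edges without a `weight` attribute are treated as weight 1 by `nx.pagerank`, so the same function works on unweighted test graphs.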
</action>
<verify>
Run `uv run python -c "from skill_retriever.memory import GraphStore, NetworkXGraphStore; print('imports OK')"` -- prints `imports OK`.
Run `uv run python -c "from skill_retriever.memory.graph_store import GraphStore, NetworkXGraphStore; assert isinstance(NetworkXGraphStore(), GraphStore); print('protocol OK')"` -- prints `protocol OK`.
Run `uv run pyright src/skill_retriever/memory/graph_store.py` -- zero errors.
Run `uv run ruff check src/skill_retriever/memory/graph_store.py` -- no violations.
</verify>
<done>
GraphStore Protocol exists with all 10 methods. NetworkXGraphStore satisfies the Protocol (isinstance check passes). NetworkX is installed. All type checks and linting pass.
</done>
</task>
<task type="auto">
<name>Task 2: Write graph store tests covering CRUD, PPR, and persistence</name>
<files>
tests/test_graph_store.py
</files>
<action>
Create `tests/test_graph_store.py` with pytest tests:
1. **test_add_and_get_node** -- Add a GraphNode, retrieve by ID, verify all fields match. Also verify get_node returns None for nonexistent ID.
2. **test_add_and_get_edges** -- Add 2 nodes and an edge between them. Call get_edges() on source node. Verify edge fields (source_id, target_id, edge_type, weight) match.
3. **test_get_neighbors** -- Add 3 nodes: A -> B -> C (DEPENDS_ON edges). Verify get_neighbors(B) returns both A and C (in+out neighbors).
4. **test_node_and_edge_counts** -- Add nodes and edges, verify node_count() and edge_count() return correct numbers.
5. **test_personalized_pagerank_basic** -- Build a small graph: A -> B -> C -> D with DEPENDS_ON edges. Run PPR with seed=[A], alpha=0.85, top_k=3. Verify returns list of (str, float) tuples. Verify B is ranked higher than D (closer to seed). Verify A (seed) is excluded from results.
6. **test_ppr_empty_graph** -- Empty graph, run PPR, verify returns [].
7. **test_ppr_empty_seeds** -- Graph with nodes, empty seed list, verify returns [].
8. **test_save_and_load** -- Add nodes and edges. Save to tmp_path JSON file. Create new NetworkXGraphStore, load from file. Verify node_count, edge_count match. Verify get_node returns correct data. Verify edges are preserved with correct types.
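9. **test_ppr_performance** -- Build a graph with 1300 nodes (use a loop creating nodes and random edges, with a seeded random generator so runs are reproducible). Run PPR and assert it completes in under 200ms (use `time.perf_counter()`). Mark the test with `pytest.mark.slow`; if the `slow` marker is not already registered, add it under `markers` in `[tool.pytest.ini_options]` in pyproject.toml so pytest does not emit an unknown-marker warning.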
10. **test_protocol_isinstance** -- Verify `isinstance(NetworkXGraphStore(), GraphStore)` is True.
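As a reference for test 8, the JSON round trip can be sketched on a bare `nx.DiGraph`. The helper names `save_graph`, `load_graph`, and `round_trip` are hypothetical; the real test would go through `NetworkXGraphStore.save()` and `load()` and use pytest's `tmp_path` instead of `tempfile`.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

import networkx as nx


def save_graph(graph: nx.DiGraph, path: str) -> None:
    """Write the graph as node-link JSON (primitive attributes only)."""
    data = nx.node_link_data(graph)
    Path(path).write_text(json.dumps(data), encoding="utf-8")


def load_graph(path: str) -> nx.DiGraph:
    """Read node-link JSON back, passing directed=True as the plan requires."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return nx.node_link_graph(data, directed=True)


def round_trip(graph: nx.DiGraph) -> nx.DiGraph:
    """Save to a temporary file and load the result into a fresh graph."""
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "graph.json")
        save_graph(graph, path)
        return load_graph(path)
```

Because enum values are stored as plain strings, the test can compare `edge_type` and `weight` on the reloaded edges directly against the values that were added.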
Use fixtures from entities:
```python
from skill_retriever.entities.components import ComponentType
from skill_retriever.entities.graph import EdgeType, GraphEdge, GraphNode
```
For the 1300-node performance test, create nodes with IDs like `"test/repo/agent/node-{i}"` and connect each node to 2-3 random neighbors to create a realistic graph density.
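One possible way to generate the performance-test graph and time a call is sketched below, using only the standard library. `build_random_edges` and `elapsed_ms` are hypothetical helper names, and the fixed seed of 42 is an arbitrary choice made here for reproducibility.

```python
from __future__ import annotations

import random
import time
from collections.abc import Callable


def build_random_edges(
    n_nodes: int = 1300, seed: int = 42
) -> tuple[list[str], list[tuple[str, str]]]:
    """Return node IDs plus directed edges, 2-3 distinct targets per node."""
    if n_nodes < 4:
        raise ValueError("need at least 4 nodes to pick 3 distinct targets")
    rng = random.Random(seed)  # fixed seed keeps the test deterministic
    node_ids = [f"test/repo/agent/node-{i}" for i in range(n_nodes)]
    edges: list[tuple[str, str]] = []
    for i, source in enumerate(node_ids):
        # Sample from n_nodes - 1 slots and shift past i to avoid self-loops.
        for j in rng.sample(range(n_nodes - 1), k=rng.randint(2, 3)):
            edges.append((source, node_ids[j if j < i else j + 1]))
    return node_ids, edges


def elapsed_ms(fn: Callable[[], object]) -> float:
    """Run fn once and return the wall-clock duration in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0
```

In the test, each ID and edge would be wrapped in `GraphNode` and `GraphEdge` and added to the store, followed by something like `assert elapsed_ms(lambda: store.personalized_pagerank([node_ids[0]])) < 200`.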
</action>
<verify>
Run `uv run pytest tests/test_graph_store.py -v` -- all tests pass.
Run `uv run pytest tests/test_graph_store.py -v -m slow` -- performance test passes (PPR under 200ms).
Run `uv run ruff check tests/test_graph_store.py` -- no violations.
</verify>
<done>
10 tests covering add/get nodes, add/get edges, neighbors, counts, PPR (basic + empty + performance), save/load round-trip, and Protocol conformance all pass. PPR runs under 200ms for 1300-node graph.
</done>
</task>
</tasks>
<verification>
- `uv run pytest tests/test_graph_store.py -v` -- all tests green
- `uv run pyright src/skill_retriever/memory/` -- zero errors
- `uv run ruff check src/skill_retriever/memory/` -- zero warnings
- `isinstance(NetworkXGraphStore(), GraphStore)` returns True
- PPR completes under 200ms for 1300-node graph
</verification>
<success_criteria>
1. GraphStore Protocol defines the complete abstraction interface (10 methods)
2. NetworkXGraphStore satisfies the Protocol at runtime
3. Nodes and edges round-trip through add/get correctly
4. PPR returns ranked results excluding seeds, with closer nodes ranked higher
5. Save/load preserves all graph data including edge types and weights
6. PPR runs under 200ms for a 1300-node graph
7. All linting and type checks pass
</success_criteria>
<output>
After completion, create `.planning/phases/03-memory-layer/03-01-SUMMARY.md`
</output>