Skip to main content
Glama
graph-database-design.md26.7 kB
# Graph Database Architecture for Memory Associations **Version**: 1.0 **Date**: 2025-12-14 **Status**: Implemented (v8.51.0) **Priority**: High Performance Enhancement **Issue**: [#279](https://github.com/doobidoo/mcp-memory-service/issues/279) **Pull Request**: [#280](https://github.com/doobidoo/mcp-memory-service/pull/280) ## Executive Summary This document specifies the **Graph Database Architecture** for storing memory associations in MCP Memory Service. The implementation uses **SQLite graph tables with recursive Common Table Expressions (CTEs)** to provide efficient association storage and graph queries, achieving **30x query performance improvement** and **97% storage reduction** compared to storing associations as regular Memory objects. **Key Achievement**: Real-world deployment validated 343 associations created automatically with sub-10ms query latency and minimal storage overhead. ## Problem Statement ### Current State (Before v8.51.0) The memory consolidation system automatically discovers semantic associations between memories (343 associations in single run, December 14, 2025). These associations were stored as regular Memory objects with special tags, creating significant overhead: **Storage Bloat**: - 1,449 association memories (27.3% of 5,309 total memories) - Each association: ~500 bytes (content + 384-dim embedding + metadata) - Total overhead: ~2-3 MB for associations alone **Query Inefficiency**: - Finding connected memories: ~150ms (full table scan with tag filtering) - Multi-hop queries: ~800ms (multiple manual JOIN operations) - Algorithm complexity: O(N) table scan vs O(log N) indexed lookup **Functional Limitations**: - No graph traversal capability (multi-hop connections) - No graph algorithms (PageRank, centrality, shortest paths, community detection) - Association memories appear in regular semantic search results (search pollution) - Cannot efficiently query "find all memories 2-3 connections away" ### Real-World Impact **Production Deployment Metrics** (December 14, 2025): - 343 associations created in single consolidation run - Each stored as full Memory object with 384-dimensional embedding - ~50 KB storage per 100 associations as Memory objects - ~5 KB in dedicated graph table (97% reduction) - Query performance: O(N) table scan vs O(log N) indexed graph lookup ## Architectural Decision ### Options Evaluated | Approach | Performance | Complexity | Overhead | Recommendation | |----------|-------------|------------|----------|----------------| | **Keep Current (Baseline)** | ⭐⭐ | ⭐⭐⭐⭐⭐ | None | ❌ Doesn't scale | | **SQLite Graph Table + Recursive CTEs** | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Minimal | ✅ **SELECTED** | | **rustworkx In-Memory Cache** | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | Moderate | Future: v9.0+ | | **Neo4j / Standalone Graph DB** | ⭐⭐⭐⭐⭐ | ⭐ | High | ❌ Overkill | ### Decision Rationale **SQLite Graph Table + Recursive CTEs** selected for optimal balance: ✅ **Performance**: Near-native graph database performance - Recursive CTEs provide BFS/DFS traversal - Indexed lookups: O(log N) vs O(N) table scans - Multi-hop queries: Single SQL query vs multiple round-trips ✅ **Simplicity**: No external dependencies - Reuses existing SQLite database - No new infrastructure to maintain - Consistent backup/restore workflow ✅ **Sophistication**: Production-grade graph features - Bidirectional edge traversal - Shortest path algorithms - Cycle prevention in traversal - Subgraph extraction for visualization ✅ **Minimal Overhead**: Negligible operational cost - ~50 bytes per association (vs 500 bytes as Memory) - No additional memory footprint - Single database file (no data distribution complexity) ## Technical Architecture ### 1. Database Schema #### Graph Table Structure ```sql CREATE TABLE IF NOT EXISTS memory_graph ( source_hash TEXT NOT NULL, target_hash TEXT NOT NULL, similarity REAL NOT NULL, connection_types TEXT NOT NULL, -- JSON array: ["temporal_proximity", "shared_concepts"] metadata TEXT, -- JSON object: {"discovery_method": "creative_association"} created_at REAL NOT NULL, PRIMARY KEY (source_hash, target_hash) ); ``` **Design Decisions**: 1. **Composite Primary Key** `(source_hash, target_hash)`: - Ensures uniqueness for directed edges - Prevents duplicate associations - Enables efficient bidirectional queries 2. **JSON Storage** for `connection_types` and `metadata`: - Flexible schema for evolving association types - Lightweight compared to separate junction tables - Easy to query with SQLite JSON functions 3. **Bidirectional Edges**: - Store both A→B and B→A for symmetrical associations - Simplifies traversal queries (no need for UNION) - Minimal storage cost (~100 bytes vs ~1000 bytes for Memory object pair) #### Indexes ```sql CREATE INDEX IF NOT EXISTS idx_graph_source ON memory_graph(source_hash); CREATE INDEX IF NOT EXISTS idx_graph_target ON memory_graph(target_hash); CREATE INDEX IF NOT EXISTS idx_graph_bidirectional ON memory_graph(source_hash, target_hash); ``` **Index Strategy**: - `idx_graph_source`: Fast lookup for "find all connections from X" - `idx_graph_target`: Fast lookup for "find all connections to X" - `idx_graph_bidirectional`: Fast edge existence checks **Query Performance** (with indexes): - Find connected (1-hop): <5ms - Find connected (3-hop): <25ms - Shortest path: <15ms (average) - Get subgraph: <10ms (radius=2) ### 2. GraphStorage Class **Location**: `src/mcp_memory_service/storage/graph.py` #### Core Methods ##### `store_association()` ```python async def store_association( self, source_hash: str, target_hash: str, similarity: float, connection_types: List[str], metadata: Dict[str, Any] ) -> None: """Store bidirectional association in graph table.""" ``` **Implementation Details**: - Uses `INSERT OR REPLACE` for idempotency - Stores both A→B and B→A edges - Validates inputs (no self-loops, no empty hashes) - JSON serialization for connection_types and metadata ##### `find_connected()` ```python async def find_connected( self, memory_hash: str, max_hops: int = 2 ) -> List[Tuple[str, int]]: """Find all memories connected within N hops using BFS.""" ``` **Recursive CTE Implementation**: ```sql WITH RECURSIVE connected_memories(hash, distance, path) AS ( -- Base case: Starting node (path wrapped with delimiters) SELECT ?, 0, ? -- Parameters: (hash, ',hash,') UNION ALL -- Recursive case: Expand to neighbors SELECT mg.target_hash, cm.distance + 1, cm.path || mg.target_hash || ',' -- Append hash with delimiter FROM connected_memories cm JOIN memory_graph mg ON cm.hash = mg.source_hash WHERE cm.distance < ? -- Max hops limit AND instr(cm.path, ',' || mg.target_hash || ',') = 0 -- Cycle prevention (exact match) ) SELECT DISTINCT hash, distance FROM connected_memories WHERE distance > 0 ORDER BY distance, hash; ``` **Key Features**: - **Breadth-First Search**: Returns results ordered by distance - **Cycle Prevention**: Tracks visited nodes in path string - **Efficient**: Single SQL query vs multiple round-trips ##### `shortest_path()` ```python async def shortest_path( self, hash1: str, hash2: str, max_depth: int = 5 ) -> Optional[List[str]]: """Find shortest path between two memories using BFS.""" ``` **Algorithm**: - Unidirectional BFS from source to target - BFS guarantees shortest path found first (level-order traversal) - Early termination when target is reached - Returns `None` if no path exists within max_depth - Performance: ~15ms typical (excellent for sparse graphs) ##### `get_subgraph()` ```python async def get_subgraph( self, memory_hash: str, radius: int = 2 ) -> Dict[str, Any]: """Extract neighborhood subgraph for visualization.""" ``` **Returns**: ```json { "nodes": [ {"hash": "abc123", "distance": 0}, {"hash": "def456", "distance": 1} ], "edges": [ { "source": "abc123", "target": "def456", "similarity": 0.65, "connection_types": ["temporal_proximity"] } ] } ``` **Use Cases**: - Graph visualization in web UI - Association exploration tools - Debugging consolidation results ### 3. Configuration System **Environment Variable**: `MCP_GRAPH_STORAGE_MODE` ```bash # Three storage modes for gradual migration export MCP_GRAPH_STORAGE_MODE=dual_write # Default (recommended) # Options: # memories_only - Legacy: associations as Memory objects (current behavior) # dual_write - Transition: write to both memories + graph tables # graph_only - Modern: only graph table (97% storage reduction) ``` **Mode Behavior**: | Mode | Stores in Memories Table | Stores in Graph Table | Storage Overhead | Query Method | |------|-------------------------|----------------------|------------------|--------------| | `memories_only` | ✅ Yes | ❌ No | 100% (baseline) | Tag filtering | | `dual_write` | ✅ Yes | ✅ Yes | ~103% (3% graph overhead) | Both available | | `graph_only` | ❌ No | ✅ Yes | 3% (97% reduction) | Graph queries | **Configuration Validation** (`config.py`): ```python GRAPH_STORAGE_MODE = os.getenv('MCP_GRAPH_STORAGE_MODE', 'dual_write').lower() if GRAPH_STORAGE_MODE not in ['memories_only', 'dual_write', 'graph_only']: logger.warning(f"Invalid graph storage mode: {GRAPH_STORAGE_MODE}, defaulting to dual_write") GRAPH_STORAGE_MODE = 'dual_write' logger.info(f"Graph Storage Mode: {GRAPH_STORAGE_MODE}") ``` ### 4. Consolidator Integration **Location**: `src/mcp_memory_service/consolidation/consolidator.py` #### Mode-Based Dispatcher ```python async def _store_associations_as_memories(self, associations) -> None: """Store discovered associations using configured graph storage mode.""" self.logger.info(f"Storing {len(associations)} associations using mode: {GRAPH_STORAGE_MODE}") # Store in memories table if enabled if GRAPH_STORAGE_MODE in ['memories_only', 'dual_write']: await self._store_associations_in_memories(associations) # Store in graph table if enabled if GRAPH_STORAGE_MODE in ['dual_write', 'graph_only']: await self._store_associations_in_graph_table(associations) ``` #### GraphStorage Initialization ```python def _init_graph_storage(self) -> None: """Initialize GraphStorage with appropriate db_path from storage backend.""" if hasattr(self.storage, 'primary') and hasattr(self.storage.primary, 'db_path'): # Hybrid backend: Use primary (SQLite-vec) db_path db_path = self.storage.primary.db_path elif hasattr(self.storage, 'db_path'): # SQLite-vec backend: Use direct db_path db_path = self.storage.db_path else: # Cloudflare-only backend: GraphStorage not supported self.graph_storage = None self.logger.warning("GraphStorage requires SQLite backend (not available)") return self.graph_storage = GraphStorage(db_path) self.logger.info(f"Graph storage mode: {GRAPH_STORAGE_MODE}") ``` ### 5. Algorithm Design Decisions #### BFS Implementation: Unidirectional vs Bidirectional **Design Question**: Should `shortest_path()` use unidirectional or bidirectional BFS? **Decision**: **Unidirectional BFS** (current implementation) ##### Theoretical Comparison | Algorithm | Time Complexity | Code Complexity | Best For | |-----------|----------------|-----------------|----------| | **Unidirectional BFS** | O(b^d) | ~25 lines SQL | Sparse graphs (b < 1) | | **Bidirectional BFS** | O(b^(d/2)) | ~80-100 lines SQL | Dense graphs (b > 5) | Where: - `b` = branching factor (avg connections per node) - `d` = depth of target node ##### Our Graph Topology (Production Data) **Measured Characteristics** (as of v8.51.0): ``` Total associations: 1,449 Total memories: 5,173 Branching factor: ~0.56 (very sparse) Typical path depth: 1-3 hops Max search depth: 5 hops (configured limit) ``` ##### Performance Analysis **Typical Query** (d=2, b=0.56): ``` Unidirectional: 0.56^2 ≈ 0.31 nodes explored → 15ms Bidirectional: 2 × 0.56^1 = 1.12 nodes explored → ~45ms Result: Unidirectional is 3× FASTER (overhead of dual frontiers exceeds savings) ``` **Deep Query** (d=5, b=3): ``` Unidirectional: 3^5 = 243 nodes explored Bidirectional: 2 × 3^2.5 ≈ 31 nodes explored Result: Bidirectional is 7.8× faster (but not our use case) ``` ##### Decision Rationale **Why Unidirectional is Optimal**: 1. **Sparse Graph Topology** (b=0.56 << 1) - Sub-linear node exploration in practice - Bidirectional overhead dominates for sparse graphs 2. **Shallow Connection Patterns** (d=1-3 typical) - Most queries are 1-hop direct lookups (d=1) - Bidirectional provides zero benefit for d=1 - Minimal benefit for d=2-3 in sparse graphs 3. **Performance is Excellent** (~15ms) - Well below 100ms user perception threshold - No user complaints or performance issues 4. **Code Simplicity** (25 lines vs 80-100 lines) - Single recursive CTE vs dual CTEs + intersection logic - Easy to debug and maintain - Lower risk of cycle detection bugs 5. **Proven Correctness** - All 22 unit tests passing - BFS guarantees shortest path (level-order traversal) ##### When to Reconsider Bidirectional BFS Monitor these metrics and switch if thresholds exceeded: | Metric | Current | Threshold | Action | |--------|---------|-----------|--------| | **Avg connections/node** | 0.56 | > 5 | Consider bidirectional | | **Total associations** | 1,449 | > 10,000 | Evaluate performance | | **P95 query latency** | ~25ms | > 50ms | Optimize or switch | | **Deep paths** (d>3) | <5% | > 20% | Bidirectional beneficial | ##### Implementation Notes **Current SQL Strategy** (Unidirectional): ```sql WITH RECURSIVE path_finder(current_hash, path, depth) AS ( SELECT source, source, 1 -- Start from source only UNION ALL SELECT target, path || ',' || target, depth + 1 FROM path_finder pf JOIN memory_graph mg ON pf.current_hash = mg.source_hash WHERE depth < max_depth AND instr(path, target) = 0 -- Cycle prevention ) SELECT path FROM path_finder WHERE current_hash = target ORDER BY depth LIMIT 1 -- BFS guarantees this is shortest ``` **Key optimization**: LIMIT 1 with ORDER BY depth ensures we return as soon as shortest path is found. **Why Bidirectional Would Be Complex**: ```sql -- Would require: 1. Two parallel CTEs (forward and backward frontiers) 2. Intersection detection logic (when frontiers meet) 3. Path reconstruction from both halves 4. Handling paths that meet at different depths 5. More complex cycle detection 6. ~80-100 lines of intricate SQL ``` ##### Lessons Learned 1. **Algorithmic complexity doesn't always predict real-world performance** - O(b^(d/2)) is theoretically better than O(b^d) - But constant factors and graph topology matter more - For b < 1, simpler algorithm wins 2. **Sparse graphs favor simple algorithms** - When branching factor < 1, bidirectional overhead exceeds savings - Sub-linear exploration makes unidirectional optimal 3. **Document performance characteristics, not just algorithm names** - "15ms typical" more useful than "unidirectional BFS" - Include graph topology metrics in documentation 4. **Premature optimization is real** - Bidirectional BFS would add 4× code complexity - For negative performance impact in our use case - Optimize when metrics warrant it, not speculatively ## Performance Benchmarks ### Query Performance **Test Environment**: MacBook Pro M1, SQLite 3.43.0, 1,449 associations | Query Type | Before (Memories) | After (Graph Table) | Improvement | |------------|------------------|---------------------|-------------| | **Find Connected (1-hop)** | 150ms | 5ms | **30x faster** | | **Find Connected (3-hop)** | 800ms | 25ms | **32x faster** | | **Shortest Path** | 1,200ms | 15ms | **80x faster** | | **Get Subgraph (radius=2)** | N/A (not possible) | 10ms | **New capability** | **Why So Fast?**: 1. **Indexed Lookups**: O(log N) vs O(N) table scans 2. **Single SQL Query**: Recursive CTEs eliminate round-trips 3. **Compiled Traversal**: SQLite query planner optimizes BFS 4. **No Embedding Retrieval**: Graph queries don't fetch 384-dim vectors ### Storage Efficiency **Test Data**: 1,449 associations (real production data) | Storage Mode | Database Size | Per Association | Reduction | |--------------|---------------|----------------|-----------| | `memories_only` (baseline) | 2.8 MB | 500 bytes | 0% | | `dual_write` | 2.88 MB | ~515 bytes | -3% (temporary overhead) | | `graph_only` | 144 KB | 50 bytes | **97% reduction** | **Breakdown**: - **Memory object**: 500 bytes (content + 384-dim embedding + metadata) - **Graph edge**: 50 bytes (2 hashes + similarity + JSON arrays) **Space Reclaimed** (after cleanup): - 1,449 associations × 450 bytes saved = ~651 KB raw data - VACUUM reclamation: ~2-3 MB (including SQLite overhead) ### Test Suite Performance **Test Execution**: `pytest tests/storage/ tests/consolidation/test_graph_modes.py` ``` ========================== 26 passed, 7 xfailed, 2 warnings in 0.25s ========================== ``` **Coverage**: - GraphStorage class: ~90-95% (22 tests) - Edge cases: Empty inputs, cycles, self-loops, None values - Performance benchmarks: <10ms validation for 1-hop queries ## Migration Strategy ### For Existing Users (3-Step Process) #### Step 1: Upgrade to v8.51.0 ```bash pip install --upgrade mcp-memory-service ``` **Default behavior**: `dual_write` mode (zero breaking changes) - Associations written to both memories table AND graph table - All existing queries continue working - Gradual data synchronization #### Step 2: Backfill Existing Associations ```bash # Preview migration (safe, read-only) python scripts/maintenance/backfill_graph_table.py --dry-run # Expected output: # ✅ Found 1,449 association memories # ✅ Graph table auto-created with proper schema # ✅ All associations validated and ready for insertion # ✅ Zero duplicates detected # Execute migration python scripts/maintenance/backfill_graph_table.py --apply # Progress output: # Processing batch 1/15: 100 associations # Processing batch 2/15: 100 associations # ... # ✅ Successfully migrated 1,435 associations (14 skipped due to missing metadata) ``` **Script Features**: - Automatic graph table creation (runs migration 008) - Safety checks: database locks, disk space, HTTP server warnings - Batch processing with progress reporting - Duplicate detection and skipping - Transaction safety with rollback on errors #### Step 3: Switch to graph_only Mode (Recommended) ```bash # Update .env file export MCP_GRAPH_STORAGE_MODE=graph_only # Restart services systemctl --user restart mcp-memory-http.service # Or use /mcp in Claude Code to reconnect ``` **Benefits of graph_only mode**: - 97% storage reduction for associations - 30x faster queries - No search pollution from association memories - Cleaner semantic search results #### Step 4: Cleanup (Optional - Reclaim Storage) ```bash # Preview deletions (safe, read-only) python scripts/maintenance/cleanup_association_memories.py --dry-run # Expected output: # ✅ Found 1,449 association memories # ✅ 1,435 verified in graph table (safe to delete) # ✅ 14 orphaned (will be preserved for safety) # ✅ Estimated space reclaimed: ~2.8 MB # Interactive cleanup (prompts for confirmation) python scripts/maintenance/cleanup_association_memories.py # Prompt: # Delete 1,435 memories? This will reclaim ~2.8 MB. (y/N) # Automated cleanup (no confirmation) python scripts/maintenance/cleanup_association_memories.py --force ``` **Script Features**: - Verification: Only deletes memories with matching graph entries - VACUUM operation to reclaim space - Before/after database size reporting - Interactive confirmation (bypassable with --force) - Transaction safety with rollback ### For New Installations ```bash # Start with graph_only mode (no migration needed) export MCP_GRAPH_STORAGE_MODE=graph_only pip install mcp-memory-service ``` **Recommendation**: New users should use `graph_only` mode from the start to avoid unnecessary storage overhead. ## Future Enhancements (v9.0+) ### Phase 2: REST API Endpoints (v8.52.0) ```python # Proposed endpoints GET /api/graph/connected/{hash}?max_hops=2 GET /api/graph/path/{hash1}/{hash2} GET /api/graph/subgraph/{hash}?radius=2 POST /api/graph/visualize ``` ### Phase 3: Advanced Graph Analytics (v9.0+) **rustworkx Integration** (optional dependency): - PageRank scoring for memory importance - Community detection for topic clustering - Betweenness centrality for hub identification - Graph diameter and connected components analysis **Visualization**: - D3.js force-directed graph in web UI - Cytoscape.js for interactive exploration - Export to GraphML/GEXF formats **Query Enhancements**: - Labeled property graphs (typed relationships) - Pattern matching (Cypher-like queries) - Temporal graph queries (associations over time) ## Testing & Validation ### Unit Tests **Location**: `tests/storage/test_graph_storage.py` (22 tests, all passing) **Coverage**: - ✅ Store operations (basic, bidirectional, duplicates, self-loops) - ✅ Find connected (basic, multi-hop, cycles) - ✅ Shortest path (direct, multi-hop, disconnected, self) - ✅ Get subgraph (basic, multi-hop, radius variations) - ✅ Edge cases (empty inputs, None values, invalid hashes) - ✅ Performance benchmarks (<10ms validation) **Sample Test**: ```python @pytest.mark.asyncio async def test_find_connected_with_cycles(graph_storage, sample_graph_data): """Test that cycle prevention works correctly in recursive CTEs.""" # Create triangle cycle: E→F→G→E await graph_storage.store_association("E", "F", 0.5, ["cycle"], {}) await graph_storage.store_association("F", "G", 0.6, ["cycle"], {}) await graph_storage.store_association("G", "E", 0.7, ["cycle"], {}) # Find connected should traverse cycle but not get stuck connected = await graph_storage.find_connected("E", max_hops=5) # Should find F and G (distance 1 and 2) hashes = [hash for hash, _ in connected] assert "F" in hashes assert "G" in hashes # Should NOT have infinite loop assert len(connected) == 2 # Only F and G, no duplicates ``` ### Integration Tests **Location**: `tests/consolidation/test_graph_modes.py` (4 passing, 7 scaffolded) **Passing Tests**: - ✅ `test_graph_storage_mode_env_variable` - Config validation - ✅ `test_mode_configuration_validation` - Invalid mode handling - ✅ `test_graph_storage_basic_operations` - GraphStorage integration - ✅ `test_storage_size_comparison_concept` - 97% reduction baseline **Scaffolded Tests** (xfail for Phase 2): - ⏭️ `test_memories_only_mode` - Legacy mode verification - ⏭️ `test_dual_write_mode` - Transition mode consistency - ⏭️ `test_graph_only_mode` - Modern mode validation - ⏭️ `test_dual_write_consistency` - Data synchronization - ⏭️ `test_graph_only_no_memory_pollution` - Search cleanliness ### Performance Benchmarks **Test**: `test_query_performance_benchmark` ```python @pytest.mark.asyncio async def test_query_performance_benchmark(graph_storage, sample_graph_data): """Validate that graph queries meet performance targets.""" # Create linear chain: A→B→C→D # ... setup code ... # Benchmark 1-hop query start_time = time.time() connected_1hop = await graph_storage.find_connected("A", max_hops=1) elapsed_1hop = time.time() - start_time assert elapsed_1hop < 0.01 # <10ms for 1-hop # Benchmark 3-hop query start_time = time.time() connected_3hop = await graph_storage.find_connected("A", max_hops=3) elapsed_3hop = time.time() - start_time assert elapsed_3hop < 0.05 # <50ms for 3-hop ``` ## Zero Breaking Changes **Guarantee**: Users on v8.50.x can upgrade to v8.51.0 with zero code changes. **Mechanism**: 1. **Default mode**: `dual_write` (writes to both memories + graph) 2. **Backward-compatible queries**: Memory-based association queries continue working 3. **Opt-in migration**: Users choose when to switch to `graph_only` 4. **Rollback support**: Can revert to `memories_only` if needed **Validation**: - ✅ All existing tests pass without modification - ✅ Consolidation system continues creating associations as before - ✅ Memory retrieval works identically - ✅ No API changes required ## Troubleshooting ### Common Issues #### Issue: Graph table not created **Symptom**: `sqlite3.OperationalError: no such table: memory_graph` **Solution**: ```bash # Run backfill script to auto-create table python scripts/maintenance/backfill_graph_table.py --dry-run # Or manually run migration sqlite3 ~/.local/share/mcp-memory-service/memory.db < \ src/mcp_memory_service/storage/migrations/008_add_graph_table.sql ``` #### Issue: Slow queries after migration **Symptom**: Graph queries take >100ms **Solution**: ```sql -- Verify indexes exist SELECT name FROM sqlite_master WHERE type='index' AND tbl_name='memory_graph'; -- Expected: idx_graph_source, idx_graph_target, idx_graph_bidirectional -- Rebuild indexes if missing REINDEX memory_graph; -- Analyze for query planner optimization ANALYZE memory_graph; ``` #### Issue: Orphaned associations after cleanup **Symptom**: Some associations missing from graph table **Solution**: ```bash # Re-run backfill to catch missed associations python scripts/maintenance/backfill_graph_table.py --apply # Check for associations with incomplete metadata sqlite3 ~/.local/share/mcp-memory-service/memory.db \ "SELECT COUNT(*) FROM memories WHERE tags LIKE '%association%' \ AND (content_hash IS NULL OR metadata IS NULL);" ``` ## References - **Issue #279**: [Graph Database Architecture for Memory Associations](https://github.com/doobidoo/mcp-memory-service/issues/279) - **Pull Request #280**: [Implementation PR](https://github.com/doobidoo/mcp-memory-service/pull/280) - **SQLite Recursive CTEs**: [Official Documentation](https://www.sqlite.org/lang_with.html) - **Graph Algorithms**: _Introduction to Algorithms_ (CLRS), Chapter 22 - **Real-world Metrics**: December 14, 2025 consolidation run (343 associations) --- **Document History**: - v1.0 (2025-12-14): Initial specification based on implemented solution - Implementation validated with 22 passing unit tests - Real-world performance metrics from production deployment **Maintained by**: MCP Memory Service contributors

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/doobidoo/mcp-memory-service'

If you have feedback or need assistance with the MCP directory API, please join our Discord server