M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

Overview Schema Related Servers Score Discussions

Mimir
docs
bugfixes

DUPLICATE_EDGES_FIX.md•7.84 KiB

# Duplicate Edges Issue - Root Cause & Fix **Date**: November 20, 2025 **Issue**: File chunk edges are duplicated when folders are re-indexed **Impact**: Database bloat, potential query performance degradation **Status**: ✅ Fixed --- ## Problem When removing and re-adding folders to Mimir's file indexing system, **duplicate edges were created for file chunks**. Each re-index would create an additional set of edges instead of cleaning up old edges first, leading to database bloat. ### Example ``` First index: FileChunk-1 --[NEXT_CHUNK]--> FileChunk-2 Second index: FileChunk-1 --[NEXT_CHUNK]--> FileChunk-2 (duplicate!) Third index: FileChunk-1 --[NEXT_CHUNK]--> FileChunk-2 (another duplicate!) ``` ### Root Causes 1. **Missing `DETACH DELETE`** - Old code used `DELETE` instead of `DETACH DELETE` - `DELETE` removes nodes but leaves orphaned edges - `DETACH DELETE` removes edges THEN nodes 2. **Old embedding structure** - Query looked for `[:HAS_EMBEDDING]` relationships that no longer exist - Embeddings are now stored as node properties, not separate nodes 3. **Path matching bug** - Used simple `STARTS WITH` without path separator - Could cause false matches (e.g., `/src` matching `/src-other`) 4. **Missing NodeChunk cleanup** - Only handled `FileChunk`, not `NodeChunk` nodes - New universal chunking system wasn't being cleaned up --- ## Detected Statistics **Scan Results** (from `db:cleanup-edges:dry-run`): - **Total duplicate edge sets**: 2,926 - **Primary edge type**: `NEXT_CHUNK` between FileChunk nodes - **Secondary duplicates**: `depends_on` between todo nodes - **Estimated duplicate edges to remove**: ~2,926+ **Breakdown by type**: - `NEXT_CHUNK`: ~2,900 duplicates - `depends_on`: ~26 duplicates - Others: Negligible --- ## Solution ### 1. Fixed `DELETE /api/indexed-folders` Endpoint **File**: `src/api/index-api.ts` **Changes**: ```typescript // OLD (problematic) MATCH (f:File) WHERE f.path STARTS WITH $folderPath OPTIONAL MATCH (f)-[:HAS_CHUNK]->(c:FileChunk) OPTIONAL MATCH (c)-[:HAS_EMBEDDING]->(e) // ❌ Wrong structure DETACH DELETE f, c, e // NEW (fixed) // Ensure path ends with separator to avoid false matches const folderPathWithSep = path.endsWith('/') ? path : path + '/'; // Delete File nodes and their FileChunk children MATCH (f:File) WHERE f.path STARTS WITH $folderPathWithSep OR f.path = $exactPath OPTIONAL MATCH (f)-[:HAS_CHUNK]->(c:FileChunk) DETACH DELETE f, c RETURN count(DISTINCT f) as fileCount, count(DISTINCT c) as chunkCount ``` **Key improvements**: - ✅ Removed obsolete `[:HAS_EMBEDDING]` relationship matching - ✅ Added path separator to prevent false matches - ✅ Added exact path match option - ✅ Returns deletion stats for logging - ✅ Properly uses `DETACH DELETE` ### 2. Fixed Path Translation Functions **File**: `src/api/index-api.ts` **Functions updated**: - `translateToHostPath()` helper - File migration path calculation **Changes**: ```typescript // Ensure root ends with separator to avoid false matches const rootWithSep = containerWorkspaceRoot.endsWith('/') ? containerWorkspaceRoot : `${containerWorkspaceRoot}/`; // Check if path starts with root (with separator) or is exact match if (containerPath.startsWith(rootWithSep) || containerPath === containerWorkspaceRoot) { return containerPath.replace(containerWorkspaceRoot, hostWorkspaceRoot); } ``` ### 3. Created Cleanup Script **File**: `scripts/cleanup-duplicate-edges.js` **Features**: - Scans entire database for duplicate edges - Shows detailed statistics by relationship type - Supports `--dry-run` mode for safe preview - 5-second confirmation before destructive operations - Verification after cleanup - Returns summary of deleted edges **Usage**: ```bash # Preview what would be deleted (safe) npm run db:cleanup-edges:dry-run # Actually perform cleanup (destructive) npm run db:cleanup-edges ``` **Cypher query used**: ```cypher MATCH (source)-[r]->(target) WITH source, target, type(r) as relType, collect(r) as rels WHERE size(rels) > 1 WITH source, target, relType, rels UNWIND rels[1..] as duplicateRel // Keep first edge, delete rest DELETE duplicateRel RETURN count(*) as deletedCount ``` --- ## Prevention The fix prevents future duplicates by: 1. **Proper cleanup on removal** - `DETACH DELETE` ensures all edges are removed 2. **Accurate path matching** - Path separator prevents false matches 3. **Modern schema** - No longer looks for obsolete embedding relationships 4. **Comprehensive cleanup** - Handles both FileChunk and NodeChunk nodes --- ## Deployment Steps ### 1. Backup Database (Recommended) ```bash # Export Neo4j database before cleanup docker exec neo4j_db neo4j-admin database dump neo4j --to-path=/backups ``` ### 2. Preview Duplicates ```bash npm run db:cleanup-edges:dry-run ``` ### 3. Run Cleanup ```bash npm run db:cleanup-edges ``` ### 4. Rebuild & Restart Mimir ```bash npm run build docker compose build mimir-server docker compose up -d mimir-server ``` ### 5. Verify - Remove a folder from indexing - Re-add the same folder - Check Neo4j Browser for duplicate `NEXT_CHUNK` edges - Should see NO duplicates --- ## Verification Query Run this in Neo4j Browser to check for remaining duplicates: ```cypher // Find duplicate edges MATCH (source)-[r]->(target) WITH source, target, type(r) as relType, collect(r) as rels WHERE size(rels) > 1 RETURN labels(source) as sourceLabels, labels(target) as targetLabels, relType, size(rels) as duplicateCount ORDER BY duplicateCount DESC LIMIT 20 ``` **Expected result after fix**: 0 rows --- ## Performance Impact **Before fix**: - Database size: Growing with each re-index - Query performance: Degrading over time - Duplicate edges: Accumulating indefinitely **After fix**: - Database size: Stable - Query performance: Consistent - Duplicate edges: None (cleaned up) --- ## Related Files ### Modified - `src/api/index-api.ts` - Fixed deletion endpoint and path translation - `package.json` - Added `db:cleanup-edges` scripts ### Created - `scripts/cleanup-duplicate-edges.js` - Database cleanup utility - `docs/bugfixes/DUPLICATE_EDGES_FIX.md` - This document --- ## Additional Notes ### Why This Happened 1. **Legacy code** - Original implementation used simple `DELETE` 2. **Schema evolution** - Embeddings moved from nodes to properties 3. **Incremental development** - New features (NodeChunk) not fully integrated into cleanup ### Lessons Learned 1. ✅ Always use `DETACH DELETE` when removing nodes with relationships 2. ✅ Keep cleanup logic in sync with schema changes 3. ✅ Add path separators to prevent false `STARTS WITH` matches 4. ✅ Create verification queries for critical operations 5. ✅ Provide dry-run modes for destructive database operations --- ## Testing ### Manual Test ```bash # 1. Index a folder curl -X POST http://localhost:9042/api/index-folder \ -H "Content-Type: application/json" \ -d '{"path": "/workspace/test", "hostPath": "/Users/test"}' # 2. Wait for indexing to complete # 3. Remove the folder curl -X DELETE http://localhost:9042/api/indexed-folders \ -H "Content-Type: application/json" \ -d '{"path": "/workspace/test"}' # 4. Re-index the same folder curl -X POST http://localhost:9042/api/index-folder \ -H "Content-Type: application/json" \ -d '{"path": "/workspace/test", "hostPath": "/Users/test"}' # 5. Check for duplicates in Neo4j Browser MATCH (c1:FileChunk)-[r:NEXT_CHUNK]->(c2:FileChunk) WHERE c1.path STARTS WITH '/workspace/test' WITH c1, c2, collect(r) as rels WHERE size(rels) > 1 RETURN c1.id, c2.id, size(rels) ``` **Expected**: 0 rows (no duplicates) --- ## Support If you encounter issues: 1. Check Neo4j logs: `docker compose logs neo4j_db` 2. Run diagnostic: `npm run db:cleanup-edges:dry-run` 3. Verify API is running: `curl http://localhost:9042/health` 4. Check this document for verification queries --- **Status**: ✅ Issue resolved, prevention measures in place, cleanup script available

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/orneryd/Mimir'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

DUPLICATE_EDGES_FIX.md•7.84 KiB