# Memory Compression Guide

## Intelligent Duplicate Detection and Entity Merging

**Version:** 0.8.0
**Last Updated:** 2025-11-23

---

## Table of Contents

1. [Overview](#overview)
2. [Core Concepts](#core-concepts)
3. [Understanding Similarity Scoring](#understanding-similarity-scoring)
4. [Getting Started](#getting-started)
5. [Tool Reference](#tool-reference)
6. [Common Use Cases](#common-use-cases)
7. [Best Practices](#best-practices)
8. [Advanced Patterns](#advanced-patterns)
9. [Troubleshooting](#troubleshooting)

---

## Overview

Memory compression helps you maintain a clean, efficient knowledge graph by identifying and merging duplicate or highly similar entities. Over time, knowledge graphs can accumulate redundant entries that waste storage and complicate queries.

### Key Features

- ✅ **Intelligent Similarity Scoring**: Multi-factor algorithm considers names, types, observations, and tags
- ✅ **Configurable Thresholds**: Control the sensitivity of duplicate detection (default 0.8)
- ✅ **Safe Merging**: Preserves all unique information when combining entities
- ✅ **Automated Compression**: One-command cleanup with dry-run preview
- ✅ **Relation Consolidation**: Automatically redirects relations to merged entities

### When to Use Compression

- **Data consolidation**: Multiple similar entities exist (e.g., "Project Alpha", "project-alpha", "Project-Alpha")
- **Storage optimization**: The graph is growing too large with redundant data
- **Quality improvement**: Automated cleanup after bulk imports
- **Periodic maintenance**: Regular deduplication as part of memory hygiene

---

## Core Concepts

### What is a Duplicate?

Duplicates are entities that represent the same real-world concept but have slightly different representations:

```
Entity 1: "John Smith" (person) - ["Software engineer at TechCorp"]
Entity 2: "John smith" (person) - ["Engineer at TechCorp", "Lives in SF"]
```

These are likely duplicates because of:

- Similar names (case variation)
- Same entity type
- Overlapping observations

### Similarity Score

The system calculates a **similarity score** (0.0 to 1.0) between entity pairs:

- **1.0** = Identical entities
- **0.8-0.99** = Very similar (likely duplicates)
- **0.5-0.79** = Somewhat similar (review manually)
- **0.0-0.49** = Different entities

**Default threshold**: 0.8 (only entities with ≥80% similarity are considered duplicates)

### Merge Behavior

When merging entities:

1. **Observations**: Combined and deduplicated (unique values only)
2. **Tags**: Combined and deduplicated (unique values only)
3. **Importance**: Highest value is preserved
4. **Timestamps**: Earliest `createdAt`, latest `lastModified`
5. **Relations**: All relations redirected to the target entity
6. **Name**: Target entity name is kept (or that of the most important entity)
7. **Type**: Target entity type is kept
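The rules above can be pictured as a pure function over entity records. The sketch below is illustrative only (not the server's actual implementation); it assumes the entity shape used throughout this guide and omits rule 5, which happens in the graph store:

```javascript
// Illustrative sketch of the merge rules (not the server's code).
// Entity shape assumed: { name, entityType, observations, tags, importance, createdAt, lastModified }.
function mergeEntityRecords(target, others) {
  const all = [target, ...others];
  return {
    name: target.name,             // rule 6: target name is kept
    entityType: target.entityType, // rule 7: target type is kept
    // rules 1-2: union of observations and tags, deduplicated
    observations: [...new Set(all.flatMap(e => e.observations ?? []))],
    tags: [...new Set(all.flatMap(e => e.tags ?? []))],
    // rule 3: highest importance wins
    importance: Math.max(...all.map(e => e.importance ?? 0)),
    // rule 4: earliest createdAt survives; the merge itself counts as a modification
    createdAt: all.map(e => e.createdAt).filter(Boolean).sort()[0],
    lastModified: new Date().toISOString()
    // rule 5 (relation redirection) is handled by the graph store, not shown here
  };
}
```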
---

## Understanding Similarity Scoring

### Multi-Factor Algorithm

The similarity score is calculated using **4 weighted factors**:

| Factor | Weight | Algorithm | Description |
|--------|--------|-----------|-------------|
| **Name** | 40% | Levenshtein Distance | Character-level edit distance |
| **Type** | 20% | Exact Match | Entity type must match |
| **Observations** | 30% | Jaccard Similarity | Overlap of observation sets |
| **Tags** | 10% | Jaccard Similarity | Overlap of tag sets |

### Factor Details

#### 1. Name Similarity (40% weight)

Uses **Levenshtein Distance** to measure how many edits are needed to transform one name into another.

```javascript
// Examples
"Project Alpha" vs "project alpha" → 0.92 (case difference)
"John Smith" vs "Jon Smith"        → 0.91 (one character)
"API Server" vs "API Service"      → 0.73 (three edits)
"Frontend" vs "Backend"            → 0.22 (very different)
```

**Formula:**

```
nameSimilarity = 1 - (editDistance / maxLength)
```

#### 2. Type Similarity (20% weight)

Simple exact match (case-insensitive):

```javascript
// Examples
"project" vs "project" → 1.0 (match)
"project" vs "Project" → 1.0 (case-insensitive)
"project" vs "task"    → 0.0 (different)
```

**Weighted contribution:**

- Match: +0.2
- No match: +0.0

#### 3. Observation Similarity (30% weight)

Uses **Jaccard Similarity** to measure overlap between observation sets:

```
Jaccard = |intersection| / |union|
```

**Example:**

```javascript
Entity A: ["Software engineer", "Works at Google"]
Entity B: ["Software engineer", "Lives in SF"]

Intersection: {"Software engineer"}                            → 1 item
Union: {"Software engineer", "Works at Google", "Lives in SF"} → 3 items

Jaccard  = 1/3 = 0.33
Weighted = 0.33 × 0.3 = 0.10
```

#### 4. Tag Similarity (10% weight)

Also uses **Jaccard Similarity** for tag overlap:

**Example:**

```javascript
Entity A: ["javascript", "frontend", "react"]
Entity B: ["javascript", "frontend", "vue"]

Intersection: {"javascript", "frontend"}          → 2 items
Union: {"javascript", "frontend", "react", "vue"} → 4 items

Jaccard  = 2/4 = 0.50
Weighted = 0.50 × 0.1 = 0.05
```

### Final Score Calculation

```javascript
finalScore = (nameSimilarity × 0.4)
           + (typeMatch × 0.2)
           + (observationJaccard × 0.3)
           + (tagJaccard × 0.1)
```

**Example Calculation:**

```
Entity 1: "Project Alpha" (project) - ["Web app rewrite"] - [high-priority]
Entity 2: "project alpha" (project) - ["Web app rewrite", "Q4 deadline"] - [high-priority, urgent]

Name:         0.92 × 0.4 = 0.368
Type:         1.0  × 0.2 = 0.200
Observations: 0.50 × 0.3 = 0.150
Tags:         0.50 × 0.1 = 0.050
                           ─────
Final Score:               0.768 (76.8% similar - below the default threshold)
```
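For concreteness, here is a minimal sketch of the full calculation. It is illustrative, not the server's internal code, and details such as name normalization (e.g., whether names are lowercased before the edit-distance step) are assumptions:

```javascript
// Standard Levenshtein edit distance via dynamic programming.
function levenshtein(a, b) {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                    // deletion
        d[i][j - 1] + 1,                                    // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)   // substitution
      );
    }
  }
  return d[a.length][b.length];
}

// Jaccard similarity of two string lists (empty vs empty treated as 0 here - an assumption).
function jaccard(listA = [], listB = []) {
  const a = new Set(listA), b = new Set(listB);
  const intersection = [...a].filter(x => b.has(x)).length;
  const union = a.size + b.size - intersection;
  return union === 0 ? 0 : intersection / union;
}

// Weighted combination: 40% name, 20% type, 30% observations, 10% tags.
function similarity(e1, e2) {
  const maxLen = Math.max(e1.name.length, e2.name.length);
  const nameSim = maxLen === 0 ? 1 : 1 - levenshtein(e1.name, e2.name) / maxLen;
  const typeMatch = e1.entityType.toLowerCase() === e2.entityType.toLowerCase() ? 1 : 0;
  return nameSim * 0.4 +
         typeMatch * 0.2 +
         jaccard(e1.observations, e2.observations) * 0.3 +
         jaccard(e1.tags, e2.tags) * 0.1;
}
```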
---

## Getting Started

### Example 1: Finding Duplicates

```javascript
// Find all duplicate pairs with the default threshold (0.8)
find_duplicates({})

// Returns:
{
  "duplicates": [
    { "entity1": "Project Alpha", "entity2": "project-alpha", "similarity": 0.89 },
    { "entity1": "John Smith", "entity2": "John smith", "similarity": 0.95 }
  ]
}

// Find with a custom threshold (more sensitive)
find_duplicates({ threshold: 0.7 })
```

### Example 2: Manual Merging

```javascript
// Merge specific entities
merge_entities({
  entityNames: ["Project Alpha", "project-alpha", "Project-Alpha"]
})

// Returns the merged entity:
{
  "name": "Project Alpha",
  "entityType": "project",
  "observations": [
    "Web application rewrite",
    "React frontend migration",
    "Q4 2025 deadline"
  ],
  "tags": ["high-priority", "frontend", "migration"],
  "importance": 8,
  "createdAt": "2025-09-15T10:00:00.000Z",
  "lastModified": "2025-11-23T14:30:00.000Z"
}
```

### Example 3: Automated Compression

```javascript
// Preview compression (safe mode)
compress_graph({ threshold: 0.8, dryRun: true })

// Returns a preview:
{
  "duplicatesFound": 5,
  "entitiesMerged": 0,  // 0 because dry-run
  "observationsCompressed": 12,
  "relationsConsolidated": 8,
  "spaceFreed": 3420,
  "mergedEntities": [
    { "kept": "Project Alpha", "merged": ["project-alpha", "Project-Alpha"] },
    { "kept": "John Smith", "merged": ["John smith"] }
  ]
}

// Execute compression
compress_graph({ threshold: 0.8, dryRun: false })
```

---

## Tool Reference

### `find_duplicates`

Find similar entity pairs above a threshold.

**Parameters:**

- `threshold` (optional, default 0.8): Similarity threshold (0.0-1.0)

**Returns:**

```typescript
{
  duplicates: Array<{
    entity1: string;
    entity2: string;
    similarity: number;
  }>
}
```

**Usage:**

```javascript
// Default sensitivity (80%)
find_duplicates({})

// High sensitivity (70%)
find_duplicates({ threshold: 0.7 })

// Very strict (90%)
find_duplicates({ threshold: 0.9 })
```

**When to use:**

- Manual review of duplicates before merging
- Finding clusters of similar entities
- Assessing data quality

---

### `merge_entities`

Merge multiple entities into one target entity.

**Parameters:**

- `entityNames` (required): Array of entity names to merge
- `targetName` (optional): Name of the entity to keep (if not specified, uses the first or highest-importance entity)

**Returns:** Merged entity object

**Behavior:**

1. Combines all unique observations
2. Combines all unique tags
3. Keeps the highest importance value
4. Keeps the earliest `createdAt`
5. Updates `lastModified` to now
6. Redirects all relations to the target
7. Removes the duplicate entities

**Usage:**

```javascript
// Merge with auto-selected target
merge_entities({
  entityNames: ["Project Alpha", "project-alpha", "Project-Alpha"]
})

// Merge with a specific target
merge_entities({
  entityNames: ["Project Alpha", "project-alpha", "Project-Alpha"],
  targetName: "Project Alpha"
})
```

**Validation:**

- All entities must exist
- Must provide at least 2 entities
- The target (if specified) must be in the entity list

---

### `compress_graph`

Automated duplicate detection and merging.

**Parameters:**

- `threshold` (optional, default 0.8): Similarity threshold
- `dryRun` (optional, default false): Preview mode (no changes)

**Returns:** CompressionResult object

**Usage:**

```javascript
// Safe preview
compress_graph({ threshold: 0.8, dryRun: true })

// Execute compression
compress_graph({ threshold: 0.8, dryRun: false })

// Aggressive compression (review carefully!)
compress_graph({ threshold: 0.7, dryRun: true })
```

**Best practices** (see the wrapper sketch after this list):

- **Always** run with `dryRun: true` first
- Review the `mergedEntities` list before executing
- Use higher thresholds (0.85-0.9) for initial cleanup
- Use lower thresholds (0.7-0.75) for manual review
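A thin wrapper can make the dry-run-first habit the default. This is a hypothetical helper built only on the tools documented above; `confirm` is an assumed callback (for example, a CLI prompt) standing in for whatever review step your client provides:

```javascript
// Hypothetical wrapper enforcing preview-before-execute.
// `confirm(preview)` is an assumed async callback that returns true to proceed.
async function safeCompress(threshold, confirm) {
  const preview = await compress_graph({ threshold, dryRun: true });

  console.log(`Planned merges at threshold ${threshold}:`);
  preview.mergedEntities.forEach(({ kept, merged }) => {
    console.log(`  keep "${kept}"  <-  ${merged.join(', ')}`);
  });

  // Execute only after the plan has been approved
  if (await confirm(preview)) {
    return compress_graph({ threshold, dryRun: false });
  }

  console.log("Compression cancelled - no changes made.");
  return preview;
}
```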
---

## Common Use Cases

### Use Case 1: Post-Import Cleanup

**Scenario:** You imported entities from multiple sources and have duplicates with different naming conventions.

```javascript
// Step 1: Assess the damage
const result = await find_duplicates({ threshold: 0.8 });
console.log(`Found ${result.duplicates.length} duplicate pairs`);

// Step 2: Preview compression
const preview = await compress_graph({ threshold: 0.8, dryRun: true });
console.log(`Would merge ${preview.entitiesMerged} entities`);
console.log(`Would save ~${preview.spaceFreed} characters`);

// Step 3: Review specific merges
preview.mergedEntities.forEach(merge => {
  console.log(`Keep: ${merge.kept}`);
  console.log(`Merge: ${merge.merged.join(', ')}`);
});

// Step 4: Execute if satisfied
await compress_graph({ threshold: 0.8, dryRun: false });
```

### Use Case 2: Periodic Maintenance

**Scenario:** Monthly cleanup to maintain graph quality.

```javascript
// Monthly compression script
async function monthlyCleanup() {
  // Conservative threshold for automated cleanup
  const result = await compress_graph({ threshold: 0.85, dryRun: false });

  console.log(`Monthly Cleanup Report:`);
  console.log(`- Duplicates found: ${result.duplicatesFound}`);
  console.log(`- Entities merged: ${result.entitiesMerged}`);
  console.log(`- Space freed: ${result.spaceFreed} characters`);
  console.log(`- Relations consolidated: ${result.relationsConsolidated}`);

  // Archive the report
  await create_entities({
    entities: [{
      name: `Cleanup Report ${new Date().toISOString()}`,
      entityType: "maintenance-log",
      observations: [
        `Merged ${result.entitiesMerged} duplicate entities`,
        `Saved ${result.spaceFreed} characters`
      ],
      tags: ["maintenance", "compression"],
      importance: 3
    }]
  });
}
```

### Use Case 3: Case-Insensitive Deduplication

**Scenario:** Multiple entities have the same name with different capitalization.

```javascript
// Find case variants
const entities = await read_graph();
const nameMap = new Map();

entities.entities.forEach(e => {
  const lowerName = e.name.toLowerCase();
  if (!nameMap.has(lowerName)) {
    nameMap.set(lowerName, []);
  }
  nameMap.get(lowerName).push(e.name);
});

// Merge case variants
for (const [lowerName, variants] of nameMap) {
  if (variants.length > 1) {
    console.log(`Merging: ${variants.join(', ')}`);
    await merge_entities({ entityNames: variants });
  }
}
```

### Use Case 4: Targeted Compression by Type

**Scenario:** Compress only specific entity types (e.g., clean up "person" entities).

```javascript
async function compressByType(entityType, threshold = 0.8) {
  // Get all entities of this type
  const graph = await read_graph();
  const typeEntities = graph.entities
    .filter(e => e.entityType === entityType)
    .map(e => e.name);

  console.log(`Checking ${typeEntities.length} "${entityType}" entities...`);

  // Find duplicates across the graph
  const allDuplicates = await find_duplicates({ threshold });

  // Filter to pairs within this type
  const typeDuplicates = allDuplicates.duplicates.filter(d =>
    typeEntities.includes(d.entity1) && typeEntities.includes(d.entity2)
  );

  console.log(`Found ${typeDuplicates.length} duplicate pairs in "${entityType}"`);

  // Merge each duplicate pair
  const merged = new Set();
  for (const dup of typeDuplicates) {
    if (!merged.has(dup.entity1) && !merged.has(dup.entity2)) {
      await merge_entities({ entityNames: [dup.entity1, dup.entity2] });
      merged.add(dup.entity2);
    }
  }

  console.log(`Merged ${merged.size} duplicate "${entityType}" entities`);
}

// Usage
await compressByType("person", 0.8);
await compressByType("project", 0.85);
```

---

## Best Practices

### 1. Always Preview First

**Never** run compression without reviewing the preview:

```javascript
// ✅ GOOD: Safe workflow
const preview = await compress_graph({ threshold: 0.8, dryRun: true });
// Review preview.mergedEntities
if (looks_good) {
  await compress_graph({ threshold: 0.8, dryRun: false });
}

// ❌ BAD: Direct execution
await compress_graph({ threshold: 0.7, dryRun: false }); // Risky!
```

### 2. Start Conservative

Use **higher thresholds** initially, then lower them if needed:

```
// Recommended progression
1. Start:   threshold 0.9  (very strict - obvious duplicates only)
2. Review:  threshold 0.85 (strict - high confidence)
3. Careful: threshold 0.8  (balanced - default)
4. Manual:  threshold 0.7  (sensitive - review each merge)
```

### 3. Compression by Phases
Break large compressions into manageable phases:

```javascript
// Phase 1: Obvious duplicates (very high similarity)
await compress_graph({ threshold: 0.9, dryRun: false });

// Phase 2: High confidence (after review)
await compress_graph({ threshold: 0.85, dryRun: false });

// Phase 3: Manual review (identify borderline cases)
const borderline = await find_duplicates({ threshold: 0.75 });
// Manually review and merge selectively
```

### 4. Preserve Important Entities

Ensure important entities are the merge target:

```javascript
// When merging, specify the canonical entity as the target
merge_entities({
  entityNames: ["Project Alpha", "project-alpha", "PROJ-ALPHA"],
  targetName: "Project Alpha"  // Keep the official name
})
```

### 5. Document Compression Activity

Track compression history for auditing:

```javascript
async function documentedCompression(threshold) {
  const before = await get_graph_stats();
  const result = await compress_graph({ threshold, dryRun: false });
  const after = await get_graph_stats();

  await create_entities({
    entities: [{
      name: `Compression ${new Date().toISOString()}`,
      entityType: "compression-log",
      observations: [
        `Threshold: ${threshold}`,
        `Before: ${before.totalEntities} entities`,
        `After: ${after.totalEntities} entities`,
        `Merged: ${result.entitiesMerged} entities`,
        `Space freed: ${result.spaceFreed} characters`,
        `Merged groups: ${result.mergedEntities.map(m => m.kept).join(', ')}`
      ],
      tags: ["compression", "maintenance", "audit"],
      importance: 5
    }]
  });

  return result;
}
```

### 6. Validate After Compression

Check graph integrity after major compressions:

```javascript
// Run validation
const validation = await validate_graph();

if (validation.errors.length > 0) {
  console.error("Compression created errors:", validation.errors);
  // Consider restoring from backup
}

if (validation.warnings.length > 0) {
  console.warn("Compression warnings:", validation.warnings);
}
```

---

## Advanced Patterns

### Pattern 1: Similarity Clustering

Group entities by similarity to find clusters of duplicates:

```javascript
async function findSimilarityClusters(threshold = 0.8) {
  const duplicates = await find_duplicates({ threshold });

  // Build an adjacency map
  const adjacency = new Map();
  duplicates.duplicates.forEach(({ entity1, entity2, similarity }) => {
    if (!adjacency.has(entity1)) adjacency.set(entity1, []);
    if (!adjacency.has(entity2)) adjacency.set(entity2, []);
    adjacency.get(entity1).push({ entity: entity2, similarity });
    adjacency.get(entity2).push({ entity: entity1, similarity });
  });

  // Find connected components (clusters)
  const visited = new Set();
  const clusters = [];

  function dfs(entity, cluster) {
    if (visited.has(entity)) return;
    visited.add(entity);
    cluster.push(entity);
    const neighbors = adjacency.get(entity) || [];
    neighbors.forEach(({ entity: neighbor }) => {
      dfs(neighbor, cluster);
    });
  }

  adjacency.forEach((_, entity) => {
    if (!visited.has(entity)) {
      const cluster = [];
      dfs(entity, cluster);
      if (cluster.length > 1) {
        clusters.push(cluster);
      }
    }
  });

  return clusters;
}

// Usage
const clusters = await findSimilarityClusters(0.8);
console.log(`Found ${clusters.length} clusters of similar entities`);

clusters.forEach((cluster, i) => {
  console.log(`Cluster ${i + 1}: ${cluster.join(', ')}`);
  // Option to merge the entire cluster:
  // await merge_entities({ entityNames: cluster });
});
```

### Pattern 2: Selective Field Merging

Custom merge logic for specific scenarios:
```javascript
async function smartMerge(entities, strategy) {
  // Get full entity data
  const fullEntities = await open_nodes({ names: entities });

  let targetName;

  if (strategy === "most-important") {
    targetName = fullEntities.reduce((max, e) =>
      (e.importance || 0) > (max.importance || 0) ? e : max
    ).name;
  } else if (strategy === "newest") {
    targetName = fullEntities.reduce((newest, e) =>
      new Date(e.lastModified || e.createdAt) > new Date(newest.lastModified || newest.createdAt)
        ? e : newest
    ).name;
  } else if (strategy === "oldest") {
    targetName = fullEntities.reduce((oldest, e) =>
      new Date(e.createdAt || e.lastModified) < new Date(oldest.createdAt || oldest.lastModified)
        ? e : oldest
    ).name;
  }

  return merge_entities({ entityNames: entities, targetName });
}

// Usage
await smartMerge(["Entity A", "Entity B", "Entity C"], "most-important");
```

### Pattern 3: Incremental Compression

Process large graphs incrementally to avoid memory issues:

```javascript
async function incrementalCompression(threshold, batchSize = 50) {
  const graph = await read_graph();
  const entities = graph.entities.map(e => e.name);

  let totalMerged = 0;

  // Process in batches
  for (let i = 0; i < entities.length; i += batchSize) {
    const batch = entities.slice(i, i + batchSize);
    console.log(`Processing batch ${Math.floor(i / batchSize) + 1}...`);

    // Find duplicates across the graph
    const duplicates = await find_duplicates({ threshold });

    // Filter to pairs within this batch
    const batchDuplicates = duplicates.duplicates.filter(d =>
      batch.includes(d.entity1) && batch.includes(d.entity2)
    );

    // Merge batch duplicates
    const merged = new Set();
    for (const dup of batchDuplicates) {
      if (!merged.has(dup.entity1) && !merged.has(dup.entity2)) {
        await merge_entities({ entityNames: [dup.entity1, dup.entity2] });
        merged.add(dup.entity2);
        totalMerged++;
      }
    }
  }

  console.log(`Incremental compression complete: ${totalMerged} entities merged`);
  return totalMerged;
}
```

---

## Troubleshooting

### Issue: Too Many False Positives

**Problem:** Compression merges entities that shouldn't be merged.

**Solution:** Increase the threshold for stricter matching.

```javascript
// ❌ Too aggressive (merges too much)
compress_graph({ threshold: 0.7 })

// ✅ More conservative
compress_graph({ threshold: 0.85 })
```

### Issue: Missing Obvious Duplicates

**Problem:** Entities that are clearly duplicates are not being detected.

**Solution:** Lower the threshold or check the entity structure.

```javascript
// Check similarity scores manually
const duplicates = await find_duplicates({ threshold: 0.6 });
duplicates.duplicates.forEach(d => {
  console.log(`${d.entity1} <-> ${d.entity2}: ${d.similarity}`);
});

// If similarity is lower than expected, investigate:
const entities = await open_nodes({ names: ["Entity A", "Entity B"] });
console.log("Entity A:", entities[0]);
console.log("Entity B:", entities[1]);

// Common reasons for low similarity:
// - Different entity types (20% penalty)
// - Few overlapping observations
// - Missing tags
// - Very different names
```

### Issue: Important Data Lost During Merge

**Problem:** A merge removed observations or tags that should have been kept.

**Solution:** This shouldn't happen! Merging preserves all unique data. Verify (see the checker sketch after this snippet):

```javascript
// Before merge
const before = await open_nodes({ names: ["Entity A", "Entity B"] });
console.log("Before:", before);

// After merge
const merged = await merge_entities({ entityNames: ["Entity A", "Entity B"] });
console.log("After:", merged);

// Check that all observations and tags are present
// If data is missing, this is a bug - report it!
```
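To make that check mechanical, a small helper can diff the pre-merge data against the merged result. This is a hypothetical sketch; it assumes the entity shape used throughout this guide:

```javascript
// Hypothetical checker: reports pre-merge observations or tags
// missing from the merged entity (there should be none).
function findLostData(beforeEntities, mergedEntity) {
  const mergedObs = new Set(mergedEntity.observations ?? []);
  const mergedTags = new Set(mergedEntity.tags ?? []);

  return {
    lostObservations: beforeEntities
      .flatMap(e => e.observations ?? [])
      .filter(o => !mergedObs.has(o)),
    lostTags: beforeEntities
      .flatMap(e => e.tags ?? [])
      .filter(t => !mergedTags.has(t))
  };
}

// Usage with the snippet above:
const lost = findLostData(before, merged);
if (lost.lostObservations.length || lost.lostTags.length) {
  console.error("Data lost during merge - report this as a bug:", lost);
}
```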
### Issue: Relations Not Redirected

**Problem:** After merging, relations still point to deleted entities.

**Solution:** Validate graph integrity.

```javascript
// Check for orphaned relations
const validation = await validate_graph();
validation.errors.forEach(error => {
  if (error.includes("orphaned relation")) {
    console.error("Found orphaned relation:", error);
  }
});

// This should not happen - merging automatically redirects relations
// If orphaned relations exist, this is a bug
```

### Issue: Compression is Too Slow

**Problem:** `compress_graph` takes too long on large graphs.

**Solution:** Use incremental compression or target specific types.

```javascript
// Option 1: Incremental processing
await incrementalCompression(0.8, 50);  // Process 50 entities at a time

// Option 2: Compress by type
await compressByType("project", 0.8);
await compressByType("task", 0.8);
await compressByType("person", 0.8);
```

---

## Integration Examples

### With Search

```javascript
// Find duplicates in search results
const searchResults = await search_nodes({ query: "project" });
const resultNames = searchResults.map(e => e.name);

const duplicates = await find_duplicates({ threshold: 0.8 });
const searchDuplicates = duplicates.duplicates.filter(d =>
  resultNames.includes(d.entity1) && resultNames.includes(d.entity2)
);

console.log(`Found ${searchDuplicates.length} duplicates in search results`);
```

### With Hierarchies

```javascript
// Compress entities within a specific subtree
const subtree = await get_subtree({ entityName: "Project Alpha" });
const subtreeNames = subtree.entities.map(e => e.name);

// Find duplicates within the subtree
const allDuplicates = await find_duplicates({ threshold: 0.8 });
const subtreeDuplicates = allDuplicates.duplicates.filter(d =>
  subtreeNames.includes(d.entity1) && subtreeNames.includes(d.entity2)
);

// Merge subtree duplicates
for (const dup of subtreeDuplicates) {
  await merge_entities({ entityNames: [dup.entity1, dup.entity2] });
}
```

### With Tags

```javascript
// Compress entities with a specific tag
const graph = await read_graph();
const taggedEntities = graph.entities
  .filter(e => e.tags?.includes("needs-cleanup"))
  .map(e => e.name);

// Aggressive compression for cleanup-tagged entities
const duplicates = await find_duplicates({ threshold: 0.75 });
const taggedDuplicates = duplicates.duplicates.filter(d =>
  taggedEntities.includes(d.entity1) && taggedEntities.includes(d.entity2)
);

for (const dup of taggedDuplicates) {
  await merge_entities({ entityNames: [dup.entity1, dup.entity2] });
  console.log(`Cleaned up: ${dup.entity2} → ${dup.entity1}`);
}
```

---

## Summary

Memory compression provides intelligent duplicate detection and merging:

- ✅ **Intelligent**: Multi-factor similarity algorithm with 4 weighted factors
- ✅ **Safe**: Dry-run preview mode prevents accidental data loss
- ✅ **Flexible**: Configurable thresholds for different use cases
- ✅ **Preserving**: All unique observations, tags, and relations are kept
- ✅ **Automated**: One-command compression for routine maintenance

**Recommended Workflow** (sketched as a script below):

1. Run `find_duplicates` to assess the duplicate count
2. Preview with `compress_graph({ threshold: 0.8, dryRun: true })`
3. Review the `mergedEntities` list carefully
4. Execute with `compress_graph({ threshold: 0.8, dryRun: false })`
5. Validate with `validate_graph`
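Put together, the workflow looks roughly like this. The sketch uses only the tools documented in this guide; in practice you would pause for human review at step 3:

```javascript
// End-to-end sketch of the recommended workflow (illustrative).
async function recommendedWorkflow(threshold = 0.8) {
  // 1. Assess the duplicate count
  const { duplicates } = await find_duplicates({ threshold });
  console.log(`Found ${duplicates.length} candidate duplicate pairs`);

  // 2. Preview the compression
  const preview = await compress_graph({ threshold, dryRun: true });

  // 3. Review the planned merges (pause for human review here)
  preview.mergedEntities.forEach(({ kept, merged }) =>
    console.log(`keep "${kept}"  <-  ${merged.join(', ')}`)
  );

  // 4. Execute the compression
  const result = await compress_graph({ threshold, dryRun: false });
  console.log(`Merged ${result.entitiesMerged} entities, freed ${result.spaceFreed} characters`);

  // 5. Validate graph integrity afterwards
  const validation = await validate_graph();
  if (validation.errors.length > 0) {
    console.error("Post-compression validation errors:", validation.errors);
  }

  return result;
}
```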
**Next Steps:**

- Read [HIERARCHY_GUIDE.md](HIERARCHY_GUIDE.md) for organizing entities
- Read [ARCHIVING_GUIDE.md](ARCHIVING_GUIDE.md) for memory lifecycle management
- See the [API Reference](README.md#api-reference) for complete tool documentation
