# Smart Memory Pruning Implementation

This document describes the implementation of Step 6: Smart Pruning with Information-Gain in the Titan Memory Architecture.

## Overview

The smart pruning system implements an information-gain-based approach to intelligently reduce memory usage while maintaining high-quality memories. The system uses a composite scoring function that combines entropy, surprise, and redundancy measurements to determine which memories to keep.

## Architecture

### Core Components

1. **MemoryPruner Class** (`src/pruning.ts`)
   - Main pruning engine with configurable parameters
   - Implements the information-gain scoring algorithm
   - Handles memory distillation and validation

2. **MCP Integration** (`src/index.ts`)
   - `prune_memory` tool endpoint for manual pruning
   - Automatic pruning based on capacity thresholds
   - Statistics reporting and monitoring

3. **Model Integration** (`src/model.ts`)
   - `pruneMemoryByInformationGain()` method
   - Automatic pruning checks during memory operations
   - Configuration and statistics access

## Scoring Algorithm

The pruning system computes a composite score for each memory:

```
Score = (entropy × entropy_weight) + (surprise × surprise_weight) - (redundancy × redundancy_weight)
```

### Components

#### 1. Entropy Calculation

- Converts memory vectors to probability distributions using softmax
- Calculates Shannon entropy: `-∑(p * log(p))`
- Higher entropy indicates more information content

#### 2. Surprise Measurement

- Uses the surprise history stored during memory updates
- Represents how unexpected the memory was when it was first stored
- Higher surprise indicates more valuable memories

#### 3. Redundancy Detection

- Computes pairwise cosine similarity between all memories
- Takes the maximum similarity with any other memory as the redundancy score
- Higher redundancy indicates less unique information
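To make the three components above concrete, here is a minimal sketch of the composite scoring over plain `number[]` vectors. It is an illustration, not the actual `MemoryPruner` internals: the real implementation operates on TensorFlow.js tensors, and names such as `scoreMemories` and `surpriseHistory` are assumptions made for this example.

```typescript
// Illustrative sketch only; the real MemoryPruner works on tensors,
// but the arithmetic is the same.
interface ScoringWeights {
  entropyWeight: number;    // e.g. 1.0
  surpriseWeight: number;   // e.g. 1.2
  redundancyWeight: number; // e.g. 0.8
}

// Shannon entropy of a vector after softmax normalization.
function entropy(vec: number[]): number {
  const max = Math.max(...vec);
  const exps = vec.map((v) => Math.exp(v - max)); // numerically stable softmax
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);
  return -probs.reduce((acc, p) => acc + (p > 0 ? p * Math.log(p) : 0), 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Redundancy = maximum cosine similarity with any *other* memory.
function redundancy(index: number, memories: number[][]): number {
  let max = 0;
  for (let j = 0; j < memories.length; j++) {
    if (j !== index) {
      max = Math.max(max, cosineSimilarity(memories[index], memories[j]));
    }
  }
  return max;
}

// Composite score: weighted entropy plus weighted surprise,
// minus a weighted redundancy penalty.
function scoreMemories(
  memories: number[][],
  surpriseHistory: number[],
  w: ScoringWeights
): number[] {
  return memories.map(
    (vec, i) =>
      entropy(vec) * w.entropyWeight +
      surpriseHistory[i] * w.surpriseWeight -
      redundancy(i, memories) * w.redundancyWeight
  );
}
```

Memories are then ranked by this score and the top `keepPercentage` fraction is retained, never dropping below `minMemoriesToKeep`.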
## Configuration

The system supports extensive configuration through the `PruningConfig` interface:

```typescript
interface PruningConfig {
  keepPercentage: number;      // 0.7 = keep 70% of memories
  minMemoriesToKeep: number;   // Never prune below this threshold
  maxCapacity: number;         // Trigger automatic pruning at this size
  entropyWeight: number;       // Weight for entropy in scoring (1.0)
  surpriseWeight: number;      // Weight for surprise in scoring (1.2)
  redundancyWeight: number;    // Weight for redundancy penalty (0.8)
  redundancyThreshold: number; // Similarity threshold for redundancy (0.85)
  enableDistillation: boolean; // Whether to distill pruned memories (true)
}
```

## Memory Distillation

When enabled, pruned memories are not simply discarded but are distilled into long-term memory:

1. **Weighted Averaging**: Pruned memories are combined using access counts as weights
2. **Long-term Integration**: Distilled vectors are added to long-term memory
3. **Capacity Management**: Long-term memory is truncated if it exceeds 30% of total capacity
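The sketch below illustrates step 1, the access-count-weighted averaging, again over plain arrays. The function name `distillPruned` is hypothetical, not the actual `src/pruning.ts` API.

```typescript
// Illustrative sketch: collapse pruned memories into a single vector,
// weighting each memory by how often it was accessed.
function distillPruned(
  prunedMemories: number[][],
  accessCounts: number[]
): number[] {
  const dim = prunedMemories[0]?.length ?? 0;
  const distilled = new Array<number>(dim).fill(0);
  const totalWeight = accessCounts.reduce((a, b) => a + b, 0) || 1;

  for (let i = 0; i < prunedMemories.length; i++) {
    const weight = accessCounts[i] / totalWeight;
    for (let d = 0; d < dim; d++) {
      distilled[d] += prunedMemories[i][d] * weight;
    }
  }
  // The result is appended to long-term memory, which is then
  // truncated back to the 30% capacity cap if necessary.
  return distilled;
}
```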
## Triggers

Pruning can be triggered in multiple ways:

### 1. Automatic Capacity-Based

- Triggers when memory usage exceeds `maxCapacity`
- Checked during memory store operations
- Maintains system performance and memory limits

### 2. Manual MCP Command

- `prune_memory` tool with an optional threshold parameter
- Allows fine-grained control over pruning intensity
- Returns detailed statistics about the pruning operation

### 3. Programmatic API

- `pruneMemoryByInformationGain(threshold?)` method
- Direct integration with other system components
- Supports custom thresholds and configurations

## Validation and Quality Assurance

The system includes comprehensive validation:

### State Validation

- Checks tensor shape consistency after pruning
- Validates that all tensors have matching dimensions
- Detects NaN and infinite values in the memory state

### Quality Analysis

- Compares entropy and surprise metrics before and after pruning
- Calculates improvement ratios for quality assessment
- Provides feedback on pruning effectiveness

## Performance Characteristics

### Time Complexity

- Information scoring: O(n²) for the redundancy calculation
- Sorting and selection: O(n log n)
- Memory creation: O(n × d), where d is the embedding dimension

### Memory Usage

- Temporary tensors for calculations are managed with `tf.tidy()`
- Pruned memories are properly disposed to prevent memory leaks
- Configurable long-term memory limits prevent unbounded growth

## Statistics and Monitoring

The system provides comprehensive statistics:

```typescript
interface PruningStats {
  totalPrunings: number;        // Total number of pruning operations
  averageReduction: number;     // Average reduction ratio
  lastPruningTime: number;      // Timestamp of the last pruning
  timeSinceLastPruning: number; // Time since the last pruning
  shouldPrune: boolean;         // Whether pruning is recommended
  currentMemorySize: number;    // Current active memory count
  maxCapacity: number;          // Maximum capacity threshold
}
```

## Testing

The implementation includes comprehensive unit tests covering:

- Basic pruning operations and logic
- Information-gain calculations (entropy, surprise, redundancy)
- State validation and error handling
- Configuration management and statistics
- Edge cases (empty states, single memories, extreme configurations)
- Performance testing with large memory states

### Test Coverage

- 26 test cases covering all major functionality
- Tests for both normal operation and edge cases
- Performance benchmarks for large-scale pruning
- Memory quality improvement validation

## Usage Examples

### Basic Pruning

```typescript
const pruner = new MemoryPruner({
  keepPercentage: 0.8,
  enableDistillation: true
});

const result = await pruner.pruneMemory(memoryState);
console.log(`Reduced from ${result.originalCount} to ${result.finalCount} memories`);
```

### MCP Command

```bash
# Prune memory, keeping 60% of memories
prune_memory --threshold 0.6

# Force pruning regardless of current capacity
prune_memory --force true
```

### Model Integration

```typescript
// Automatic pruning when needed
const result = await model.pruneMemoryByInformationGain();

// Get pruning statistics
const stats = model.getPruningStats();
```

## Benefits

1. **Quality Preservation**: Keeps high-information memories while removing redundant ones
2. **Configurable**: Extensive configuration options for different use cases
3. **Efficient**: Optimized algorithms with proper memory management
4. **Validated**: Comprehensive testing and state validation
5. **Integrated**: Seamless integration with the existing memory architecture
6. **Monitored**: Detailed statistics and quality metrics

## Future Enhancements

Potential improvements to the pruning system:

1. **Adaptive Thresholds**: Dynamic adjustment of scoring weights based on memory usage patterns
2. **Hierarchical Pruning**: Different pruning strategies for different memory tiers
3. **Temporal Considerations**: Age-based weighting in the scoring function
4. **Semantic Clustering**: Grouping similar memories before making pruning decisions
5. **Performance Optimization**: GPU acceleration for large-scale pruning operations

## Conclusion

The smart pruning implementation addresses the requirements of Step 6, providing:

- ✅ Information-gain-based scoring (weighted entropy and surprise minus a redundancy penalty)
- ✅ Configurable retention percentage with quality-based selection
- ✅ Memory distillation into long-term storage
- ✅ Multiple trigger mechanisms (capacity-based, MCP command, and programmatic)
- ✅ Comprehensive unit tests verifying slot reduction and recall quality
- ✅ Performance optimization and memory management
- ✅ Statistics and monitoring capabilities

The system balances memory efficiency against information retention, ensuring that the most valuable memories are preserved while redundant information is efficiently removed.
