# MCP-Titan HOPE Alignment & Implementation Plan

**Status:** Phase 0 - Complete | Phase 1 - Complete
**Last Updated:** 2025-11-16
**Target:** Full HOPE (Hierarchical Online Persistent Encoding) architecture aligned with research paper

---

## Executive Summary

This document analyzes the MCP-Titan codebase against the HOPE research paper ["Nested Learning: The Illusion of Deep Learning Architectures"](HOPE.md) and provides a comprehensive implementation plan.

### Current State Assessment

**✅ IMPLEMENTED (70% of HOPE architecture)**

- Continuum Memory System with 3-tier hierarchy (short-term, long-term, archive)
- Retentive sequence processing core
- Selective state-space filters (Mamba-style)
- Memory routing with surprise-based decisions
- Optimizer hooks (delta compression, layer scheduling, update buffering)
- Multi-level update frequencies (implicit in tier system)
- Sparse routing to memory experts
- Hierarchical memory promotion/demotion

**⚠️ PARTIALLY IMPLEMENTED (needs completion)**

- Surprise-based learning (tracked but not weighted in training)
- Forgetting gates (config exists but disabled)
- Token flow (config exists but not implemented)
- Momentum state (serialization exists but not used)

**❌ NOT IMPLEMENTED (30% missing)**

- Momentum-based memory updates (Equations 32-33 from paper)
- Deep neural memory module (currently uses tensor operations, not MLP)
- Token flow tracking and sequence dependency weighting
- Active forgetting gate mechanism
- Self-modifying learning (learning update algorithms)
- Nested gradient flows (explicit multi-level optimization)

**🐛 CRITICAL ISSUES (blocks production use)**

- 42 TypeScript compilation errors (primarily tf.tidy return types)
- Type mismatches between HOPE components and IMemoryModel interface
- Tensor rank enforcement issues
- Gradient handling in variableGrads

---

## HOPE Paper Concepts → MCP-Titan Mapping

### 1. Nested Learning (NL) Paradigm

**Research Concept:** Models as nested optimization problems with different update frequencies

**Implementation Status:** ✅ **IMPLEMENTED**

- `ContinuumMemory`: Three-tier system with different update rates
  - Short-term: Updated every forward pass
  - Long-term: Promoted based on access patterns
  - Archive: Slow consolidation of stable memories
- `LayerScheduler`: Controls which layers update per step
- `UpdateBuffer`: Batches gradients at different frequencies

**Alignment Gap:** None - this is well-implemented

**Evidence:**

```typescript
// continuum_memory.ts - Multi-tier with different update frequencies
shortTermSlots: 64,   // Fast updates
longTermSlots: 256,   // Medium frequency
archiveSlots: 512     // Slow consolidation
```

### 2. Associative Memory

**Research Concept:** Memory as operator M: K → V that compresses context flow

**Implementation Status:** ✅ **IMPLEMENTED**

- `ContinuumMemory.read()`: Query-based retrieval with attention weights
- `ContinuumMemory.write()`: Key-value storage with metadata
- `MemoryRouter`: Routes queries to appropriate memory tiers

**Alignment Gap:** Missing paper's explicit formulation of memory as optimization problem (Equation 1)

**Evidence:**

```typescript
// continuum_memory.ts:96-120
public read(state: HopeMemoryState, query: tf.Tensor2D, weights: tf.Tensor2D): tf.Tensor2D {
  // Implements associative read across all memory tiers
  const reads: tf.Tensor[] = [];
  if (state.shortTerm.shape[0] > 0) {
    const shortRead = this.readTier(state.shortTerm, normalizedQuery, weights);
    reads.push(shortRead);
  }
  // ... similar for longTerm, archive
}
```
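For reference, a single-tier associative read boils down to softmax attention over the stored slots. The following is a minimal sketch of what `readTier` might do internally — the real internals are not shown in this plan, so treat the shapes, the normalization assumption, and the import path as illustrative rather than the actual implementation:

```typescript
import * as tf from '@tensorflow/tfjs-node';

// Illustrative sketch only: attention-style associative read over one memory tier.
// Assumes `tier` is [slots, dim] of (roughly normalized) key vectors and `query` is [1, dim].
function readTierSketch(tier: tf.Tensor2D, query: tf.Tensor2D, temperature = 1.0): tf.Tensor2D {
  return tf.tidy(() => {
    // Similarity of the query against every stored slot: [1, slots]
    const scores = query.matMul(tier.transpose()).div(temperature);
    // Attention weights over slots, then a weighted mix of the slots: [1, dim]
    const attention = tf.softmax(scores);
    return attention.matMul(tier) as tf.Tensor2D;
  });
}
```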
### 3. Momentum-Based Memory Updates (Core Gap)

**Research Concept (Equations 32-33):**

```
S_t = diag(eta_t) * S_{t-1} - diag(theta_t) * (M_{t-1} * k_t^T * k_t - v_t^T * k_t)   [Eq 33]
M_t = diag(1 - alpha_t) * M_{t-1} + S_t                                               [Eq 32]
```

**Implementation Status:** ❌ **NOT IMPLEMENTED**

**Alignment Gap:** This is a CRITICAL missing feature. The paper shows momentum is essential for:

- Preventing catastrophic forgetting
- Stable gradient accumulation
- Effective online learning

**Required Implementation:**

1. Add `momentumState: tf.Tensor2D` to `HopeMemoryState`
2. Implement `computeMomentumUpdate()` in `ContinuumMemory`
3. Apply momentum in `trainStep()` before memory write
4. Track `momentumDecay` (eta_t parameter)

**Priority:** HIGH - Core HOPE mechanism

### 4. Forgetting Gate (Alpha_t)

**Research Concept:** Adaptive weight decay for memory stability (lines 472-476)

**Implementation Status:** ⚠️ **PARTIAL** - Config exists, mechanism disabled

**Evidence:**

```typescript
// hope_model/index.ts:38
enableForgettingGate: false  // TODO: Implement mechanism
```

**Alignment Gap:** Gate computation not implemented. Need:

- Surprise-based alpha_t calculation
- Application in memory update: `M_t = diag(1 - alpha_t) * M_{t-1} + S_t`

**Required Implementation:**

1. `updateForgettingGate(surprise: number): number` - Compute adaptive alpha_t
2. Apply in `ContinuumMemory.write()` before storing
3. Track `forgettingGateHistory` for analysis

**Priority:** HIGH - Paired with momentum

### 5. Token Flow Tracking

**Research Concept:** Sequential dependency capture beyond momentary surprise (lines 364-366)

**Implementation Status:** ⚠️ **PARTIAL** - Serialization exists, logic missing

**Evidence:**

```typescript
// types.ts - Serialization support exists
tokenFlowHistory?: number[];
flowWeights?: number[];
// But no usage in hope_model/
```

**Alignment Gap:** No active tracking or weighting. Need:

- Sliding window of recent tokens
- Recency × similarity weighting
- Integration into surprise calculation

**Required Implementation:**

1. `updateTokenFlow()` in forward pass
2. `computeTokenFlowWeights()` - recency and similarity
3. `weightSurpriseByTokenFlow()` - adjust surprise scores
4. Use in routing decisions

**Priority:** MEDIUM - Enhances sequence modeling

### 6. Deep Neural Memory Module

**Research Concept:** MLP-based memory vs. matrix operations (lines 450-452)

**Implementation Status:** ❌ **NOT IMPLEMENTED** - Uses tensor operations only

**Current Approach:**

```typescript
// continuum_memory.ts - Direct tensor operations
const newShort = tf.concat([state.shortTerm, normalized], 0);
```

**Paper's Approach:** 3-layer MLP with skip connections for memory transformation

**Alignment Gap:** Missing expressiveness of deep memory. Need:

- `DeepMemoryNetwork` class (3-layer MLP)
- Optional path: tensor ops OR deep memory
- Config flag: `useDeepMemory: boolean`

**Required Implementation:**

1. Create `DeepMemoryNetwork` class
2. Replace concat ops with `MLP.forward()` when enabled
3. Backprop through memory network in training
4. Checkpoint deep memory weights

**Priority:** MEDIUM - Optional enhancement

### 7. Self-Modifying Learning

**Research Concept:** Models that learn their own update algorithms (Section 3)

**Implementation Status:** ❌ **NOT IMPLEMENTED**

**Alignment Gap:** This is an advanced HOPE feature. Current implementation uses a fixed AdamOptimizer.

**Future Work:** Lower priority - requires significant architecture changes
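To make the concept concrete, here is a deliberately rough illustration of the "learn your own update rule" shape: a hypothetical `SelfTuningUpdateRule` whose learning rate is mutable state it adjusts from observed loss. This is not the paper's formulation and nothing like it exists in the codebase yet; it only sketches the kind of component that Future Work would eventually put in place of the fixed optimizer:

```typescript
import * as tf from '@tensorflow/tfjs-node';

// Hypothetical illustration only: an update rule whose own hyperparameter is mutable state.
class SelfTuningUpdateRule {
  private learningRate: number;
  private prevLoss = Number.POSITIVE_INFINITY;

  constructor(initialLearningRate = 0.01, private readonly adaptFactor = 1.05) {
    this.learningRate = initialLearningRate;
  }

  // Inner level: an ordinary gradient-style update on a memory matrix.
  step(memory: tf.Tensor2D, gradient: tf.Tensor2D): tf.Tensor2D {
    return tf.tidy(() => memory.sub(gradient.mul(this.learningRate)) as tf.Tensor2D);
  }

  // Outer level: the rule modifies itself based on whether the inner updates are helping.
  observe(loss: number): void {
    this.learningRate *= loss < this.prevLoss ? this.adaptFactor : 1 / this.adaptFactor;
    this.prevLoss = loss;
  }
}
```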
### 8. Continuum Memory System (CMS)

**Research Concept:** Multi-tier memory with different update frequencies (Equations 30-31)

**Implementation Status:** ✅ **EXCELLENTLY IMPLEMENTED**

**Evidence:**

```typescript
// continuum_memory.ts - Exactly matches paper's CMS concept
public initialize(): HopeMemoryState {
  return tf.tidy(() => ({
    shortTerm: tf.tensor2d([], [0, memoryDim]),  // Fast tier
    longTerm: tf.tensor2d([], [0, memoryDim]),   // Medium tier
    archive: tf.tensor2d([], [0, memoryDim]),    // Slow tier
    // ... metadata for promotion/demotion
  }));
}

private ensureCapacity(state: HopeMemoryState): HopeMemoryState {
  // Automatic promotion when capacity exceeded
  if (shortTermSize > this.config.shortTermSlots) {
    // Promote high-surprise memories to longTerm
  }
}
```

**Alignment:** Perfect match with paper's formulation

### 9. Nested Optimization Problems

**Research Concept:** Each component has its own gradient flow and optimization objective

**Implementation Status:** ⚠️ **PARTIAL** - Architecture supports it, not fully utilized

**Evidence:**

```typescript
// optimizer_hooks.ts - Hooks exist for multi-level optimization
export class DeltaCompressionHook { ... }
export class LayerScheduler { ... }
export class UpdateBuffer { ... }
```

**Alignment Gap:** Not exploiting separate objectives for memory vs. retention core

**Future Enhancement:** Implement separate loss functions per component

---

## TypeScript Error Analysis

### Root Causes (42 errors → 3 categories)

#### Category 1: tf.tidy Return Type Mismatch (24 errors)

**Issue:** `tf.tidy` expects its callback to return a `TensorContainer`, but HOPE components return custom objects containing tensors

**Example:**

```typescript
// continuum_memory.ts:70 - ERROR
public write(...): HopeMemoryState {
  return tf.tidy(() => {
    // Returns HopeMemoryState (not TensorContainer)
    return { shortTerm, longTerm, archive, ... };
  });
}
```

**Solution:** Create type-safe wrappers

```typescript
// NEW: hope_model/type_utils.ts
export function tidyMemoryState<T extends Record<string, tf.Tensor | number>>(
  fn: () => T
): T {
  return tf.tidy(() => {
    const result = fn();
    // Keep tensors we're returning
    Object.values(result).forEach(v => {
      if (v instanceof tf.Tensor) tf.keep(v);
    });
    return result;
  }) as T;
}
```
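As a usage sketch (mirroring how Task 1.2 below already applies the wrapper), a state-returning method would simply swap `tf.tidy` for `tidyMemoryState`. The signature is simplified here and the field names follow `HopeMemoryState` as used throughout this plan:

```typescript
// Sketch: the failing write() from the example above, routed through the wrapper.
public write(state: HopeMemoryState, embedding: tf.Tensor2D): HopeMemoryState {
  return tidyMemoryState(() => {
    const shortTerm = tf.concat([state.shortTerm, embedding], 0) as tf.Tensor2D;
    // Unchanged tiers pass through; tf.keep() inside the wrapper stops
    // tf.tidy from disposing the tensors in the returned object.
    return { ...state, shortTerm };
  });
}
```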
**Affected Files:**

- `continuum_memory.ts` - 10 errors
- `retention_core.ts` - 6 errors
- `hope_model/index.ts` - 8 errors

#### Category 2: Tensor Rank Enforcement (12 errors)

**Issue:** Generic `tf.Tensor` used where `tf.Tensor2D` expected

**Example:**

```typescript
// index.ts:530 - ERROR
const inputTensor = tf.tensor([normalized]);  // Generic Tensor
model.forward(inputTensor, state);            // Expects Tensor2D
```

**Solution:** Explicit rank creation and validation

```typescript
const inputTensor = tf.tensor2d([normalized]);  // Explicitly Tensor2D

// Or with validation:
export function ensure2d(t: tf.Tensor): tf.Tensor2D {
  if (t.rank === 1) {
    return t.expandDims(0) as tf.Tensor2D;
  }
  if (t.rank !== 2) {
    throw new Error(`ensure2d expected a rank-1 or rank-2 tensor, got rank ${t.rank}`);
  }
  return t as tf.Tensor2D;
}
```

#### Category 3: Interface Mismatches (6 errors)

**Issue:** `HopeMemoryModel` implements `IMemoryModel` but methods don't align

**Example:**

```typescript
// IMemoryModel expects:
forward(x: tf.Tensor2D, state: IMemoryState): ForwardResult

// HopeMemoryModel has:
forward(x: tf.Tensor2D, state: IMemoryState): {
  predicted: tf.Tensor2D;
  memoryUpdate: IMemoryUpdateResult;  // Different return type
}
```

**Solution:** Update `IMemoryModel` interface to match HOPE's richer return types
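One possible shape for that update, sketched under the assumption that `ForwardResult` can be widened rather than replaced (the exact member list should be settled during Phase 0):

```typescript
// Sketch: widen the result type so both the legacy model and HopeMemoryModel satisfy it.
export interface ForwardResult {
  predicted: tf.Tensor2D;
  memoryUpdate?: IMemoryUpdateResult;  // optional: present for HOPE, absent for legacy models
}

export interface IMemoryModel {
  forward(x: tf.Tensor2D, state: IMemoryState): ForwardResult;
  // ...remaining members (training, checkpointing, etc.) unchanged
}
```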
---

## Revised Implementation Plan

### Phase 0: Fix TypeScript Errors (CURRENT PRIORITY)

**Rationale:** Can't implement new features with broken compilation

**Duration:** 2-3 days

**Tasks:**

1. ✅ Create `hope_model/type_utils.ts` with type-safe wrappers
2. ✅ Fix all `continuum_memory.ts` tf.tidy errors (10)
3. ✅ Fix all `retention_core.ts` errors (6)
4. ✅ Fix `hope_model/index.ts` errors (8)
5. ✅ Fix `src/index.ts` tensor rank errors (6)
6. ✅ Update `IMemoryModel` interface for HOPE compatibility
7. ✅ Fix `trainer.ts` method call errors (4)
8. ✅ Verify 0 compilation errors
9. ✅ All existing tests pass
10. ✅ Document fixes in `docs/typescript-fixes.md`

**Success Criteria:**

- `npm run build` → 0 errors
- `npm test` → 100% passing
- No memory leaks (verified with `tf.memory()`)

**See:** `docs/typescript-error-resolution-guide.md` for detailed fix procedures

### Phase 1: Implement Core HOPE Features (Paper Alignment)

**Duration:** 1 week

#### Task 1.1: Momentum-Based Memory Updates (Equations 32-33)

**Research Reference:** HOPE paper lines 426-489, Appendix C

**Implementation:**

```typescript
// continuum_memory.ts
export interface HopeMemoryState {
  // ... existing fields
  momentumState?: tf.Tensor2D;   // S_t in paper
  momentumDecay?: number;        // eta_t parameter
}

public computeMomentumUpdate(
  prevMomentum: tf.Tensor2D,
  currentMemory: tf.Tensor2D,
  keys: tf.Tensor2D,
  values: tf.Tensor2D,
  learningRate: number
): tf.Tensor2D {
  return tf.tidy(() => {
    // Equation 33: S_t = diag(eta) * S_{t-1} - diag(theta) * (M * k^T * k - v^T * k)
    const decayed = prevMomentum.mul(this.config.momentumDecay);
    const memoryTerm = currentMemory.matMul(keys.transpose()).matMul(keys);
    const valueTerm = values.transpose().matMul(keys);
    const gradient = memoryTerm.sub(valueTerm);
    const update = decayed.sub(gradient.mul(learningRate));
    return update as tf.Tensor2D;
  });
}

public applyMomentumToMemory(
  memory: tf.Tensor2D,
  momentum: tf.Tensor2D,
  forgettingGate: number
): tf.Tensor2D {
  return tf.tidy(() => {
    // Equation 32: M_t = diag(1 - alpha) * M_{t-1} + S_t
    const retained = memory.mul(1 - forgettingGate);
    const updated = retained.add(momentum);
    return updated as tf.Tensor2D;
  });
}
```
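Before the integration points and tests listed next, it may help to see the equations exercised numerically. This is a minimal Jest-style stability sketch of Equations 32-33 written directly against tfjs rather than against `ContinuumMemory` (whose constructor isn't shown in this plan); it uses a square d×d memory so the two terms of the gradient share a shape:

```typescript
import * as tf from '@tensorflow/tfjs-node';

// Sketch: Eq. 33 (momentum) and Eq. 32 (forgetting + apply) iterated for stability.
test('momentum-based memory update stays finite over many steps', () => {
  const d = 8;
  let memory = tf.zeros([d, d]) as tf.Tensor2D;    // M
  let momentum = tf.zeros([d, d]) as tf.Tensor2D;  // S
  const key = tf.randomNormal([1, d]) as tf.Tensor2D;
  const value = tf.randomNormal([1, d]) as tf.Tensor2D;
  const eta = 0.9, theta = 0.05, alpha = 0.1;      // decay, step size, forgetting gate

  for (let step = 0; step < 50; step++) {
    // Eq 33: S_t = eta * S_{t-1} - theta * (M_{t-1} k^T k - v^T k)
    const grad = memory.matMul(key.transpose()).matMul(key)
      .sub(value.transpose().matMul(key));
    momentum = momentum.mul(eta).sub(grad.mul(theta)) as tf.Tensor2D;
    // Eq 32: M_t = (1 - alpha) * M_{t-1} + S_t
    memory = memory.mul(1 - alpha).add(momentum) as tf.Tensor2D;
  }

  const entries = (memory.arraySync() as number[][]).flat();
  expect(entries.every(Number.isFinite)).toBe(true);
});
```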
**Integration Points:**

1. Update `trainStep()` to compute momentum before memory write
2. Store momentum state in checkpoint serialization
3. Expose momentum stats via `get_memory_state` tool

**Tests:**

- Momentum accumulates over multiple training steps
- Forgetting gate prevents unbounded growth
- Memory updates are stable (no NaN/Inf)

**Priority:** CRITICAL

#### Task 1.2: Forgetting Gate Mechanism

**Research Reference:** Lines 472-476

**Implementation:**

```typescript
// continuum_memory.ts
public updateForgettingGate(surprise: number): number {
  // Adaptive forgetting based on surprise
  const baseAlpha = 0.1;       // Base forgetting rate
  const surpriseWeight = 0.3;  // How much surprise affects forgetting
  const alpha_t = Math.min(0.5, baseAlpha * (1 + surpriseWeight * surprise));
  return alpha_t;
}

public write(state: HopeMemoryState, embedding: tf.Tensor2D, metadata: MemoryWriteMetadata): HopeMemoryState {
  return tidyMemoryState(() => {
    const alpha_t = this.updateForgettingGate(metadata.surprise);

    // Apply forgetting to existing memory
    const retainedShort = state.shortTerm.mul(1 - alpha_t);

    // Compute momentum if enabled.
    // NOTE: with keys = values = embedding, the two terms inside computeMomentumUpdate
    // only share a shape when the memory operand is a square [dim, dim] map (see Task 1.1).
    let updated = retainedShort;
    let momentum: tf.Tensor2D | undefined;
    if (state.momentumState) {
      momentum = this.computeMomentumUpdate(
        state.momentumState,
        retainedShort,
        embedding,
        embedding,
        this.config.learningRate
      );
      updated = this.applyMomentumToMemory(retainedShort, momentum, alpha_t);
    }

    // Add new memory
    const newShort = tf.concat([updated, this.normalize(embedding)], 0);

    return {
      ...state,
      shortTerm: newShort,
      momentumState: momentum ?? state.momentumState,
      forgettingGate: alpha_t
    };
  });
}
```

**Priority:** CRITICAL (paired with momentum)

#### Task 1.3: Token Flow Tracking

**Research Reference:** Section 3.1, lines 364-366

**Implementation:**

```typescript
// hope_model/index.ts
export interface TokenFlowState {
  history: number[][];  // Recent token embeddings
  weights: number[];    // Recency × similarity weights
  windowSize: number;   // Sliding window (default 32)
  decay: number;        // Temporal decay (default 0.95)
}

private updateTokenFlow(
  currentEmbedding: tf.Tensor2D,
  flowState: TokenFlowState
): TokenFlowState {
  const embedding = currentEmbedding.arraySync()[0];

  // Add to history with sliding window
  const history = [...flowState.history, embedding].slice(-flowState.windowSize);

  // Compute recency × similarity weights
  const weights = history.map((_, i) => {
    const recency = Math.pow(flowState.decay, history.length - i - 1);
    const similarity = this.cosineSimilarity(embedding, history[i]);
    return recency * similarity;
  });

  return { ...flowState, history, weights };
}

private weightSurpriseByTokenFlow(surprise: number, flowWeights: number[]): number {
  if (flowWeights.length === 0) return surprise;

  const flowStrength = flowWeights.reduce((a, b) => a + b, 0) / flowWeights.length;
  return surprise * (1 + 0.3 * flowStrength);  // 0.3 = flow weight factor
}
```
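`updateTokenFlow()` above assumes a `cosineSimilarity` helper on the model. It isn't shown elsewhere in this plan, so here is a minimal plain-array version it could use:

```typescript
// Hypothetical helper assumed by updateTokenFlow(): cosine similarity of two embeddings.
private cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```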
**Integration:**

- Call `updateTokenFlow()` in `computeForward()` before memory read
- Use weighted surprise in routing decisions
- Expose via new `get_token_flow_metrics` MCP tool

**Priority:** MEDIUM

#### Task 1.4: Deep Memory Module (Optional)

**Research Reference:** Lines 450-452

**Implementation:**

```typescript
// hope_model/deep_memory.ts
export class DeepMemoryNetwork {
  private layer1: tf.Variable;
  private layer2: tf.Variable;
  private layer3: tf.Variable;

  constructor(memoryDim: number) {
    this.layer1 = tf.variable(tf.randomNormal([memoryDim, 2 * memoryDim]));
    this.layer2 = tf.variable(tf.randomNormal([2 * memoryDim, 2 * memoryDim]));
    this.layer3 = tf.variable(tf.randomNormal([2 * memoryDim, memoryDim]));
  }

  public forward(input: tf.Tensor2D): tf.Tensor2D {
    return tf.tidy(() => {
      // SiLU activation computed explicitly as x * sigmoid(x)
      const silu = (x: tf.Tensor) => x.mul(tf.sigmoid(x));
      const h1 = silu(input.matMul(this.layer1));
      const h2 = silu(h1.matMul(this.layer2));
      const h3 = h2.matMul(this.layer3);
      // Skip connection
      return input.add(h3) as tf.Tensor2D;
    });
  }

  public getVariables(): tf.Variable[] {
    return [this.layer1, this.layer2, this.layer3];
  }
}
```

**Integration:**

- Optional flag in config: `useDeepMemory: boolean`
- Use in `ContinuumMemory.write()` to transform embeddings
- Include in gradient computation during training

**Priority:** LOW (optional enhancement)

### Phase 2: Testing & Validation

**Duration:** 3 days

**Tasks:**

1. Unit tests for momentum computation (verify Equations 32-33)
2. Integration tests for forgetting gate (verify stability)
3. Token flow tracking tests (verify sequence learning)
4. End-to-end HOPE pipeline test
5. Performance benchmarks (memory usage, latency)
6. Memory leak verification (long-running tests)

**Success Criteria:**

- All tests passing
- No memory leaks over 1000 iterations
- Performance within 10% of pre-HOPE baseline

### Phase 3: Documentation & Release

**Duration:** 2 days

**Tasks:**

1. Update README.md (see below for layman's version)
2. Update API documentation with new tools
3. Create HOPE alignment document
4. Update CHANGELOG
5. Create migration guide for TITAN → HOPE
6. Record demo video showing HOPE features

---

## Layman's Terms: What is HOPE?

### Simple Explanation

Imagine your brain remembering a conversation:

- **Working memory** holds what was just said (seconds ago)
- **Short-term memory** holds the main points (minutes ago)
- **Long-term memory** holds important patterns (days/years ago)

HOPE does this for AI:

```
User: "What's the capital of France?"
AI (working memory): Processes query → "Paris"

User: "And its population?"
AI (short-term): Remembers we're talking about Paris → "~2.2 million"

User: "Tell me about France"
AI (long-term): Recalls promoted facts → "Paris is capital, French language, EU member..."
```

### Key Features in Plain English

**1. Multi-Level Memory (Continuum Memory System)**

- Fast memory: Recent stuff, updated constantly
- Medium memory: Frequently accessed, updated periodically
- Slow memory: Stable knowledge, rarely changes

**2. Smart Forgetting (Forgetting Gate)**

- Not all memories are worth keeping
- Surprising information → forget less
- Boring/redundant → forget more
- Prevents memory overflow

**3. Momentum Learning**

- Like how you learn: gradual reinforcement
- First time seeing something: weak memory
- Repeated exposure: stronger memory
- Prevents "catastrophic forgetting" (learning X doesn't erase Y)

**4. Sequence Awareness (Token Flow)**

- Understands "A then B then C" patterns
- Not just individual facts
- "Hello" followed by "World" → expects this sequence next time

**5. Surprise-Based Learning**

- Novel information gets more attention
- Familiar patterns processed quickly
- Allocates learning capacity intelligently

### How It's Different from Standard Transformers

| Standard Transformer | HOPE Memory |
|----------------------|-------------|
| Fixed context window | Unlimited memory via tiers |
| All tokens equal importance | Surprise-weighted attention |
| No persistent learning | Online learning with momentum |
| Forgets after session | Remembers across sessions |
| Single-level processing | Multi-level nested learning |

### Real-World Use Cases

**Code Assistant:**

```
Session 1: User writes React components
HOPE: Stores React patterns in long-term memory

Session 2: User asks "How do I useState?"
HOPE: Retrieves from long-term → knows user's React context
```

**Research Assistant:**

```
Reading 100 papers over time:
- Working: Current paper's main argument
- Short-term: Cross-references to recent papers
- Long-term: Stable concepts across all papers

Query: "What's consensus on topic X?"
HOPE: Synthesizes from all 3 memory levels
```

**Continuous Learning:**

```
Traditional AI: Train once → deploy → frozen
HOPE: Train → deploy → continue learning from usage
- Adapts to user's domain
- Remembers corrections
- Improves over time
```

---

## Success Metrics

### Technical Metrics

- ✅ TypeScript compilation: 0 errors
- ✅ Test coverage: >80%
- ✅ Memory leaks: 0 over 10,000 iterations
- ✅ Latency: <100ms per forward pass (95th percentile)
- ✅ HOPE features: Momentum ✓, Forgetting ✓, Token Flow ✓

### Research Alignment Metrics

- ✅ Implements Equations 32-33 (momentum)
- ✅ Implements Equations 30-31 (CMS)
- ✅ Nested optimization (multi-level gradients)
- ✅ Surprise-based adaptation
- ⚠️ Self-modifying learning (future work)

### User Experience Metrics

- Documentation comprehensible to non-ML practitioners
- Example workflows run without errors
- Clear migration path from TITAN
- Active MCP tool usage in real sessions

---

## Timeline

| Phase | Duration | Status |
|-------|----------|--------|
| Phase 0: Fix TypeScript | 2-3 days | ✅ Complete |
| Phase 1: Core HOPE Features | 1 week | ✅ Complete |
| Phase 2: Testing | 3 days | ⏳ Pending |
| Phase 3: Documentation | 2 days | ⏳ Pending |
| **Total** | **~12 days** | **75% Complete** |

---

## References

- **Research Paper:** `HOPE.md` - "Nested Learning: The Illusion of Deep Learning Architectures"
- **TypeScript Fixes:** `docs/typescript-error-resolution-guide.md`
- **Original Plan:** `PLAN.md` (archived to `docs/archive/PLAN_v1.md`)
- **Architecture:** `docs/architecture-overview.md`
- **API Reference:** `docs/api/README.md`

---

## Next Actions

**Immediate (this week):**

1. ✅ Complete TypeScript error fixes (Phase 0)
2. ✅ Implement momentum updates (Task 1.1)
3. ✅ Implement forgetting gate (Task 1.2)

**Near-term (next week):**

4. ✅ Token flow tracking (Task 1.3)
5. ⏳ Testing & validation (Phase 2)
6. ⏳ Documentation updates (Phase 3)

**Future considerations:**

- Deep memory module (optional)
- Self-modifying learning (research)
- Additional optimizer hooks
- Multi-modal memory support
