# Long-Term Memory for AI Agents: Deep Research Report
## Executive Summary
This report analyzes long-term memory (LTM) implementation for AI agents, specifically considering integration with your current stack: **docvec + Ollama + nomic-embed-text + ChromaDB**. The research covers architectures, storage approaches, pitfalls, challenges, and practical recommendations.
---
## Table of Contents
1. [Memory Architecture Fundamentals](#1-memory-architecture-fundamentals)
2. [Storage Technologies Comparison](#2-storage-technologies-comparison)
3. [Leading Memory Systems Analysis](#3-leading-memory-systems-analysis)
4. [Critical Pitfalls & Challenges](#4-critical-pitfalls--challenges)
5. [Your Stack: Strengths & Gaps](#5-your-stack-strengths--gaps)
6. [Implementation Recommendations](#6-implementation-recommendations)
7. [References](#7-references)
---
## 1. Memory Architecture Fundamentals
### 1.1 Memory Types (Human-Inspired)
Modern AI memory systems draw from cognitive science, implementing three primary memory types:
| Memory Type | Description | AI Implementation | Use Case |
|-------------|-------------|-------------------|----------|
| **Episodic** | Specific events/interactions | Conversation logs, timestamped entries | "What did we discuss last Tuesday?" |
| **Semantic** | General knowledge/facts | Knowledge graphs, factual storage | "User prefers Python over JavaScript" |
| **Procedural** | How to do things | Learned workflows, tool usage patterns | "How to deploy to production" |
### 1.2 Memory Lifecycle
```
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ EXTRACTION │ ──▶ │ STORAGE │ ──▶ │ RETRIEVAL │ ──▶ │ USAGE │
│ │ │ │ │ │ │ │
│ - Entity │ │ - Vector DB │ │ - Semantic │ │ - Context │
│ extraction│ │ - Graph DB │ │ search │ │ injection │
│ - Relation │ │ - Key-value │ │ - Re-ranking│ │ - Reasoning │
│ mapping │ │ │ │ - Filtering │ │ │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │
└───────────────── UPDATE/CONSOLIDATION ◀───────────────────┘
```
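In code, the lifecycle reduces to four operations over a shared memory record, plus a consolidation loop that feeds back into storage. A minimal sketch of the shape (class and method names here are illustrative, not from any particular library):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Memory:
    content: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

class MemoryPipeline:
    """Extraction -> Storage -> Retrieval -> Usage, with consolidation feeding back."""

    def extract(self, raw_text: str) -> list[Memory]:
        """Pull entities/relations out of raw interaction text (typically via LLM)."""
        raise NotImplementedError

    def store(self, memories: list[Memory]) -> None:
        """Write to the vector / graph / key-value backends."""
        raise NotImplementedError

    def retrieve(self, query: str, k: int = 5) -> list[Memory]:
        """Semantic search, then re-ranking and filtering."""
        raise NotImplementedError

    def consolidate(self) -> None:
        """Merge duplicates and summarize; results go back through store()."""
        raise NotImplementedError
```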
### 1.3 Short-Term vs Long-Term Memory
| Aspect | Short-Term (Working) | Long-Term (Persistent) |
|--------|---------------------|------------------------|
| Duration | Single session | Cross-session |
| Storage | Context window | External DB |
| Capacity | Token-limited (model-dependent, e.g. ~128K) | Effectively unbounded (disk-limited) |
| Access | Direct | Retrieval-based |
| Examples | Current conversation | User preferences, past decisions |
---
## 2. Storage Technologies Comparison
### 2.1 Vector Databases (Your Current: ChromaDB)
**How it works:** Converts text to high-dimensional embeddings and retrieves by semantic similarity (nearest-neighbor search).
| Strengths | Weaknesses |
|-----------|------------|
| Excellent semantic search | No native relationship modeling |
| Fast similarity queries | Can't answer "why" questions |
| Scales well | Limited temporal reasoning |
| Works with your stack | False positives on ambiguous queries |
**ChromaDB Specific:**
- ✅ Lightweight, easy to deploy
- ✅ Good for local-first development
- ⚠️ Library mode causes stale data issues (use server mode in production)
- ⚠️ Limited to ~10M vectors before performance degrades
- ⚠️ No built-in TTL/expiration
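For concreteness, a minimal sketch of wiring this stack together: embeddings come from Ollama's REST endpoint and land in a ChromaDB collection running in server mode (the collection name and metadata keys are illustrative):

```python
import chromadb
import requests

def embed(text: str) -> list[float]:
    """Embed text via a local Ollama server with nomic-embed-text pulled."""
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

client = chromadb.HttpClient(host="localhost", port=8000)  # server mode
memories = client.get_or_create_collection("memories")

memories.add(
    ids=["mem_001"],
    documents=["User prefers Python over JavaScript"],
    embeddings=[embed("User prefers Python over JavaScript")],
    metadatas=[{"namespace": "global", "type": "semantic"}],
)

hits = memories.query(query_embeddings=[embed("language preferences")], n_results=3)
```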
### 2.2 Graph Databases (e.g., Neo4j, Memgraph)
**How it works:** Stores entities as nodes, relationships as edges. Enables traversal queries.
| Strengths | Weaknesses |
|-----------|------------|
| Excellent relationship modeling | No native semantic search |
| Temporal reasoning ("what led to X?") | Requires schema design |
| Multi-hop queries | More complex to maintain |
| Explainable connections | Higher operational overhead |
**Best for:**
- "What decisions led to this outcome?"
- "Who was involved in project X?"
- Audit trails and compliance
### 2.3 Hybrid Approaches (RECOMMENDED)
**Architecture:**
```
┌─────────────────────────────────────────────────────────────┐
│ HYBRID MEMORY LAYER │
├─────────────────────────────────────────────────────────────┤
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Vector Store │◀──▶│ Graph Store │ │
│ │ (ChromaDB) │ │ (Neo4j/SQLite)│ │
│ │ │ │ │ │
│ │ - Embeddings │ │ - Entities │ │
│ │ - Semantic │ │ - Relations │ │
│ │ search │ │ - Timestamps │ │
│ └─────────────────┘ └─────────────────┘ │
│ │ │ │
│ └──────────┬───────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Unified Query │ │
│ │ Layer │ │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
**Leading implementations:**
- **Mem0**: Vector + Graph (Neo4j) + Key-value
- **Zep**: Temporal knowledge graph + embeddings
- **Graphiti**: Graph-first with vector augmentation
---
## 3. Leading Memory Systems Analysis
### 3.1 Benchmark Comparison (2024-2025)
| System | Accuracy | Latency | Token Efficiency | Best For |
|--------|----------|---------|------------------|----------|
| **Mem0** | 66.9% | 1.4s | ~2K tokens/query | Production chat, balanced |
| **Mem0-Graph** | Higher on relational | 2.1s | Higher | Timeline queries, relationships |
| **OpenAI Memory** | 52.9% | 0.9s | Low | Rapid prototyping |
| **LangMem** | Moderate | 60s | High | Open-source, customizable |
| **Zep** | 94.8% (DMR) | Low | Excellent | Enterprise, temporal |
### 3.2 Mem0 Architecture (Most Relevant to Your Stack)
```python
# Mem0's approach (simplified, illustrative pseudocode; self.llm, self.embed,
# and self.rerank stand in for Mem0's internal components)
class Mem0Memory:
    def __init__(self):
        self.vector_store = ChromaDB()  # Semantic search
        self.graph_store = Neo4j()      # Relationships
        self.kv_store = Redis()         # Fast lookups

    def add_memory(self, content, user_id):
        # 1. Extract entities and relations via LLM
        entities, relations = self.llm.extract(content)
        # 2. Generate embedding
        embedding = self.embed(content)
        # 3. Store in parallel
        self.vector_store.add(embedding, metadata={...})
        self.graph_store.add_nodes(entities)
        self.graph_store.add_edges(relations)

    def search(self, query, user_id):
        # 1. Vector similarity search
        candidates = self.vector_store.search(query, k=10)
        # 2. Graph context enrichment
        enriched = self.graph_store.get_related(candidates)
        # 3. Re-rank and return
        return self.rerank(enriched, query)
```
### 3.3 Zep's Temporal Knowledge Graph
Zep introduces **bi-temporal memory**:
- **Valid time**: When the fact was true in reality
- **Transaction time**: When it was recorded
This enables queries like the following (sketched in code below):
- "What did we know about X at time T?"
- "How has the user's preference changed over time?"
---
## 4. Critical Pitfalls & Challenges
### 4.1 Memory Hallucinations (CRITICAL)
**The HaluMem Benchmark** identified three stages where hallucinations occur:
| Stage | Problem | Example |
|-------|---------|---------|
| **Extraction** | LLM fabricates entities/facts | "User mentioned they love skiing" (they didn't) |
| **Updating** | Conflicting memories merged incorrectly | Old preference overwrites new one |
| **Retrieval** | Wrong memories retrieved | Semantically similar but contextually wrong |
**Mitigation:**
- Use structured extraction prompts with validation (see the sketch after this list)
- Implement conflict detection before updates
- Add source attribution to all memories
- Regular memory audits
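For the extraction stage, one concrete pattern is to require verbatim evidence for every extracted fact and drop anything that can't be matched back to the source. A sketch, where `llm_extract_json` stands in for whatever structured-output call your LLM stack provides:

```python
import json

EXTRACTION_PROMPT = """Extract facts about the user from the conversation below.
Return JSON: {"facts": [{"statement": "...", "evidence": "<verbatim quote>"}]}
Only include facts directly supported by a quote. Conversation:
"""

def extract_memories(conversation: str) -> list[dict]:
    raw = llm_extract_json(EXTRACTION_PROMPT + conversation)  # assumed helper
    facts = json.loads(raw).get("facts", [])
    # Validation: drop any fact whose claimed evidence isn't actually in the source
    return [
        {**f, "source": "conversation"}
        for f in facts
        if f.get("evidence") and f["evidence"] in conversation
    ]
```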
### 4.2 Embedding Model Limitations (FUNDAMENTAL)
**Google DeepMind's Finding (2025):**
> "There is a mathematical ceiling on the complexity of query-document relationships that single-vector embeddings can represent."
**Key issues with nomic-embed-text and similar models:**
| Problem | Description | Impact |
|---------|-------------|--------|
| **Semantic Gap** | Can't capture all query-document relationships | False negatives in retrieval |
| **Critical-n Point** | Performance degrades beyond certain document count | ~20% recall at scale |
| **Noun Bias** | Sentence transformers favor nouns over predicates | Miss action-based queries |
| **Domain Mismatch** | General models underperform on specialized content | Need fine-tuning |
**nomic-embed-text Specific:**
- ✅ 8192 token context (excellent)
- ✅ Good general performance
- ⚠️ May struggle with code-heavy content
- ⚠️ No built-in temporal understanding
### 4.3 Stale Memory Problem
```
Time T1: Store "User prefers dark mode"
Time T2: User says "I switched to light mode"
Time T3: Query about preferences → Returns T1 memory (WRONG)
```
**Solutions:**
- Implement memory versioning
- Add timestamps and recency weighting
- Use temporal decay functions
- Explicit memory invalidation (sketched below)
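These combine naturally at write time: before storing a new memory, close out anything it contradicts. A sketch against the Phase 2 schema in §6.1, where `find_conflicting` and the generic `store` interface are assumed (conflict detection, e.g. by LLM or rules, is the hard part):

```python
from datetime import datetime, timezone

def store_with_supersede(store, new_memory: dict) -> None:
    """Explicit invalidation: the T1 memory gets a valid_until instead of lingering."""
    now = datetime.now(timezone.utc).isoformat()
    for old in find_conflicting(store, new_memory):  # assumed conflict detector
        old["valid_until"] = now
        new_memory["supersedes"] = old["id"]
        store.update(old)
    new_memory["valid_from"] = now
    store.add(new_memory)

def recall_current(store, query: str) -> list[dict]:
    """Retrieval filters out superseded memories by default."""
    return [m for m in store.search(query) if m.get("valid_until") is None]
```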
### 4.4 Context Window Overflow
Even with retrieval, injecting too many memories causes:
- Token budget exhaustion
- "Lost in the middle" effect (middle context ignored)
- Increased latency and cost
**Solutions:**
- Token-budget aware retrieval (your docvec already has this; see the sketch after this list)
- Memory summarization/consolidation
- Hierarchical memory (summaries → details)
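docvec's `search_with_budget` covers the first point; the underlying idea is just a greedy pack of ranked results under a token ceiling. A sketch with a crude whitespace-based token estimate (substitute a real tokenizer in practice):

```python
def pack_within_budget(ranked_memories: list[str], max_tokens: int) -> list[str]:
    """Greedily take memories in rank order until the token budget is spent."""
    selected, used = [], 0
    for text in ranked_memories:
        cost = int(len(text.split()) * 1.3)  # rough tokens-per-word estimate
        if used + cost > max_tokens:
            continue  # skip oversized items; smaller ones may still fit
        selected.append(text)
        used += cost
    return selected
```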
### 4.5 Cold Start Problem
New users/projects have no memories, leading to:
- Poor initial experience
- No personalization
- Generic responses
**Solutions:**
- Explicit onboarding prompts
- Import from external sources
- Default memory templates (see the seeding sketch below)
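Template seeding can reuse the `memory_store` tool proposed in §6.1; a sketch (template contents are placeholders):

```python
# Hypothetical defaults for a fresh namespace; memory_store is the
# Phase 1 tool sketched in §6.1
DEFAULT_TEMPLATES = {
    "user_preferences": "# Preferences\nNo explicit preferences recorded yet.",
    "context": "# Project Context\nTo be filled in during onboarding.",
}

def bootstrap_namespace(namespace: str) -> None:
    """Seed a new user/project so first queries return something sensible."""
    for key, template in DEFAULT_TEMPLATES.items():
        memory_store(namespace, key, template, tags=["template", "cold_start"])
```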
### 4.6 Privacy & Security
| Risk | Description | Mitigation |
|------|-------------|------------|
| **Data Leakage** | Memories contain sensitive info | Encryption at rest, access controls |
| **Cross-User Contamination** | Wrong user's memories retrieved | Strict namespace isolation (sketched below) |
| **Memory Poisoning** | Malicious data injection | Input validation, source verification |
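Namespace isolation maps directly onto ChromaDB's metadata filters: tag every write with its owner and apply a `where` clause on every read, leaving no unscoped query path. A sketch (the metadata key is illustrative):

```python
def search_user_memories(collection, query_embedding: list[float],
                         user_id: str, k: int = 5):
    """All reads are scoped to a single user at the storage layer."""
    return collection.query(
        query_embeddings=[query_embedding],
        n_results=k,
        where={"user_id": user_id},  # cross-user retrieval is impossible here
    )
```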
### 4.7 Operational Challenges
| Challenge | Description |
|-----------|-------------|
| **Memory Bloat** | Unlimited growth degrades performance |
| **Debugging** | Hard to trace why specific memory was retrieved |
| **Versioning** | Embedding model changes invalidate all memories |
| **Backup/Restore** | Multiple stores = complex backup |
---
## 5. Your Stack: Strengths & Gaps
### 5.1 Current Stack Analysis
```
┌─────────────────────────────────────────────────────────────┐
│ YOUR CURRENT STACK │
├─────────────────────────────────────────────────────────────┤
│ Ollama (Local) │
│ └── nomic-embed-text (Embeddings) │
│ │
│ ChromaDB (Vector Store) │
│ └── Persistent collections │
│ │
│ docvec (MCP Server) │
│ ├── index_file, index_directory │
│ ├── search, search_with_filters │
│ ├── search_with_budget ← EXCELLENT for memory │
│ └── Management tools │
└─────────────────────────────────────────────────────────────┘
```
### 5.2 Strengths
| Strength | Why It Matters |
|----------|----------------|
| **Local-first** | Privacy, no API costs, low latency |
| **Token-budget search** | Critical for memory → context injection |
| **Metadata filtering** | Enables namespace isolation |
| **Smart chunking** | Better retrieval granularity |
| **Deduplication** | Prevents memory bloat |
### 5.3 Gaps for Full Memory System
| Gap | Impact | Solution |
|-----|--------|----------|
| **No graph layer** | Can't model relationships | Add lightweight graph (SQLite, Neo4j) |
| **No temporal awareness** | Can't reason about time | Add timestamps, decay functions |
| **No memory CRUD** | File-based, not memory-native | Add memory abstraction layer |
| **No consolidation** | Memories grow unbounded | Add summarization pipeline |
| **No conflict resolution** | Contradictory memories persist | Add update/supersede logic |
### 5.4 nomic-embed-text Assessment
**Model specs:**
- Dimensions: 768
- Max tokens: 8192 (excellent for documents)
- Type: General-purpose text embedding
**For memory use:**
| Use Case | Suitability | Notes |
|----------|-------------|-------|
| Semantic search | ✅ Excellent | Core strength |
| Code memories | ⚠️ Moderate | Consider nomic-embed-code |
| Short memories | ⚠️ Moderate | May need normalization |
| Temporal queries | ❌ Poor | Needs metadata augmentation |
| Relationship queries | ❌ Poor | Needs graph layer |
---
## 6. Implementation Recommendations
### 6.1 Phased Approach
#### Phase 1: Simple Memory Layer (Week 1)
Leverage existing docvec with conventions:
```
~/.memories/
├── global/
│ ├── user_preferences.md
│ └── learned_patterns.md
├── project_foo/
│ ├── context.md
│ ├── decisions.md
│ └── progress.md
└── .memory_index.json # Metadata
```
**Add to docvec:**
```python
# New MCP tools (sketch; index_file, search_with_budget, and delete_file
# are docvec's existing operations)
from datetime import datetime, timezone
from pathlib import Path

MEMORY_ROOT = Path.home() / ".memories"

def memory_store(namespace: str, key: str, content: str, tags: list):
    """Store a memory as markdown with frontmatter, then index it"""
    path = MEMORY_ROOT / namespace / f"{key}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).isoformat()
    path.write_text(f"---\ncreated_at: {stamp}\ntags: {tags}\n---\n{content}")
    index_file(str(path))

def memory_recall(query: str, namespace: str = None, max_tokens: int = 2000):
    """Recall relevant memories within token budget"""
    filters = {"namespace": namespace} if namespace else {}
    return search_with_budget(query, max_tokens, filters)

def memory_forget(namespace: str, key: str):
    """Remove a memory from disk and the index"""
    delete_file(str(MEMORY_ROOT / namespace / f"{key}.md"))
```
#### Phase 2: Add Temporal Awareness (Week 2)
```python
# Memory schema with timestamps
memory = {
    "id": "mem_123",
    "content": "User prefers dark mode",
    "created_at": "2024-01-15T10:30:00Z",
    "updated_at": "2024-01-15T10:30:00Z",
    "valid_from": "2024-01-15T10:30:00Z",
    "valid_until": None,  # Currently valid
    "confidence": 0.95,
    "source": "explicit_statement",
    "supersedes": None,
    "tags": ["preference", "ui"]
}
```
**Retrieval with recency weighting:**
```python
import math

def score_memory(memory, query_embedding, current_time):
    # cosine_similarity: any standard implementation (e.g. via numpy)
    semantic_score = cosine_similarity(memory.embedding, query_embedding)
    age_days = (current_time - memory.created_at).days
    recency_score = math.exp(-math.log(2) * age_days / 30)  # 30-day half-life
    return 0.7 * semantic_score + 0.3 * recency_score
```
#### Phase 3: Add Relationship Graph (Week 3-4)
**Lightweight SQLite graph:**
```sql
-- Entities
CREATE TABLE entities (
id TEXT PRIMARY KEY,
type TEXT, -- person, project, concept, decision
name TEXT,
memory_id TEXT, -- Link to vector store
created_at TIMESTAMP
);
-- Relationships
CREATE TABLE relationships (
id TEXT PRIMARY KEY,
source_id TEXT,
target_id TEXT,
relation_type TEXT, -- RELATES_TO, CAUSED_BY, PART_OF, SUPERSEDES
created_at TIMESTAMP,
FOREIGN KEY (source_id) REFERENCES entities(id),
FOREIGN KEY (target_id) REFERENCES entities(id)
);
```
**Query pattern:**
```python
def memory_search_with_graph(query: str):
# 1. Vector search for candidates
candidates = docvec.search(query, n_results=20)
# 2. Get related entities from graph
entity_ids = [c.metadata.entity_id for c in candidates]
related = graph.get_related(entity_ids, depth=2)
# 3. Fetch related memories
related_memories = docvec.get_by_ids(related.memory_ids)
# 4. Re-rank combined results
return rerank(candidates + related_memories, query)
```
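The `graph.get_related` step can be implemented directly on the SQLite schema above with a recursive CTE; a sketch that follows outgoing edges up to the given depth:

```python
import sqlite3

def get_related_memory_ids(conn: sqlite3.Connection,
                           entity_ids: list[str], depth: int = 2) -> list[str]:
    """Collect memory_ids of all entities reachable within `depth` hops."""
    placeholders = ",".join("?" * len(entity_ids))
    sql = f"""
        WITH RECURSIVE reachable(id, hops) AS (
            SELECT id, 0 FROM entities WHERE id IN ({placeholders})
            UNION
            SELECT r.target_id, reachable.hops + 1
            FROM relationships AS r
            JOIN reachable ON r.source_id = reachable.id
            WHERE reachable.hops < ?
        )
        SELECT DISTINCT e.memory_id
        FROM entities AS e JOIN reachable ON e.id = reachable.id
        WHERE e.memory_id IS NOT NULL;
    """
    return [row[0] for row in conn.execute(sql, (*entity_ids, depth))]
```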
### 6.2 Memory Operations Protocol
**For Claude Code / AI agents:**
```markdown
## Memory Protocol
BEFORE starting any task:
1. Check memories: `memory_recall(task_description, namespace="current_project")`
2. Load user preferences: `memory_recall("preferences", namespace="global")`
DURING task execution:
- Store significant decisions: `memory_store(namespace, "decision_X", content)`
- Update progress: `memory_store(namespace, "progress", content)`
AFTER task completion:
- Summarize learnings: `memory_store(namespace, "learnings", summary)`
- Update any changed preferences
MEMORY HYGIENE:
- Consolidate related memories weekly
- Archive old project memories
- Never store secrets or credentials
```
### 6.3 Handling Common Pitfalls
| Pitfall | Implementation |
|---------|----------------|
| **Stale memories** | Add `valid_until` field, check on retrieval |
| **Contradictions** | Use `supersedes` relationship, show newest |
| **Bloat** | Periodic consolidation job, max memories per namespace |
| **Hallucinations** | Require source attribution, confidence scores |
| **Privacy** | Namespace isolation, no cross-user queries |
### 6.4 Production Checklist
- [ ] Run ChromaDB in server mode (not library mode)
- [ ] Implement namespace isolation
- [ ] Add memory backup/export
- [ ] Set up monitoring for retrieval quality
- [ ] Plan for embedding model updates (re-indexing strategy)
- [ ] Implement rate limiting on memory writes
- [ ] Add memory size limits per namespace
- [ ] Create memory audit logs
---
## 7. References
### Papers
1. "HaluMem: Evaluating Hallucinations in Memory Systems of Agents" (2025)
2. "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory" (2025)
3. "Human-inspired Perspectives: A Survey on AI Long-term Memory" (2024)
4. "Memory Architectures in Long-Term AI Agents" (2025)
5. "Zep: A Temporal Knowledge Graph Architecture for Agent Memory" (2025)
6. Google DeepMind's RAG embedding limitations study (2025)
### Systems
- Mem0: https://mem0.ai
- Zep: https://getzep.com
- LangMem: https://github.com/langchain-ai/langmem
- Graphiti: https://github.com/getzep/graphiti
- MemoriesDB: https://arxiv.org/abs/2511.06179
### Your Stack Documentation
- ChromaDB: https://docs.trychroma.com
- nomic-embed-text: https://huggingface.co/nomic-ai/nomic-embed-text-v1
- Ollama: https://ollama.ai/library/nomic-embed-text
---
## Appendix A: Quick Decision Matrix
**Should you add a graph layer?**
| If you need... | Vector Only | Add Graph |
|----------------|-------------|-----------|
| "Find similar memories" | ✅ | |
| "What caused this decision?" | | ✅ |
| "User preferences" | ✅ | |
| "Timeline of project X" | | ✅ |
| "Related to topic Y" | ✅ | |
| "Who was involved?" | | ✅ |
**Recommended path:** Start with vector-only (your current stack), add graph when you hit relationship query needs.
---
*Report generated: December 2025*
*Stack: docvec + Ollama + nomic-embed-text + ChromaDB*