# Deep Research: Long-Term Memory Systems for AI Agents and LLMs (2024-2025)
## Executive Summary
This comprehensive research document synthesizes findings from recent academic papers (2024-2025), production-ready frameworks, and architectural innovations in AI agent memory systems. The research covers four main dimensions: memory architectures, storage patterns, retrieval strategies, and best practices for building persistent, scalable memory systems for LLM-powered agents.
**Key Findings:**
- Modern memory systems build on the OS-inspired MemGPT/Letta paradigm, which introduced hierarchical memory management across multiple tiers
- Vector databases (Chroma, Weaviate, Qdrant) dominate storage, with emerging hybrid vector+graph approaches showing superior performance
- RAG (Retrieval Augmented Generation) patterns have matured into sophisticated multi-stage systems with re-ranking and semantic post-processing
- Production systems (Mem0, Letta, LangChain) report substantial gains over baselines; Mem0, for example, shows 26% higher accuracy, 91% lower p95 latency, and 90%+ token savings on the LOCOMO benchmark
---
## 1. MEMORY ARCHITECTURES
### 1.1 Overview of Main Approaches
AI agents employ three complementary memory architecture paradigms:
#### A. **Hierarchical Memory Systems (MemGPT/Letta Paradigm)**
**Source:** Packer et al., "MemGPT: Towards LLMs as Operating Systems" (arXiv:2310.08560)
The foundational approach treats LLM context management like traditional OS virtual memory:
- **In-Context Memory (Fast):** Core facts, ongoing conversations stored in the LLM's limited context window
- **Out-of-Context Memory (Slow):** Long-term persistent storage requiring explicit retrieval
- **Virtual Context Management:** Intelligent swapping between memory tiers based on relevance
**Letta Implementation (formerly MemGPT, 2024-2025):**
```
Memory Hierarchy:
├── Memory Blocks (in-context, editable by agent)
│ ├── Persona: Agent's identity and system instructions
│ ├── Human: User information and preferences
│ └── Custom blocks: Domain-specific persistent data
├── Message History: Full conversation timeline (append-only)
└── Vector Store: Embedded memories for semantic search
```
**Key Capability:** Agents can use tools to explicitly edit/search memory blocks, enabling self-reflection and persistent learning.
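A minimal, framework-agnostic sketch of this capability (illustrative only, not the Letta API): memory blocks held in the prompt, with tool functions the agent can call to append to or rewrite them.

```python
# Illustrative sketch of agent-editable memory blocks (not the Letta/MemGPT API).
from dataclasses import dataclass, field


@dataclass
class MemoryBlock:
    label: str         # e.g. "persona", "human"
    value: str         # current contents, kept in the prompt
    limit: int = 2000  # character budget so the block fits in context


@dataclass
class CoreMemory:
    blocks: dict[str, MemoryBlock] = field(default_factory=dict)

    def render(self) -> str:
        """Serialize all blocks for inclusion in the system prompt."""
        return "\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>"
                         for b in self.blocks.values())

    # Tools the agent can call to edit its own in-context memory:
    def memory_append(self, label: str, content: str) -> None:
        block = self.blocks[label]
        if len(block.value) + len(content) > block.limit:
            raise ValueError("block full -- consolidate or evict first")
        block.value += "\n" + content

    def memory_replace(self, label: str, old: str, new: str) -> None:
        self.blocks[label].value = self.blocks[label].value.replace(old, new)


core = CoreMemory({"human": MemoryBlock("human", "Name: Alice")})
core.memory_append("human", "Prefers concise answers")
print(core.render())
```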
#### B. **Multi-Level Memory Architecture (Mem0, 2025)**
**Source:** Chhikara et al., "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory" (arXiv:2504.19413)
Extends hierarchical memory with adaptive consolidation and graph-based representations:
- **User-Level Memory:** Cross-session user preferences and long-term profile
- **Session-Level Memory:** Current conversation context and immediate history
- **Agent-Level Memory:** Autonomous learning and internal state management
- **Dynamic Consolidation:** Extracting, deduplicating, and summarizing memories from ongoing interactions
- **Graph Memory Variant:** Capturing relational structures between conversational elements
**Performance Metrics (LOCOMO Benchmark):**
- 26% accuracy improvement over OpenAI's memory implementation
- 91% lower p95 latency vs. full-context approach
- 90%+ token savings without degradation
#### C. **Cognitive Memory Types (Episodic, Semantic, Procedural)**
Three distinct memory types for AI systems (adapted from neuroscience):
| Memory Type | Implementation | Use Case |
|---|---|---|
| **Episodic Memory** | Timestamped events, experiences with context | "What happened in our last conversation?" |
| **Semantic Memory** | Facts, concepts, extracted knowledge | "What are this user's preferences?" |
| **Procedural Memory** | Learned strategies, tool usage patterns | "How did I solve this before?" |
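A minimal sketch of how these types can be encoded as metadata so retrieval can filter on them; the field names and enum are illustrative, not tied to any particular framework.

```python
# Tag memories by cognitive type so retrieval can filter on it (illustrative).
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"      # timestamped events ("last conversation")
    SEMANTIC = "semantic"      # extracted facts ("user prefers Python")
    PROCEDURAL = "procedural"  # learned strategies ("how I solved X before")


@dataclass
class MemoryRecord:
    content: str
    type: MemoryType
    timestamp: datetime
    importance: float = 0.5


store = [
    MemoryRecord("Discussed database indexing", MemoryType.EPISODIC,
                 datetime(2024, 12, 1, tzinfo=timezone.utc)),
    MemoryRecord("User prefers Python", MemoryType.SEMANTIC,
                 datetime(2024, 12, 1, tzinfo=timezone.utc), importance=0.8),
]

# Type-scoped lookup: only semantic facts for a "preferences" question.
prefs = [m for m in store if m.type is MemoryType.SEMANTIC]
```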
**Recent Work (2024):**
- **Episodic Memory Verbalization** (arXiv:2409.17702): Hierarchical tree-like data structures representing raw perception data → abstract events → natural language concepts
- **Dual-Memory Systems** (arXiv:2407.16034): Combining episodic (specific events) + semantic (general patterns) memory with memory growth bounds for scalability
- **Cognitive Mapping** (arXiv:2411.08447): Dynamically expanding representational models that adapt to novel environmental contexts
#### D. **Graph-Based Memory Representations**
Emerging approach combining vector embeddings with relational structure:
```
Graph Memory Structure:
- Nodes: Entities (people, concepts, facts)
- Edges: Relationships (mentions, temporal sequences, dependencies)
- Attributes: Relevance scores, timestamps, confidence levels
```
**Advantages:**
- Captures multi-hop reasoning (A→B→C inference chains)
- Supports temporal queries and recency weighting
- Enables contradiction detection and resolution
- **Performance:** +2% additional improvement when combined with vector memory (Mem0 paper)
---
### 1.2 Framework Implementations
#### **LangChain Memory**
Provides multiple memory types for conversation management:
```python
memory_types = {
"ConversationBufferMemory": "Stores full conversation history",
"ConversationSummaryMemory": "Periodically summarizes conversation",
"ConversationTokenBufferMemory": "Maintains fixed token limit",
"ConversationBufferWindowMemory": "Rolling window of recent turns",
"VectorStoreRetrieverMemory": "Uses semantic search for relevant memories"
}
```
**Integration Pattern:** LangChain + LangGraph enables stateful agent workflows with persistent memory, observable execution paths, and human-in-the-loop checkpoints.
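A hedged sketch of that pattern, assuming recent `langgraph` and `langchain-openai` packages: a minimal graph whose conversation state persists across invocations via a checkpointer keyed by `thread_id`.

```python
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver

llm = ChatOpenAI(model="gpt-4o-mini")  # any LangChain chat model works here


def call_model(state: MessagesState):
    # Append the model's reply to the persisted message history
    return {"messages": [llm.invoke(state["messages"])]}


builder = StateGraph(MessagesState)
builder.add_node("model", call_model)
builder.add_edge(START, "model")
builder.add_edge("model", END)

# The checkpointer persists per-thread state, so conversation memory survives
# across invocations (swap MemorySaver for a DB-backed saver in production).
graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "user-123"}}
graph.invoke({"messages": [("user", "Hi, I'm Alice")]}, config)
graph.invoke({"messages": [("user", "What's my name?")]}, config)  # recalls "Alice"
```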
#### **LlamaIndex Memory & Retrieval**
Data framework specifically designed for augmenting LLMs:
- **Index Types:** VectorStoreIndex (semantic), SummaryIndex (summarized), KeywordTableIndex (keyword-based)
- **Retrieval:** Query engines with built-in re-ranking, metadata filtering, and source attribution
- **Memory Modes:**
- Chat History (recent messages)
- Summary (periodically generated overview)
- Hybrid (combining approaches)
**300+ Integration Packages** on LlamaHub for various vector stores, LLMs, and data sources.
#### **Semantic Kernel (Microsoft)**
Enterprise-focused orchestration with memory as first-class component:
- **Memory Stores:** Azure AI Search, Elasticsearch, Chroma, and custom implementations
- **Plugin Ecosystem:** Native functions, prompt templates, OpenAPI specs
- **Multi-Agent Architecture:** Agents with independent memory yet shared knowledge
- **Production Features:** RBAC, observability, stable APIs
#### **Letta (formerly MemGPT)**
Pioneering stateful agent platform:
```python
# Multi-agent shared memory example (Letta Python SDK, snake_case arguments)
shared_block = client.blocks.create(
    label="organization",
    value="Shared context across all agents"
)
agent1 = client.agents.create(
    memory_blocks=[{"label": "persona", "value": "Agent 1"}],
    block_ids=[shared_block.id]  # attach the shared memory block
)
agent2 = client.agents.create(
    memory_blocks=[{"label": "persona", "value": "Agent 2"}],
    block_ids=[shared_block.id]  # same shared block
)
```
**Unique Features:**
- Sleep-time agents (background agents that process and reorganize memory between user interactions)
- Explicit memory editing via tools
- Perpetual agents with infinite message history
- Multi-agent coordination via shared memory blocks
---
## 2. STORAGE PATTERNS
### 2.1 Vector Database Approaches
#### **A. Chroma (Embedding Database)**
**Repository:** chroma-core/chroma
**Status:** Production-ready, open-source (Apache 2.0)
```python
import chromadb
# Simple 4-function API
client = chromadb.Client()
collection = client.create_collection("memories")
# Add memories
collection.add(
documents=["User loves coffee", "User works in tech"],
metadatas=[{"source": "profile"}, {"source": "conversation"}],
ids=["mem_1", "mem_2"]
)
# Query with semantic search
results = collection.query(
query_texts=["What are the user's interests?"],
n_results=2,
where={"source": "profile"} # Optional filtering
)
```
**Characteristics:**
- Fully-typed, fully-tested, fully-documented
- Automatic embeddings (Sentence Transformers default, customizable)
- Dev/test/prod with same API
- Feature-rich: queries, filtering, regex, metadata operations
- Integrations: LangChain, LlamaIndex, and others
#### **B. Weaviate (Semantic Search Engine)**
**Repository:** weaviate/weaviate
**Status:** Production (built in Go for speed)
**Advanced Capabilities:**
- **Flexible Vectorization:** Integrated vectorizers (OpenAI, Cohere, HuggingFace, Google) or pre-computed embeddings
- **Hybrid Search:** Combine semantic search (vector similarity) with BM25 (keyword matching) in single query
- **RAG & Reranking:** Built-in generative search and cross-encoder reranking
- **Production Ready:** Multi-tenancy, replication, RBAC, horizontal scaling
- **Vector Compression:** Reduce memory usage by 97% with quantization
**Use Cases:** RAG systems, semantic search, chatbots, content classification, recommendation engines
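A hedged example of the hybrid-search capability, assuming the Weaviate v4 Python client, a locally running instance, and an existing `Memories` collection with a configured vectorizer.

```python
import weaviate

# Connect to a local Weaviate instance (assumes default ports)
client = weaviate.connect_to_local()
memories = client.collections.get("Memories")

# Hybrid query: alpha balances vector similarity (1.0) vs. BM25 keywords (0.0)
response = memories.query.hybrid(
    query="user preferences about programming languages",
    alpha=0.5,
    limit=5,
)
for obj in response.objects:
    print(obj.properties)

client.close()
```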
#### **C. Qdrant (Vector Search Engine)**
**Repository:** qdrant/qdrant
**Status:** Production (written in Rust)
**Key Features:**
- **Fast & Reliable:** Built in Rust, performs well under heavy load
- **Filtering & Payload:** Attach arbitrary JSON payloads, query with complex filters
- **Sparse Vectors:** Support for BM25-like keyword search alongside dense vectors
- **Vector Quantization:** Trade-off speed/precision with multiple compression levels
- **Benchmarked:** Open benchmarks showing competitive ANN performance
- **Clients:** Python, Go, Rust, JavaScript, Java, .NET, PHP, Ruby
**Production Deployment:** Docker, Kubernetes, Qdrant Cloud (managed service)
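A short example of payload-filtered vector search with the `qdrant-client` package; the collection name, vector size, and payload fields are illustrative.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (Distance, FieldCondition, Filter,
                                  MatchValue, PointStruct, VectorParams)

client = QdrantClient(":memory:")  # in-process mode for local development
client.create_collection(
    collection_name="memories",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Each memory: a dense vector plus an arbitrary JSON payload used for filtering
client.upsert(
    collection_name="memories",
    points=[PointStruct(id=1, vector=[0.1] * 384,
                        payload={"type": "user_preference", "user_id": "u123"})],
)

hits = client.search(
    collection_name="memories",
    query_vector=[0.1] * 384,  # in practice, the embedded query
    query_filter=Filter(must=[FieldCondition(key="type",
                                             match=MatchValue(value="user_preference"))]),
    limit=5,
)
```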
#### **D. Pinecone (Managed Vector Database)**
**Status:** Proprietary managed service (free tier available)
**Characteristics:**
- Serverless scaling with instant indexing
- Pod-based pricing, regional deployment
- Integrations with LangChain, LlamaIndex, Semantic Kernel
- Advanced: hybrid search, namespacing, metadata filtering
- Assistant API: Built-in RAG orchestration
---
### 2.2 Graph Database Approaches
#### **Knowledge Graph Integration**
Graph databases capture relational structure:
```
Entities (Nodes):
- Person: "Alice", "Bob"
- Concept: "Python programming", "Machine learning"
- Event: "Meeting on 2024-12-01", "Project kickoff"
Relationships (Edges):
- Alice --(knows)--> Bob
- Alice --(works_on)--> "Machine learning"
- "Machine learning" --(requires)--> "Python programming"
```
**Advantages for AI Memory:**
- Multi-hop reasoning support ("Who knows people working on ML?")
- Temporal relationships ("What happened after the kickoff?")
- Bidirectional queries and path traversal
- Contradiction detection through relationship inconsistencies
**Integration with Vector Systems:**
- Store entity embeddings alongside graph structure
- Use vectors for semantic similarity, graphs for explicit relationships
- Hybrid queries: find conceptually similar entities that are also graph-connected
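A toy sketch of the multi-hop advantage, using `networkx` as a stand-in for a real graph database; entities and relations mirror the example above, with one extra edge added so the query returns a result.

```python
import networkx as nx

g = nx.DiGraph()
g.add_edge("Alice", "Bob", relation="knows")
g.add_edge("Carol", "Alice", relation="knows")            # added for illustration
g.add_edge("Alice", "Machine learning", relation="works_on")
g.add_edge("Machine learning", "Python programming", relation="requires")

# Multi-hop query: "Who knows someone working on machine learning?"
ml_workers = {u for u, v, d in g.edges(data=True)
              if d["relation"] == "works_on" and v == "Machine learning"}
answer = {u for u, v, d in g.edges(data=True)
          if d["relation"] == "knows" and v in ml_workers}
print(answer)  # {'Carol'}
```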
---
### 2.3 Hybrid Approaches
#### **Vector + Graph Hybrid Pattern** (Mem0, 2025)
Combines strengths of both systems:
1. **Initial Extraction:** Convert conversation into structured facts/entities
2. **Vector Embedding:** Embed each entity and relationship
3. **Graph Construction:** Build relationship graph with edge types
4. **Retrieval (Two-stage):**
- Stage 1: Vector similarity search (find conceptually related memories)
- Stage 2: Graph traversal from results (find connected entities)
5. **Ranking:** Score by vector similarity + graph proximity + temporal relevance
**Results:**
- Graph memory adds ~2% accuracy improvement
- More robust multi-hop reasoning
- Better handling of complex relationships
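A simplified sketch of this two-stage pattern, not Mem0's actual implementation; the in-memory index, relationship graph, and score weights are all assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, memories, graph, now, top_k=5):
    """memories: {memory_id: (embedding, unix_timestamp)}; graph: e.g. a networkx graph."""
    # Stage 1: vector similarity over all stored memories
    sims = {m_id: cosine(query_vec, vec) for m_id, (vec, _) in memories.items()}
    seeds = sorted(sims, key=sims.get, reverse=True)[:top_k]

    # Stage 2: expand seeds by one hop in the relationship graph
    candidates = set(seeds)
    for s in seeds:
        if s in graph:
            candidates.update(graph.neighbors(s))

    # Final rank: vector similarity + graph proximity + temporal recency
    def score(m_id):
        vec_sim = sims.get(m_id, 0.0)
        proximity = 1.0 if m_id in seeds else 0.5        # reached via one hop
        recency = math.exp(-1e-6 * (now - memories[m_id][1]))
        return 0.6 * vec_sim + 0.2 * proximity + 0.2 * recency

    return sorted((c for c in candidates if c in memories), key=score, reverse=True)
```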
#### **File-Based Memory Systems**
Lightweight alternative for smaller systems:
```
Memory Structure:
memory/
├── episodes/
│ ├── 2024-12-01_conversation.json
│ ├── 2024-12-02_conversation.json
│ └── index.jsonl # Embeddings for search
├── summaries/
│ ├── weekly_2024_W49.txt
│ └── monthly_2024_12.txt
└── facts/
├── user_preferences.yaml
├── learned_patterns.yaml
└── contradiction_log.jsonl
```
**Suitable for:**
- Single-user agents
- Development/prototyping
- Resource-constrained environments
- Git-trackable memory (for version control)
---
## 3. RETRIEVAL STRATEGIES
### 3.1 RAG (Retrieval Augmented Generation) Patterns
#### **Core RAG Pipeline (Vanilla)**
```
User Query → Embedding → Vector Search → Top-K Results → LLM Context → Response
```
**Standard Implementation:**
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Index documents
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("User question")
```
#### **Advanced RAG Techniques (2024-2025)**
##### **1. Semantic Search + Re-ranking**
Multi-stage retrieval for improved relevance:
```
Stage 1: Semantic Search
- Convert query to embedding
- Find top-50 candidates from vector DB
- (Fast, low precision)
Stage 2: Re-ranking
- Cross-encoder scores all top-50
- Re-rank by semantic similarity
- Select top-5 for context
- (Slower, high precision)
```
**Libraries:**
- LlamaIndex built-in re-ranking modules
- Weaviate native re-ranking
- BAAI BGE-Reranker models
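A hedged sketch of the second stage, assuming the `sentence-transformers` package and a first-stage retriever that has already produced candidate memories.

```python
from sentence_transformers import CrossEncoder

query = "What are the user's interests?"
candidates = [
    "User loves coffee and hiking",
    "User works in tech",
    "Meeting scheduled for Friday",
]  # in practice: the top-50 results returned by the vector store

# Stage 2: the cross-encoder scores each (query, document) pair jointly --
# slower than bi-encoder similarity, but considerably more precise.
reranker = CrossEncoder("BAAI/bge-reranker-base")
scores = reranker.predict([(query, doc) for doc in candidates])

# Keep the highest-scoring documents for the LLM context
top_docs = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)][:2]
```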
##### **2. Hybrid Search (Vector + Keyword)**
Combines dense and sparse retrieval:
```
Query → Split into:
├─ Vector embedding (semantic)
└─ Keywords/BM25 (lexical)
Results union with scoring:
- Vector matches: scored by cosine similarity
- Keyword matches: scored by BM25
- Final rank: weighted combination
```
**Providers:**
- Weaviate: Native hybrid search
- Qdrant: Vector + sparse vectors
- Elasticsearch: Hybrid queries (BM25 + dense vectors)
##### **3. Time-Decay and Recency Weighting**
Prioritize recent/relevant memories:
```python
import math, time

# Recency decay: newer memories score higher; decay_rate tunes the half-life.
# (Note: `lambda` is a reserved word in Python, so a named rate is used instead.)
decay_rate = 1e-5
recency_score = math.exp(-decay_rate * (time.time() - memory_timestamp))

# Combined score
final_score = (
    semantic_similarity_score * 0.6 +
    recency_score * 0.3 +   # exponential time decay
    importance_score * 0.1  # user-marked importance
)
```
**Use Case:** Long conversations where recent context matters more than historical facts.
##### **4. Query Expansion and Fusion**
**RAG Fusion Pattern:**
- Generate 5+ reformulations of the original query
- Retrieve results for each reformulation
- Fuse results using reciprocal rank fusion
- Benefits: Better coverage, reduced brittleness
```
Query: "How do I optimize Python performance?"
↓
Reformulations:
1. "Python optimization techniques"
2. "Improve Python code performance"
3. "Python speed up profiling"
4. "Memory-efficient Python patterns"
5. "Python performance tuning"
↓
Retrieve and fuse top results
```
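A minimal implementation of the fusion step (reciprocal rank fusion) over the per-reformulation result lists; the document ids are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of document ids; k dampens the impact of low ranks."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# One ranked list per query reformulation
fused = reciprocal_rank_fusion([
    ["doc_3", "doc_1", "doc_7"],   # "Python optimization techniques"
    ["doc_1", "doc_3", "doc_9"],   # "Improve Python code performance"
    ["doc_1", "doc_2", "doc_3"],   # "Python performance tuning"
])
print(fused[:3])  # doc_1 and doc_3 rise to the top across reformulations
```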
---
### 3.2 Semantic vs. Keyword Search Trade-offs
| Aspect | Semantic Search | Keyword Search |
|--------|---|---|
| **Strength** | Understands meaning, synonyms, paraphrases | Exact matches, technical terms |
| **Weakness** | Misses exact terms, needs embeddings | Brittle to variations |
| **Cost** | Embedding inference + vector storage | Simple indexing, fast retrieval |
| **Use Case** | Conversations, narrative text | Code, structured data, technical docs |
In practice, combining the two (hybrid semantic + BM25 retrieval) achieves the best results and is the recommended default for production.
---
### 3.3 Relevance Scoring and Ranking
#### **Multi-Factor Ranking Model**
Production systems weight multiple signals:
```python
score = (
embedding_similarity * 0.40 + # Semantic match
metadata_relevance * 0.20 + # Type/context match
temporal_weight * 0.20 + # Recency/importance
user_explicit_rating * 0.10 + # User feedback
cross_reference_count * 0.10 # Interconnectedness
)
```
**Metadata Relevance Examples:**
- Memory type match (episodic vs. semantic)
- Source credibility
- Conversation turn distance
- User-tagged categories
**Temporal Weight Examples:**
- Exponential decay: weight ∝ exp(-λt)
- Piecewise linear: full weight for recent, decay over time
- Adaptive: learning decay rate from user behavior
---
## 4. BEST PRACTICES
### 4.1 Memory Structuring for Effective Retrieval
#### **A. Chunking Strategies**
How to split information for storage:
| Strategy | Size | Use Case | Trade-off |
|----------|------|----------|-----------|
| **Semantic Chunking** | 200-500 tokens | General text | Compute-intensive |
| **Sliding Window** | 256-1024 tokens | Preserves context | May repeat info |
| **Recursive Chunking** | Hierarchical | Complex docs | Complex implementation |
| **Fixed Token** | 512 tokens | Simple baseline | May split mid-sentence |
**Best Practice:**
- Use semantic chunking for conversational memory
- Include metadata: speaker, timestamp, turn number
- Overlap chunks by 10-20% to preserve context
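A simple sliding-window chunker illustrating the overlap recommendation; it approximates tokens with whitespace splitting, which a production system would replace with the model's tokenizer.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap_ratio: float = 0.15):
    """Sliding-window chunking with overlap; whitespace tokens approximate model tokens."""
    tokens = text.split()
    step = max(1, int(chunk_size * (1 - overlap_ratio)))  # 15% overlap by default
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if not window:
            break
        chunks.append({
            "text": " ".join(window),
            "metadata": {"start_token": start, "n_tokens": len(window)},
        })
        if start + chunk_size >= len(tokens):
            break  # the last window already covers the tail
    return chunks


chunks = chunk_text("User: I mostly write Python services. " * 200, chunk_size=128)
```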
#### **B. Metadata and Tagging**
Richer context improves retrieval:
```json
{
"id": "mem_001",
"content": "User prefers Pytho...",
"timestamp": "2024-12-01T14:30:00Z",
"metadata": {
"type": "user_preference",
"category": "technical",
"speaker": "user",
"conversation_id": "conv_123",
"turn_number": 5,
"importance": 0.8,
"tags": ["python", "performance", "optimization"],
"source": "explicit_statement",
"confidence": 0.95
},
"embedding": [0.123, -0.456, ...],
"related_memories": ["mem_002", "mem_005"]
}
```
**Filtering Benefits:**
- Reduce retrieval noise
- Type-specific searches (e.g., only user preferences)
- Time-bounded queries
#### **C. Hierarchical Memory Organization**
Layer information by abstraction:
```
Level 1 (Atomic Facts)
├─ "User's name is Alice"
├─ "Alice works in AI"
└─ "Alice knows Python"
Level 2 (Semantic Concepts)
├─ Alice's Profile:
│ ├─ Role: ML Engineer
│ └─ Skills: Python, ML
└─ Conversation Context:
├─ Current Topic: Performance
└─ Recent Questions: 5
Level 3 (Synthesized Knowledge)
├─ User Persona:
│ ├─ Technical Level: Advanced
│ ├─ Interests: Performance optimization
│ └─ Communication Style: Direct
└─ Interaction Patterns:
├─ Avg Questions per Session: 8
└─ Preferred Answer Format: Code examples
```
---
### 4.2 Memory Consolidation and Summarization
#### **When to Consolidate**
Trigger consolidation to prevent memory bloat:
**Triggers:**
- Time-based: After N hours/days
- Size-based: When memory exceeds X tokens
- Event-based: After significant context switch
- Quality-based: When retrieval precision drops below threshold
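A sketch of combining these triggers into a single check run after each interaction; the thresholds and store fields are illustrative assumptions.

```python
import time


def should_consolidate(store, now=None, max_age_hours=24,
                       max_tokens=50_000, min_precision=0.6,
                       topic_switched=False):
    """Return True if any consolidation trigger fires (thresholds illustrative)."""
    now = now or time.time()
    time_trigger = (now - store["last_consolidated_at"]) > max_age_hours * 3600
    size_trigger = store["approx_tokens"] > max_tokens
    quality_trigger = store["recent_retrieval_precision"] < min_precision
    return time_trigger or size_trigger or topic_switched or quality_trigger


store = {
    "last_consolidated_at": time.time() - 3 * 3600,  # consolidated 3h ago
    "approx_tokens": 62_000,                         # over the size threshold
    "recent_retrieval_precision": 0.71,
}
if should_consolidate(store):
    pass  # run the summarization / deduplication pipeline described below
```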
#### **Consolidation Techniques**
##### **1. Hierarchical Summarization**
```
Raw Conversation:
Turn 1: "Hi, I work on Python projects"
Turn 2: "Performance is a concern for me"
Turn 3: "Specifically, database queries"
↓ (Extract atomic facts)
Atomic Facts:
- User: Python developer
- Concern: Performance
- Domain: Database queries
↓ (Abstract to concepts)
Consolidated Summary:
"Python developer optimizing database query performance"
↓ (Calculate importance)
Importance: 0.85 (indicates strong relevance for future queries)
```
##### **2. Deduplication**
Remove redundant memories:
```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_similarity(set_a, set_b):
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def calculate_similarity(mem1, mem2):
    # Blend semantic similarity with entity overlap
    embedding_sim = cosine_similarity(mem1.embedding, mem2.embedding)
    content_overlap = jaccard_similarity(set(mem1.entities), set(mem2.entities))
    return embedding_sim * 0.7 + content_overlap * 0.3

# Merge if similarity > 0.9: keep the more recent memory, combine metadata
if calculate_similarity(mem_a, mem_b) > 0.9:
    merge_memories(mem_a, mem_b)
```
##### **3. Extraction of Key Facts**
Use LLM to distill information:
```python
prompt = f"""
Given this conversation excerpt:
{conversation_excerpt}
Extract:
1. User facts (preferences, constraints, background)
2. Domain facts (technical information, findings)
3. Decisions made (conclusions, agreements)
Format as JSON with confidence scores.
"""
facts = llm.generate(prompt)
# Store facts with extracted metadata
```
**Mem0 Implementation:**
- 26% accuracy advantage through intelligent consolidation
- Dynamic extraction of what matters for future conversations
- Automatic deduplication and conflict resolution
---
### 4.3 Handling Memory Updates and Contradictions
#### **Update Semantics**
Three strategies for updating memories:
**1. Append-Only (Immutable)**
```
Memory Timeline:
2024-12-01: "User prefers Python"
2024-12-05: "User switching to Go for systems programming"
Query: When retrieving, return latest fact
```
Pros: Preserves history, audit trail
Cons: Storage overhead
**2. Replace-In-Place (Mutable)**
```
Memory: "User's favorite language: Python → Go"
Metadata: {
"updated": "2024-12-05",
"old_value": "Python",
"new_value": "Go"
}
```
Pros: Efficient storage
Cons: Loses history
**3. Versioned (Hybrid)**
```
Memory with versions:
v1 (2024-12-01): "prefers Python"
v2 (2024-12-05): "prefers Go" (active)
Can query specific version or all versions
```
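A minimal sketch of the versioned strategy: updates append a new version, reads default to the latest, and history remains queryable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryVersion:
    value: str
    updated_at: datetime


@dataclass
class VersionedMemory:
    key: str
    versions: list[MemoryVersion] = field(default_factory=list)

    def update(self, value: str) -> None:
        # Append-only: prior versions are never overwritten
        self.versions.append(MemoryVersion(value, datetime.now(timezone.utc)))

    def current(self) -> str:
        return self.versions[-1].value

    def history(self) -> list[str]:
        return [v.value for v in self.versions]


pref = VersionedMemory("favorite_language")
pref.update("prefers Python")   # v1
pref.update("prefers Go")       # v2 (active)
assert pref.current() == "prefers Go"
```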
#### **Contradiction Handling**
Detecting and resolving conflicts:
```python
def detect_contradiction(mem1, mem2):
    # Opposite assertions about the same subject indicate a contradiction
    if mem1.subject == mem2.subject:
        sim = embedding_similarity(mem1, mem2)
        if sim > 0.8 and opposite_assertions(mem1, mem2):
            return Contradiction(
                memories=[mem1, mem2],
                confidence=compute_confidence(mem1, mem2)
            )
    return None  # no contradiction found

# Resolution strategies (helper functions are illustrative):
STRATEGIES = {
    "prefer_recent": lambda c: select(c.memories, by="timestamp", desc=True),
    "prefer_explicit": lambda c: select(c.memories, where="user_stated"),
    "merge": lambda c: create_conditional_memory(c.memories),
    "flag_for_clarification": lambda c: queue_user_clarification(c.memories)
}
```
**Best Practice:**
- Log all contradictions with timestamps
- Request user clarification on important conflicts
- Prefer explicit user statements over inferences
- Use recency weighted by confidence
---
### 4.4 Scaling Considerations
#### **Token Budget Management**
Memory size impacts LLM cost and latency:
```python
# Token accounting: reserve half the window for the response, then budget the rest
reserved_for_response = int(context_window_available * 0.5)

total_tokens = (
    reserved_for_response +
    system_prompt_tokens +
    conversation_history_tokens +
    retrieved_memory_tokens
)

# Dynamic pruning if over budget: retrieve only as many memories as still fit
if total_tokens > MAX_TOKENS:
    memories = ranked_retrieval(query, top_k=min_sufficient(total_tokens))
```
**Mem0 Results:**
- 90% token reduction vs. full-context baseline
- Achieved through intelligent consolidation and selective retrieval
#### **Latency Optimization**
Multi-tier retrieval for speed:
```
User Query
↓
Tier 1: In-Memory Cache (Recent memories)
- Hit rate: ~70%
- Latency: <10ms
↓ (Cache miss)
Tier 2: Vector DB (Semantic search)
- Hit rate: ~25%
- Latency: 50-200ms
↓ (No relevant results)
Tier 3: Graph DB (Relationship traversal)
- Hit rate: ~5%
- Latency: 200-500ms
```
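A sketch of this tiered lookup with an LRU cache in front of the slower stores; the `vector_store` and `graph_store` interfaces are assumptions.

```python
from collections import OrderedDict


class TieredMemoryRetriever:
    def __init__(self, vector_store, graph_store, cache_size=256):
        self.cache = OrderedDict()        # Tier 1: recent query -> results (LRU)
        self.cache_size = cache_size
        self.vector_store = vector_store  # Tier 2: semantic search
        self.graph_store = graph_store    # Tier 3: relationship traversal

    def retrieve(self, query: str):
        # Tier 1: in-memory cache hit (~<10ms)
        if query in self.cache:
            self.cache.move_to_end(query)
            return self.cache[query]

        # Tier 2: vector DB semantic search (~50-200ms)
        results = self.vector_store.search(query)

        # Tier 3: graph traversal fallback (~200-500ms)
        if not results:
            results = self.graph_store.traverse(query)

        # Populate the cache, evicting the least-recently-used entry if full
        self.cache[query] = results
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
        return results
```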
**Mem0 Achieved:**
- 91% lower p95 latency vs. full-context (seconds to milliseconds)
- Through adaptive two-tier memory system
#### **Horizontal Scaling**
Production considerations:
1. **Memory Sharding:** Partition by user/session for distributed storage
2. **Vector Index Sharding:** Split large indexes across nodes
3. **Caching Layer:** Redis for hot memories
4. **Async Consolidation:** Background processes for summarization
5. **Read Replicas:** Separate read/write paths for scalability
---
## 5. IMPLEMENTATION ROADMAP
### Phase 1: Foundation (Week 1-2)
- [ ] Choose vector database (Chroma for dev, Weaviate/Qdrant for prod)
- [ ] Implement basic in-context memory with editable blocks
- [ ] Set up semantic search with embeddings
### Phase 2: Advanced Retrieval (Week 3-4)
- [ ] Add re-ranking layer
- [ ] Implement hybrid search (semantic + keyword)
- [ ] Add metadata filtering and temporal weighting
### Phase 3: Memory Consolidation (Week 5-6)
- [ ] Build LLM-powered summarization pipeline
- [ ] Implement deduplication
- [ ] Add contradiction detection
### Phase 4: Production Hardening (Week 7-8)
- [ ] Add graph memory layer (optional, for multi-hop queries)
- [ ] Implement token budget management
- [ ] Add observability and logging
- [ ] Scale with caching and sharding
---
## 6. RESEARCH SOURCES
### Academic Papers (2024-2025)
1. **MemGPT Foundation**
- Packer et al., "MemGPT: Towards LLMs as Operating Systems" (arXiv:2310.08560, 2023-2024)
- Introduces hierarchical memory management paradigm
2. **Production-Ready Memory Systems**
- Chhikara et al., "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory" (arXiv:2504.19413, 2025)
- 26% accuracy improvement, 91% latency reduction, 90% token savings
3. **Memory Architecture Innovations**
- Le et al., "Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning" (arXiv:2410.10132, 2024)
- Theoretical analysis of memory limitations and dynamic memory updates
4. **Episodic Memory Systems**
- Chodhary et al., "Efficient Replay Memory Architectures in Multi-Agent RL" (arXiv:2407.16034, 2024)
- Dual-memory systems combining semantic + episodic memory with bounds
5. **Memory Representation and Retrieval**
- Bärmann et al., "Episodic Memory Verbalization using Hierarchical Representations" (arXiv:2409.17702, 2024)
- Tree-like memory structures for life-long robot experience
6. **Cognitive Mapping**
- de Tinguy et al., "Learning Dynamic Cognitive Map with Autonomous Navigation" (arXiv:2411.08447, 2024)
- Dynamically expanding models adapting to novel contexts
7. **Vision-Language Navigation Memory**
- Pan et al., "Planning from Imagination: Episodic Simulation and Episodic Memory for VLN" (arXiv:2412.01857, 2024)
- Reality-imagination hybrid memory systems
### Framework Documentation (2024-2025)
- **Letta (formerly MemGPT):** https://docs.letta.com
- Production platform for stateful agents with memory blocks
- **LangChain:** https://docs.langchain.com
- Memory modules, chains, agents with LangGraph
- **LlamaIndex:** https://docs.llamaindex.ai
- Data framework with 300+ integrations for memory and retrieval
- **Semantic Kernel:** https://github.com/microsoft/semantic-kernel
- Enterprise SDK with multi-agent memory support
- **Mem0:** https://mem0.ai
- Purpose-built memory layer for AI assistants
### Vector & Graph Databases
- **Chroma:** https://www.trychroma.com
- **Weaviate:** https://weaviate.io
- **Qdrant:** https://qdrant.tech
- **Pinecone:** https://www.pinecone.io
---
## 7. KEY TAKEAWAYS
1. **Hierarchical Memory is Essential:** Multi-tier systems (in-context + out-of-context) are now standard, enabling agents to exceed fixed context windows
2. **Vector + Graph Hybrids Win:** Combining dense vector embeddings with explicit relationship graphs provides best coverage (semantic + relational reasoning)
3. **Consolidation is Critical:** 26-91% improvements come from intelligent memory consolidation, not raw storage size
4. **Token Budget Matters Most:** Memory systems must account for LLM cost/latency; selective retrieval beats full context
5. **Production Readiness Requires:**
- Metadata and tagging for filtering
- Deduplication and contradiction handling
- Temporal weighting and recency bias
- Monitoring and observability
6. **Framework Choice Depends on Scale:**
- **Dev/Prototyping:** LangChain + Chroma (simple, local)
- **Production Single-Tenant:** Letta or Mem0 (purpose-built)
- **Scale/Multi-Tenant:** Semantic Kernel + Weaviate/Qdrant (enterprise)
---
## Appendix: Quick Reference Implementation
```python
from mem0 import Memory
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain.memory import ConversationSummaryMemory
from langchain.agents import initialize_agent, Tool

# Three approaches (assumes an `llm` chat model object is already configured):

# 1. Mem0 (simplest for production)
memory = Memory()
memory.add("User prefers Python and FastAPI", user_id="user_123")
memories = memory.search("What languages does user prefer?", user_id="user_123")

# 2. LlamaIndex (most flexible)
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("User's technical stack?")

# 3. LangChain (best for agents)
memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    human_prefix="User",
    ai_prefix="Assistant"
)
agent = initialize_agent(
    tools=[...],
    llm=llm,
    memory=memory,
    agent="conversational-react-description"
)
response = agent.run(input="What do I prefer?")
```
---
**Document Version:** 1.0
**Last Updated:** December 2024
**Research Scope:** 2024-2025 (recent frameworks and papers)
**Sources:** Compiled from authoritative sources (academic papers, official documentation, GitHub repositories)