# Deep Research: Long-Term Memory Systems for AI Agents and LLMs (2024-2025)
## Executive Summary
This comprehensive research document synthesizes findings from recent academic papers (2024-2025), production-ready frameworks, and architectural innovations in AI agent memory systems. The research covers four main dimensions: memory architectures, storage patterns, retrieval strategies, and best practices for building persistent, scalable memory systems for LLM-powered agents.
**Key Findings:**
- Modern memory systems build on the OS-inspired MemGPT/Letta paradigm, which introduced hierarchical memory management across multiple tiers
- Vector databases (Chroma, Weaviate, Qdrant) dominate storage, with emerging hybrid vector+graph approaches showing superior performance
- RAG (Retrieval Augmented Generation) patterns have matured into sophisticated multi-stage systems with re-ranking and semantic post-processing
- Production systems (Mem0, Letta, LangChain) report substantial gains over baselines; Mem0, for example, shows 26% higher accuracy, 91% lower p95 latency, and 90%+ token savings on the LOCOMO benchmark
---
## 1. MEMORY ARCHITECTURES
### 1.1 Overview of Main Approaches
AI agents employ three complementary memory architecture paradigms:
#### A. **Hierarchical Memory Systems (MemGPT/Letta Paradigm)**
**Source:** Packer et al., "MemGPT: Towards LLMs as Operating Systems" (arXiv:2310.08560)
The foundational approach treats LLM context management like traditional OS virtual memory:
- **In-Context Memory (Fast):** Core facts, ongoing conversations stored in the LLM's limited context window
- **Out-of-Context Memory (Slow):** Long-term persistent storage requiring explicit retrieval
- **Virtual Context Management:** Intelligent swapping between memory tiers based on relevance
**Letta Implementation (formerly MemGPT, 2024-2025):**
```
Memory Hierarchy:
├── Memory Blocks (in-context, editable by agent)
│ ├── Persona: Agent's identity and system instructions
│ ├── Human: User information and preferences
│ └── Custom blocks: Domain-specific persistent data
├── Message History: Full conversation timeline (append-only)
└── Vector Store: Embedded memories for semantic search
```
**Key Capability:** Agents can use tools to explicitly edit/search memory blocks, enabling self-reflection and persistent learning.
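A minimal, framework-agnostic sketch of this capability (illustrative only, not the Letta API): memory blocks held in the prompt, with tool functions the agent can call to append to or rewrite them.

```python
# Illustrative sketch of agent-editable memory blocks (not the Letta/MemGPT API).
from dataclasses import dataclass, field


@dataclass
class MemoryBlock:
    label: str         # e.g. "persona", "human"
    value: str         # current contents, kept in the prompt
    limit: int = 2000  # character budget so the block fits in context


@dataclass
class CoreMemory:
    blocks: dict[str, MemoryBlock] = field(default_factory=dict)

    def render(self) -> str:
        """Serialize all blocks for inclusion in the system prompt."""
        return "\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>"
                         for b in self.blocks.values())

    # Tools the agent can call to edit its own in-context memory:
    def memory_append(self, label: str, content: str) -> None:
        block = self.blocks[label]
        if len(block.value) + len(content) > block.limit:
            raise ValueError("block full -- consolidate or evict first")
        block.value += "\n" + content

    def memory_replace(self, label: str, old: str, new: str) -> None:
        self.blocks[label].value = self.blocks[label].value.replace(old, new)


core = CoreMemory({"human": MemoryBlock("human", "Name: Alice")})
core.memory_append("human", "Prefers concise answers")
print(core.render())
```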
#### B. **Multi-Level Memory Architecture (Mem0, 2025)**
**Source:** Chhikara et al., "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory" (arXiv:2504.19413)
Extends hierarchical memory with adaptive consolidation and graph-based representations:
- **User-Level Memory:** Cross-session user preferences and long-term profile
- **Session-Level Memory:** Current conversation context and immediate history
- **Agent-Level Memory:** Autonomous learning and internal state management
- **Dynamic Consolidation:** Extracting, deduplicating, and summarizing memories from ongoing interactions
- **Graph Memory Variant:** Capturing relational structures between conversational elements
**Performance Metrics (LOCOMO Benchmark):**
- 26% accuracy improvement over OpenAI's memory implementation
- 91% lower p95 latency vs. full-context approach
- 90%+ token savings without degradation
#### C. **Cognitive Memory Types (Episodic, Semantic, Procedural)**
Three distinct memory types for AI systems (adapted from neuroscience):
| Memory Type | Implementation | Use Case |
|---|---|---|
| **Episodic Memory** | Timestamped events, experiences with context | "What happened in our last conversation?" |
| **Semantic Memory** | Facts, concepts, extracted knowledge | "What are this user's preferences?" |
| **Procedural Memory** | Learned strategies, tool usage patterns | "How did I solve this before?" |
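A minimal sketch of how these types can be encoded as metadata so retrieval can filter on them; the field names and enum are illustrative, not tied to any particular framework.

```python
# Tag memories by cognitive type so retrieval can filter on it (illustrative).
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class MemoryType(Enum):
    EPISODIC = "episodic"      # timestamped events ("last conversation")
    SEMANTIC = "semantic"      # extracted facts ("user prefers Python")
    PROCEDURAL = "procedural"  # learned strategies ("how I solved X before")


@dataclass
class MemoryRecord:
    content: str
    type: MemoryType
    timestamp: datetime
    importance: float = 0.5


store = [
    MemoryRecord("Discussed database indexing", MemoryType.EPISODIC,
                 datetime(2024, 12, 1, tzinfo=timezone.utc)),
    MemoryRecord("User prefers Python", MemoryType.SEMANTIC,
                 datetime(2024, 12, 1, tzinfo=timezone.utc), importance=0.8),
]

# Type-scoped lookup: only semantic facts for a "preferences" question.
prefs = [m for m in store if m.type is MemoryType.SEMANTIC]
```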
**Recent Work (2024):**
- **Episodic Memory Verbalization** (arXiv:2409.17702): Hierarchical tree-like data structures representing raw perception data → abstract events → natural language concepts
- **Dual-Memory Systems** (arXiv:2407.16034): Combining episodic (specific events) + semantic (general patterns) memory with memory growth bounds for scalability
- **Cognitive Mapping** (arXiv:2411.08447): Dynamically expanding representational models that adapt to novel environmental contexts
#### D. **Graph-Based Memory Representations**
Emerging approach combining vector embeddings with relational structure:
```
Graph Memory Structure:
- Nodes: Entities (people, concepts, facts)
- Edges: Relationships (mentions, temporal sequences, dependencies)
- Attributes: Relevance scores, timestamps, confidence levels
```
**Advantages:**
- Captures multi-hop reasoning (A→B→C inference chains)
- Supports temporal queries and recency weighting
- Enables contradiction detection and resolution
- **Performance:** +2% additional improvement when combined with vector memory (Mem0 paper)
---
### 1.2 Framework Implementations
#### **LangChain Memory**
Provides multiple memory types for conversation management:
```python
memory_types = {
"ConversationBufferMemory": "Stores full conversation history",
"ConversationSummaryMemory": "Periodically summarizes conversation",
"ConversationTokenBufferMemory": "Maintains fixed token limit",
"ConversationBufferWindowMemory": "Rolling window of recent turns",
"VectorStoreRetrieverMemory": "Uses semantic search for relevant memories"
}
```
**Integration Pattern:** LangChain + LangGraph enables stateful agent workflows with persistent memory, observable execution paths, and human-in-the-loop checkpoints.
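A hedged sketch of that pattern, assuming recent `langgraph` and `langchain-openai` packages: a minimal graph whose conversation state persists across invocations via a checkpointer keyed by `thread_id`.

```python
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.checkpoint.memory import MemorySaver

llm = ChatOpenAI(model="gpt-4o-mini")  # any LangChain chat model works here


def call_model(state: MessagesState):
    # Append the model's reply to the persisted message history
    return {"messages": [llm.invoke(state["messages"])]}


builder = StateGraph(MessagesState)
builder.add_node("model", call_model)
builder.add_edge(START, "model")
builder.add_edge("model", END)

# The checkpointer persists per-thread state, so conversation memory survives
# across invocations (swap MemorySaver for a DB-backed saver in production).
graph = builder.compile(checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "user-123"}}
graph.invoke({"messages": [("user", "Hi, I'm Alice")]}, config)
graph.invoke({"messages": [("user", "What's my name?")]}, config)  # recalls "Alice"
```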
#### **LlamaIndex Memory & Retrieval**
Data framework specifically designed for augmenting LLMs:
- **Index Types:** VectorStoreIndex (semantic), SummaryIndex (summarized), KeywordTableIndex (keyword-based)
- **Retrieval:** Query engines with built-in re-ranking, metadata filtering, and source attribution
- **Memory Modes:**
- Chat History (recent messages)
- Summary (periodically generated overview)
- Hybrid (combining approaches)
**300+ Integration Packages** on LlamaHub for various vector stores, LLMs, and data sources.
#### **Semantic Kernel (Microsoft)**
Enterprise-focused orchestration with memory as first-class component:
- **Memory Stores:** Azure AI Search, Elasticsearch, Chroma, and custom implementations
- **Plugin Ecosystem:** Native functions, prompt templates, OpenAPI specs
- **Multi-Agent Architecture:** Agents with independent memory yet shared knowledge
- **Production Features:** RBAC, observability, stable APIs
#### **Letta (formerly MemGPT)**
Pioneering stateful agent platform:
```python
# Multi-agent shared memory example (Letta Python SDK, snake_case arguments)
shared_block = client.blocks.create(
    label="organization",
    value="Shared context across all agents"
)
agent1 = client.agents.create(
    memory_blocks=[{"label": "persona", "value": "Agent 1"}],
    block_ids=[shared_block.id]  # attach the shared memory block
)
agent2 = client.agents.create(
    memory_blocks=[{"label": "persona", "value": "Agent 2"}],
    block_ids=[shared_block.id]  # same shared block
)
```
**Unique Features:**
- Sleep-time agents (background agents that process and reorganize memory between user interactions)
- Explicit memory editing via tools
- Perpetual agents with infinite message history
- Multi-agent coordination via shared memory blocks
---
## 2. STORAGE PATTERNS
### 2.1 Vector Database Approaches
#### **A. Chroma (Embedding Database)**
**Repository:** chroma-core/chroma
**Status:** Production-ready, open-source (Apache 2.0)
```python
import chromadb
# Simple 4-function API
client = chromadb.Client()
collection = client.create_collection("memories")
# Add memories
collection.add(
documents=["User loves coffee", "User works in tech"],
metadatas=[{"source": "profile"}, {"source": "conversation"}],
ids=["mem_1", "mem_2"]
)
# Query with semantic search
results = collection.query(
query_texts=["What are the user's interests?"],
n_results=2,
where={"source": "profile"} # Optional filtering
)
```
**Characteristics:**
- Fully-typed, fully-tested, fully-documented
- Automatic embeddings (Sentence Transformers default, customizable)
- Dev/test/prod with same API
- Feature-rich: queries, filtering, regex, metadata operations
- Integrations: LangChain, LlamaIndex, and others
#### **B. Weaviate (Semantic Search Engine)**
**Repository:** weaviate/weaviate
**Status:** Production (built in Go for speed)
**Advanced Capabilities:**
- **Flexible Vectorization:** Integrated vectorizers (OpenAI, Cohere, HuggingFace, Google) or pre-computed embeddings
- **Hybrid Search:** Combine semantic search (vector similarity) with BM25 (keyword matching) in single query
- **RAG & Reranking:** Built-in generative search and cross-encoder reranking
- **Production Ready:** Multi-tenancy, replication, RBAC, horizontal scaling
- **Vector Compression:** Reduce memory usage by 97% with quantization
**Use Cases:** RAG systems, semantic search, chatbots, content classification, recommendation engines
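A hedged example of the hybrid-search capability, assuming the Weaviate v4 Python client, a locally running instance, and an existing `Memories` collection with a configured vectorizer.

```python
import weaviate

# Connect to a local Weaviate instance (assumes default ports)
client = weaviate.connect_to_local()
memories = client.collections.get("Memories")

# Hybrid query: alpha balances vector similarity (1.0) vs. BM25 keywords (0.0)
response = memories.query.hybrid(
    query="user preferences about programming languages",
    alpha=0.5,
    limit=5,
)
for obj in response.objects:
    print(obj.properties)

client.close()
```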
#### **C. Qdrant (Vector Search Engine)**
**Repository:** qdrant/qdrant
**Status:** Production (written in Rust)
**Key Features:**
- **Fast & Reliable:** Built in Rust, performs well under heavy load
- **Filtering & Payload:** Attach arbitrary JSON payloads, query with complex filters
- **Sparse Vectors:** Support for BM25-like keyword search alongside dense vectors
- **Vector Quantization:** Trade-off speed/precision with multiple compression levels
- **Benchmarked:** Open benchmarks showing competitive ANN performance
- **Clients:** Python, Go, Rust, JavaScript, Java, .NET, PHP, Ruby
**Production Deployment:** Docker, Kubernetes, Qdrant Cloud (managed service)
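A short example of payload-filtered vector search with the `qdrant-client` package; the collection name, vector size, and payload fields are illustrative.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (Distance, FieldCondition, Filter,
                                  MatchValue, PointStruct, VectorParams)

client = QdrantClient(":memory:")  # in-process mode for local development
client.create_collection(
    collection_name="memories",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# Each memory: a dense vector plus an arbitrary JSON payload used for filtering
client.upsert(
    collection_name="memories",
    points=[PointStruct(id=1, vector=[0.1] * 384,
                        payload={"type": "user_preference", "user_id": "u123"})],
)

hits = client.search(
    collection_name="memories",
    query_vector=[0.1] * 384,  # in practice, the embedded query
    query_filter=Filter(must=[FieldCondition(key="type",
                                             match=MatchValue(value="user_preference"))]),
    limit=5,
)
```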
#### **D. Pinecone (Managed Vector Database)**
**Status:** Proprietary managed service (free tier available)
**Characteristics:**
- Serverless scaling with instant indexing
- Pod-based pricing, regional deployment
- Integrations with LangChain, LlamaIndex, Semantic Kernel
- Advanced: hybrid search, namespacing, metadata filtering
- Assistant API: Built-in RAG orchestration
---
### 2.2 Graph Database Approaches
#### **Knowledge Graph Integration**
Graph databases capture relational structure:
```
Entities (Nodes):
- Person: "Alice", "Bob"
- Concept: "Python programming", "Machine learning"
- Event: "Meeting on 2024-12-01", "Project kickoff"
Relationships (Edges):
- Alice --(knows)--> Bob
- Alice --(works_on)--> "Machine learning"
- "Machine learning" --(requires)--> "Python programming"
```
**Advantages for AI Memory:**
- Multi-hop reasoning support ("Who knows people working on ML?")
- Temporal relationships ("What happened after the kickoff?")
- Bidirectional queries and path traversal
- Contradiction detection through relationship inconsistencies
**Integration with Vector Systems:**
- Store entity embeddings alongside graph structure
- Use vectors for semantic similarity, graphs for explicit relationships
- Hybrid queries: find conceptually similar entities that are also graph-connected
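A toy sketch of the multi-hop advantage, using `networkx` as a stand-in for a real graph database; entities and relations mirror the example above, with one extra edge added so the query returns a result.

```python
import networkx as nx

g = nx.DiGraph()
g.add_edge("Alice", "Bob", relation="knows")
g.add_edge("Carol", "Alice", relation="knows")            # added for illustration
g.add_edge("Alice", "Machine learning", relation="works_on")
g.add_edge("Machine learning", "Python programming", relation="requires")

# Multi-hop query: "Who knows someone working on machine learning?"
ml_workers = {u for u, v, d in g.edges(data=True)
              if d["relation"] == "works_on" and v == "Machine learning"}
answer = {u for u, v, d in g.edges(data=True)
          if d["relation"] == "knows" and v in ml_workers}
print(answer)  # {'Carol'}
```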
---
### 2.3 Hybrid Approaches
#### **Vector + Graph Hybrid Pattern** (Mem0, 2025)
Combines strengths of both systems:
1. **Initial Extraction:** Convert conversation into structured facts/entities
2. **Vector Embedding:** Embed each entity and relationship
3. **Graph Construction:** Build relationship graph with edge types
4. **Retrieval (Two-stage):**
- Stage 1: Vector similarity search (find conceptually related memories)
- Stage 2: Graph traversal from results (find connected entities)
5. **Ranking:** Score by vector similarity + graph proximity + temporal relevance
**Results:**
- Graph memory adds ~2% accuracy improvement
- More robust multi-hop reasoning
- Better handling of complex relationships
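A simplified sketch of this two-stage pattern, not Mem0's actual implementation; the in-memory index, relationship graph, and score weights are all assumptions.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, memories, graph, now, top_k=5):
    """memories: {memory_id: (embedding, unix_timestamp)}; graph: e.g. a networkx graph."""
    # Stage 1: vector similarity over all stored memories
    sims = {m_id: cosine(query_vec, vec) for m_id, (vec, _) in memories.items()}
    seeds = sorted(sims, key=sims.get, reverse=True)[:top_k]

    # Stage 2: expand seeds by one hop in the relationship graph
    candidates = set(seeds)
    for s in seeds:
        if s in graph:
            candidates.update(graph.neighbors(s))

    # Final rank: vector similarity + graph proximity + temporal recency
    def score(m_id):
        vec_sim = sims.get(m_id, 0.0)
        proximity = 1.0 if m_id in seeds else 0.5        # reached via one hop
        recency = math.exp(-1e-6 * (now - memories[m_id][1]))
        return 0.6 * vec_sim + 0.2 * proximity + 0.2 * recency

    return sorted((c for c in candidates if c in memories), key=score, reverse=True)
```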
#### **File-Based Memory Systems**
Lightweight alternative for smaller systems:
```
Memory Structure:
memory/
├── episodes/
│ ├── 2024-12-01_conversation.json
│ ├── 2024-12-02_conversation.json
│ └── index.jsonl # Embeddings for search
├── summaries/
│ ├── weekly_2024_W49.txt
│ └── monthly_2024_12.txt
└── facts/
├── user_preferences.yaml
├── learned_patterns.yaml
└── contradiction_log.jsonl
```
**Suitable for:**
- Single-user agents
- Development/prototyping
- Resource-constrained environments
- Git-trackable memory (for version control)
---
## 3. RETRIEVAL STRATEGIES
### 3.1 RAG (Retrieval Augmented Generation) Patterns
#### **Core RAG Pipeline (Vanilla)**
```
User Query → Embedding → Vector Search → Top-K Results → LLM Context → Response
```
**Standard Implementation:**
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Index documents
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("User question")
```
#### **Advanced RAG Techniques (2024-2025)**
##### **1. Semantic Search + Re-ranking**
Multi-stage retrieval for improved relevance:
```
Stage 1: Semantic Search
- Convert query to embedding
- Find top-50 candidates from vector DB
- (Fast, low precision)
Stage 2: Re-ranking
- Cross-encoder scores all top-50
- Re-rank by semantic similarity
- Select top-5 for context
- (Slower, high precision)
```
**Libraries:**
- LlamaIndex built-in re-ranking modules
- Weaviate native re-ranking
- BAAI BGE-Reranker models
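A hedged sketch of the second stage, assuming the `sentence-transformers` package and a first-stage retriever that has already produced candidate memories.

```python
from sentence_transformers import CrossEncoder

query = "What are the user's interests?"
candidates = [
    "User loves coffee and hiking",
    "User works in tech",
    "Meeting scheduled for Friday",
]  # in practice: the top-50 results returned by the vector store

# Stage 2: the cross-encoder scores each (query, document) pair jointly --
# slower than bi-encoder similarity, but considerably more precise.
reranker = CrossEncoder("BAAI/bge-reranker-base")
scores = reranker.predict([(query, doc) for doc in candidates])

# Keep the highest-scoring documents for the LLM context
top_docs = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)][:2]
```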
##### **2. Hybrid Search (Vector + Keyword)**
Combines dense and sparse retrieval:
```
Query → Split into:
├─ Vector embedding (semantic)
└─ Keywords/BM25 (lexical)
Results union with scoring:
- Vector matches: scored by cosine similarity
- Keyword matches: scored by BM25
- Final rank: weighted combination
```
**Providers:**
- Weaviate: Native hybrid search
- Qdrant: Vector + sparse vectors
- Elasticsearch: Hybrid queries (BM25 + dense vectors)
##### **3. Time-Decay and Recency Weighting**
Prioritize recent/relevant memories:
```python
import math, time

# Recency decay: newer memories score higher; decay_rate tunes the half-life.
# (Note: `lambda` is a reserved word in Python, so a named rate is used instead.)
decay_rate = 1e-5
recency_score = math.exp(-decay_rate * (time.time() - memory_timestamp))

# Combined score
final_score = (
    semantic_similarity_score * 0.6 +
    recency_score * 0.3 +   # exponential time decay
    importance_score * 0.1  # user-marked importance
)
```
**Use Case:** Long conversations where recent context matters more than historical facts.
##### **4. Query Expansion and Fusion**
**RAG Fusion Pattern:**
- Generate 5+ reformulations of the original query
- Retrieve results for each reformulation
- Fuse results using reciprocal rank fusion
- Benefits: Better coverage, reduced brittleness
```
Query: "How do I optimize Python performance?"
↓
Reformulations:
1. "Python optimization techniques"
2. "Improve Python code performance"
3. "Python speed up profiling"
4. "Memory-efficient Python patterns"
5. "Python performance tuning"
↓
Retrieve and fuse top results
```
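A minimal implementation of the fusion step (reciprocal rank fusion) over the per-reformulation result lists; the document ids are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists of document ids; k dampens the impact of low ranks."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# One ranked list per query reformulation
fused = reciprocal_rank_fusion([
    ["doc_3", "doc_1", "doc_7"],   # "Python optimization techniques"
    ["doc_1", "doc_3", "doc_9"],   # "Improve Python code performance"
    ["doc_1", "doc_2", "doc_3"],   # "Python performance tuning"
])
print(fused[:3])  # doc_1 and doc_3 rise to the top across reformulations
```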
---
### 3.2 Semantic vs. Keyword Search Trade-offs
| Aspect | Semantic Search | Keyword Search |
|--------|---|---|
| **Strength** | Understands meaning, synonyms, paraphrases | Exact matches, technical terms |
| **Weakness** | Misses exact terms, needs embeddings | Brittle to variations |
| **Cost** | Embedding inference + vector storage | Simple indexing, fast retrieval |
| **Use Case** | Conversations, narrative text | Code, structured data, technical docs |
In practice, combining the two (hybrid semantic + BM25 retrieval) achieves the best results and is the recommended default for production.
---
### 3.3 Relevance Scoring and Ranking
#### **Multi-Factor Ranking Model**
Production systems weight multiple signals:
```python
score = (
embedding_similarity * 0.40 + # Semantic match
metadata_relevance * 0.20 + # Type/context match
temporal_weight * 0.20 + # Recency/importance
user_explicit_rating * 0.10 + # User feedback
cross_reference_count * 0.10 # Interconnectedness
)
```
**Metadata Relevance Examples:**
- Memory type match (episodic vs. semantic)
- Source credibility
- Conversation turn distance
- User-tagged categories
**Temporal Weight Examples:**
- Exponential decay: weight ∝ exp(-λt)
- Piecewise linear: full weight for recent, decay over time
- Adaptive: learning decay rate from user behavior
---
## 4. BEST PRACTICES
### 4.1 Memory Structuring for Effective Retrieval
#### **A. Chunking Strategies**
How to split information for storage:
| Strategy | Size | Use Case | Trade-off |
|----------|------|----------|-----------|
| **Semantic Chunking** | 200-500 tokens | General text | Compute-intensive |
| **Sliding Window** | 256-1024 tokens | Preserves context | May repeat info |
| **Recursive Chunking** | Hierarchical | Complex docs | Complex implementation |
| **Fixed Token** | 512 tokens | Simple baseline | May split mid-sentence |
**Best Practice:**
- Use semantic chunking for conversational memory
- Include metadata: speaker, timestamp, turn number
- Overlap chunks by 10-20% to preserve context
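A simple sliding-window chunker illustrating the overlap recommendation; it approximates tokens with whitespace splitting, which a production system would replace with the model's tokenizer.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap_ratio: float = 0.15):
    """Sliding-window chunking with overlap; whitespace tokens approximate model tokens."""
    tokens = text.split()
    step = max(1, int(chunk_size * (1 - overlap_ratio)))  # 15% overlap by default
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if not window:
            break
        chunks.append({
            "text": " ".join(window),
            "metadata": {"start_token": start, "n_tokens": len(window)},
        })
        if start + chunk_size >= len(tokens):
            break  # the last window already covers the tail
    return chunks


chunks = chunk_text("User: I mostly write Python services. " * 200, chunk_size=128)
```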
#### **B. Metadata and Tagging**
Richer context improves retrieval:
```json
{
"id": "mem_001",
"content": "User prefers Pytho...",
"timestamp": "2024-12-01T14:30:00Z",
"metadata": {
"type": "user_preference",
"category": "technical",
"speaker": "user",
"conversation_id": "conv_123",
"turn_number": 5,
"importance": 0.8,
"tags": ["python", "performance", "optimization"],
"source": "explicit_statement",
"confidence": 0.95
},
"embedding": [0.123, -0.456, ...],
"related_memories": ["mem_002", "mem_005"]
}
```
**Filtering Benefits:**
- Reduce retrieval noise
- Type-specific searches (e.g., only user preferences)
- Time-bounded queries
#### **C. Hierarchical Memory Organization**
Layer information by abstraction:
```
Level 1 (Atomic Facts)
├─ "User's name is Alice"
├─ "Alice works in AI"
└─ "Alice knows Python"
Level 2 (Semantic Concepts)
├─ Alice's Profile:
│ ├─ Role: ML Engineer
│ └─ Skills: Python, ML
└─ Conversation Context:
├─ Current Topic: Performance
└─ Recent Questions: 5
Level 3 (Synthesized Knowledge)
├─ User Persona:
│ ├─ Technical Level: Advanced
│ ├─ Interests: Performance optimization
│ └─ Communication Style: Direct
└─ Interaction Patterns:
├─ Avg Questions per Session: 8
└─ Preferred Answer Format: Code examples
```
---
### 4.2 Memory Consolidation and Summarization
#### **When to Consolidate**
Trigger consolidation to prevent memory bloat:
**Triggers:**
- Time-based: After N hours/days
- Size-based: When memory exceeds X tokens
- Event-based: After significant context switch
- Quality-based: When retrieval precision drops below threshold
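A sketch of combining these triggers into a single check run after each interaction; the thresholds and store fields are illustrative assumptions.

```python
import time


def should_consolidate(store, now=None, max_age_hours=24,
                       max_tokens=50_000, min_precision=0.6,
                       topic_switched=False):
    """Return True if any consolidation trigger fires (thresholds illustrative)."""
    now = now or time.time()
    time_trigger = (now - store["last_consolidated_at"]) > max_age_hours * 3600
    size_trigger = store["approx_tokens"] > max_tokens
    quality_trigger = store["recent_retrieval_precision"] < min_precision
    return time_trigger or size_trigger or topic_switched or quality_trigger


store = {
    "last_consolidated_at": time.time() - 3 * 3600,  # consolidated 3h ago
    "approx_tokens": 62_000,                         # over the size threshold
    "recent_retrieval_precision": 0.71,
}
if should_consolidate(store):
    pass  # run the summarization / deduplication pipeline described below
```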
#### **Consolidation Techniques**
##### **1. Hierarchical Summarization**
```
Raw Conversation:
Turn 1: "Hi, I work on Python projects"
Turn 2: "Performance is a concern for me"
Turn 3: "Specifically, database queries"
↓ (Extract atomic facts)
Atomic Facts:
- User: Python developer
- Concern: Performance
- Domain: Database queries
↓ (Abstract to concepts)
Consolidated Summary:
"Python developer optimizing database query performance"
↓ (Calculate importance)
Importance: 0.85 (indicates strong relevance for future queries)
```
##### **2. Deduplication**
Remove redundant memories:
```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_similarity(set_a, set_b):
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def calculate_similarity(mem1, mem2):
    # Blend semantic similarity with entity overlap
    embedding_sim = cosine_similarity(mem1.embedding, mem2.embedding)
    content_overlap = jaccard_similarity(set(mem1.entities), set(mem2.entities))
    return embedding_sim * 0.7 + content_overlap * 0.3

# Merge if similarity > 0.9: keep the more recent memory, combine metadata
if calculate_similarity(mem_a, mem_b) > 0.9:
    merge_memories(mem_a, mem_b)
```
##### **3. Extraction of Key Facts**
Use LLM to distill information:
```python
prompt = f"""
Given this conversation excerpt:
{conversation_excerpt}
Extract:
1. User facts (preferences, constraints, background)
2. Domain facts (technical information, findings)
3. Decisions made (conclusions, agreements)
Format as JSON with confidence scores.
"""
facts = llm.generate(prompt)
# Store facts with extracted metadata
```
**Mem0 Implementation:**
- 26% accuracy advantage through intelligent consolidation
- Dynamic extraction of what matters for future conversations
- Automatic deduplication and conflict resolution
---
### 4.3 Handling Memory Updates and Contradictions
#### **Update Semantics**
Three strategies for updating memories:
**1. Append-Only (Immutable)**
```
Memory Timeline:
2024-12-01: "User prefers Python"
2024-12-05: "User switching to Go for systems programming"
Query: When retrieving, return latest fact
```
Pros: Preserves history, audit trail
Cons: Storage overhead
**2. Replace-In-Place (Mutable)**
```
Memory: "User's favorite language: Python → Go"
Metadata: {
"updated": "2024-12-05",
"old_value": "Python",
"new_value": "Go"
}
```
Pros: Efficient storage
Cons: Loses history
**3. Versioned (Hybrid)**
```
Memory with versions:
v1 (2024-12-01): "prefers Python"
v2 (2024-12-05): "prefers Go" (active)
Can query specific version or all versions
```
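A minimal sketch of the versioned strategy: updates append a new version, reads default to the latest, and history remains queryable.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryVersion:
    value: str
    updated_at: datetime


@dataclass
class VersionedMemory:
    key: str
    versions: list[MemoryVersion] = field(default_factory=list)

    def update(self, value: str) -> None:
        # Append-only: prior versions are never overwritten
        self.versions.append(MemoryVersion(value, datetime.now(timezone.utc)))

    def current(self) -> str:
        return self.versions[-1].value

    def history(self) -> list[str]:
        return [v.value for v in self.versions]


pref = VersionedMemory("favorite_language")
pref.update("prefers Python")   # v1
pref.update("prefers Go")       # v2 (active)
assert pref.current() == "prefers Go"
```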
#### **Contradiction Handling**
Detecting and resolving conflicts:
```python
def detect_contradiction(mem1, mem2):
    # Opposite assertions about the same subject indicate a contradiction
    if mem1.subject == mem2.subject:
        sim = embedding_similarity(mem1, mem2)
        if sim > 0.8 and opposite_assertions(mem1, mem2):
            return Contradiction(
                memories=[mem1, mem2],
                confidence=compute_confidence(mem1, mem2)
            )
    return None  # no contradiction found

# Resolution strategies (helper functions are illustrative):
STRATEGIES = {
    "prefer_recent": lambda c: select(c.memories, by="timestamp", desc=True),
    "prefer_explicit": lambda c: select(c.memories, where="user_stated"),
    "merge": lambda c: create_conditional_memory(c.memories),
    "flag_for_clarification": lambda c: queue_user_clarification(c.memories)
}
```
**Best Practice:**
- Log all contradictions with timestamps
- Request user clarification on important conflicts
- Prefer explicit user statements over inferences
- Use recency weighted by confidence
---
### 4.4 Scaling Considerations
#### **Token Budget Management**
Memory size impacts LLM cost and latency:
```python
# Token accounting: reserve half the window for the response, then budget the rest
reserved_for_response = int(context_window_available * 0.5)

total_tokens = (
    reserved_for_response +
    system_prompt_tokens +
    conversation_history_tokens +
    retrieved_memory_tokens
)

# Dynamic pruning if over budget: retrieve only as many memories as still fit
if total_tokens > MAX_TOKENS:
    memories = ranked_retrieval(query, top_k=min_sufficient(total_tokens))
```
**Mem0 Results:**
- 90% token reduction vs. full-context baseline
- Achieved through intelligent consolidation and selective retrieval
#### **Latency Optimization**
Multi-tier retrieval for speed:
```
User Query
↓
Tier 1: In-Memory Cache (Recent memories)
- Hit rate: ~70%
- Latency: <10ms
↓ (Cache miss)
Tier 2: Vector DB (Semantic search)
- Hit rate: ~25%
- Latency: 50-200ms
↓ (No relevant results)
Tier 3: Graph DB (Relationship traversal)
- Hit rate: ~5%
- Latency: 200-500ms
```
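A sketch of this tiered lookup with an LRU cache in front of the slower stores; the `vector_store` and `graph_store` interfaces are assumptions.

```python
from collections import OrderedDict


class TieredMemoryRetriever:
    def __init__(self, vector_store, graph_store, cache_size=256):
        self.cache = OrderedDict()        # Tier 1: recent query -> results (LRU)
        self.cache_size = cache_size
        self.vector_store = vector_store  # Tier 2: semantic search
        self.graph_store = graph_store    # Tier 3: relationship traversal

    def retrieve(self, query: str):
        # Tier 1: in-memory cache hit (~<10ms)
        if query in self.cache:
            self.cache.move_to_end(query)
            return self.cache[query]

        # Tier 2: vector DB semantic search (~50-200ms)
        results = self.vector_store.search(query)

        # Tier 3: graph traversal fallback (~200-500ms)
        if not results:
            results = self.graph_store.traverse(query)

        # Populate the cache, evicting the least-recently-used entry if full
        self.cache[query] = results
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)
        return results
```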
**Mem0 Achieved:**
- 91% lower p95 latency vs. full-context (seconds to milliseconds)
- Through adaptive two-tier memory system
#### **Horizontal Scaling**
Production considerations:
1. **Memory Sharding:** Partition by user/session for distributed storage
2. **Vector Index Sharding:** Split large indexes across nodes
3. **Caching Layer:** Redis for hot memories
4. **Async Consolidation:** Background processes for summarization
5. **Read Replicas:** Separate read/write paths for scalability
---
## 5. IMPLEMENTATION ROADMAP
### Phase 1: Foundation (Week 1-2)
- [ ] Choose vector database (Chroma for dev, Weaviate/Qdrant for prod)
- [ ] Implement basic in-context memory with editable blocks
- [ ] Set up semantic search with embeddings
### Phase 2: Advanced Retrieval (Week 3-4)
- [ ] Add re-ranking layer
- [ ] Implement hybrid search (semantic + keyword)
- [ ] Add metadata filtering and temporal weighting
### Phase 3: Memory Consolidation (Week 5-6)
- [ ] Build LLM-powered summarization pipeline
- [ ] Implement deduplication
- [ ] Add contradiction detection
### Phase 4: Production Hardening (Week 7-8)
- [ ] Add graph memory layer (optional, for multi-hop queries)
- [ ] Implement token budget management
- [ ] Add observability and logging
- [ ] Scale with caching and sharding
---
## 6. RESEARCH SOURCES
### Academic Papers (2024-2025)
1. **MemGPT Foundation**
- Packer et al., "MemGPT: Towards LLMs as Operating Systems" (arXiv:2310.08560, 2023-2024)
- Introduces hierarchical memory management paradigm
2. **Production-Ready Memory Systems**
- Chhikara et al., "Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory" (arXiv:2504.19413, 2025)
- 26% accuracy improvement, 91% latency reduction, 90% token savings
3. **Memory Architecture Innovations**
- Le et al., "Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning" (arXiv:2410.10132, 2024)
- Theoretical analysis of memory limitations and dynamic memory updates
4. **Episodic Memory Systems**
- Chodhary et al., "Efficient Replay Memory Architectures in Multi-Agent RL" (arXiv:2407.16034, 2024)
- Dual-memory systems combining semantic + episodic memory with bounds
5. **Memory Representation and Retrieval**
- Bärmann et al., "Episodic Memory Verbalization using Hierarchical Representations" (arXiv:2409.17702, 2024)
- Tree-like memory structures for life-long robot experience
6. **Cognitive Mapping**
- de Tinguy et al., "Learning Dynamic Cognitive Map with Autonomous Navigation" (arXiv:2411.08447, 2024)
- Dynamically expanding models adapting to novel contexts
7. **Vision-Language Navigation Memory**
- Pan et al., "Planning from Imagination: Episodic Simulation and Episodic Memory for VLN" (arXiv:2412.01857, 2024)
- Reality-imagination hybrid memory systems
### Framework Documentation (2024-2025)
- **Letta (formerly MemGPT):** https://docs.letta.com
- Production platform for stateful agents with memory blocks
- **LangChain:** https://docs.langchain.com
- Memory modules, chains, agents with LangGraph
- **LlamaIndex:** https://docs.llamaindex.ai
- Data framework with 300+ integrations for memory and retrieval
- **Semantic Kernel:** https://github.com/microsoft/semantic-kernel
- Enterprise SDK with multi-agent memory support
- **Mem0:** https://mem0.ai
- Purpose-built memory layer for AI assistants
### Vector & Graph Databases
- **Chroma:** https://www.trychroma.com
- **Weaviate:** https://weaviate.io
- **Qdrant:** https://qdrant.tech
- **Pinecone:** https://www.pinecone.io
---
## 7. KEY TAKEAWAYS
1. **Hierarchical Memory is Essential:** Multi-tier systems (in-context + out-of-context) are now standard, enabling agents to exceed fixed context windows
2. **Vector + Graph Hybrids Win:** Combining dense vector embeddings with explicit relationship graphs provides best coverage (semantic + relational reasoning)
3. **Consolidation is Critical:** 26-91% improvements come from intelligent memory consolidation, not raw storage size
4. **Token Budget Matters Most:** Memory systems must account for LLM cost/latency; selective retrieval beats full context
5. **Production Readiness Requires:**
- Metadata and tagging for filtering
- Deduplication and contradiction handling
- Temporal weighting and recency bias
- Monitoring and observability
6. **Framework Choice Depends on Scale:**
- **Dev/Prototyping:** LangChain + Chroma (simple, local)
- **Production Single-Tenant:** Letta or Mem0 (purpose-built)
- **Scale/Multi-Tenant:** Semantic Kernel + Weaviate/Qdrant (enterprise)
---
## Appendix: Quick Reference Implementation
```python
from mem0 import Memory
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from langchain.memory import ConversationSummaryMemory
from langchain.agents import initialize_agent, Tool

# Three approaches (assumes an `llm` chat model object is already configured):

# 1. Mem0 (simplest for production)
memory = Memory()
memory.add("User prefers Python and FastAPI", user_id="user_123")
memories = memory.search("What languages does user prefer?", user_id="user_123")

# 2. LlamaIndex (most flexible)
documents = SimpleDirectoryReader("data/").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("User's technical stack?")

# 3. LangChain (best for agents)
memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    human_prefix="User",
    ai_prefix="Assistant"
)
agent = initialize_agent(
    tools=[...],
    llm=llm,
    memory=memory,
    agent="conversational-react-description"
)
response = agent.run(input="What do I prefer?")
```
---
**Document Version:** 1.0
**Last Updated:** December 2024
**Research Scope:** 2024-2025 (recent frameworks and papers)
**Sources:** Compiled from authoritative sources (academic papers, official documentation, GitHub repositories)