Recall is a long-term memory system for AI assistants that provides persistent storage, semantic search, and relationship tracking for memories.
Core Memory Operations
Store memories with automatic semantic indexing, content-hash deduplication, and optional auto-linking to related memories
Search memories using natural language queries with semantic similarity, filters (namespace, type, importance), and optional multi-hop graph expansion
Delete memories by ID or semantic search, with protection for high-confidence "golden rule" memories
Count and list memories with filtering, sorting, and pagination for auditing and exploration
Generate context by fetching relevant memories formatted as markdown for session injection, respecting token budgets
Memory Relationships & Graph
Create relationships between memories (relates_to, supersedes, caused_by, contradicts)
Inspect graph structure with BFS traversal, configurable depth/direction, and Mermaid diagram generation
Delete edges between memories by ID, memory connection, or specific pairs
Auto-infer relationships using embedding similarity with optional LLM refinement
Validation & Quality
Validate memories by recording application success/failure to adjust confidence scores automatically
Detect contradictions between memories using semantic search and LLM reasoning
Check for superseding memories based on validation history to identify outdated information
Analyze memory health to surface contradictions, low-confidence memories, and stale memories
View validation history showing applied/succeeded/failed events and confidence score evolution
Performance & Monitoring
Check daemon status to monitor the async embedding service for fast storage (<10ms)
Track file activity to record file access events (read, write, edit) and view recent activity statistics
Key Features
Namespace isolation (global vs project-scoped)
Importance scoring (0.0-1.0) for memory prioritization
Confidence-based promotion to "golden rule" status (auto-promoted at 0.9)
Fast path via daemon (<10ms) or sync fallback (MLX ~100ms, Ollama 10-60s)
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Recall search for my preferences about user interface design".
That's it! The server will respond to your query, and you can continue using it as needed.
Recall
Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.
Features
Persistent Memory Storage: Store preferences, decisions, patterns, and session context
Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors
MLX Hybrid Embeddings: Native Apple Silicon support via MLX for ~5-10x faster embeddings (automatic fallback to Ollama)
Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)
Namespace Isolation: Global memories vs project-scoped memories
Context Generation: Auto-format memories for session context injection
Deduplication: Content-hash based duplicate detection
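The deduplication idea is simple enough to sketch: hash normalized content and skip inserts whose hash already exists. A minimal illustration of the technique (the helper names here are hypothetical, not Recall's internals):

```python
import hashlib

# Hypothetical sketch of content-hash deduplication; not Recall's actual code.
def content_hash(text: str) -> str:
    """Normalize whitespace and case, then hash, so trivially reformatted duplicates collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

_seen: set[str] = set()

def store_if_new(text: str) -> bool:
    """Return True if the memory is new, False if its hash was already stored."""
    digest = content_hash(text)
    if digest in _seen:
        return False
    _seen.add(digest)
    return True
```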
Installation
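The exact commands depend on where you obtained the code; a typical from-source install might look like this (the repository URL is a placeholder):

```bash
# Placeholder URL; substitute the actual Recall repository
git clone https://github.com/example/recall.git
cd recall
pip install -e .
```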
Usage
Run as MCP Server
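Assuming a standard Python entry point (the module name below is a guess, not the documented command):

```bash
# Hypothetical module path; check the project's pyproject.toml for the real entry point
python -m recall.server
```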
CLI Options
meta-mcp Configuration
Add Recall to your meta-mcp servers.json:
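A sketch of what the entry might look like; the server name, command, and args are assumptions to adapt to your setup:

```json
{
  "recall": {
    "command": "python",
    "args": ["-m", "recall.server"]
  }
}
```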
Or for Claude Code / other MCP clients (claude.json):
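Claude Code-style clients typically register servers under an mcpServers key; the command shown reuses the same assumed entry point:

```json
{
  "mcpServers": {
    "recall": {
      "command": "python",
      "args": ["-m", "recall.server"]
    }
  }
}
```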
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
|          |         | SQLite database file path |
|          |         | ChromaDB persistent storage directory |
|          |         | ChromaDB collection name |
|          |         | Embedding backend (MLX or Ollama) |
|          |         | MLX embedding model identifier |
|          |         | Ollama server URL |
|          |         | Ollama embedding model name |
|          |         | Ollama request timeout in seconds |
|          |         | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
|          |         | Default namespace for memories |
|          |         | Default importance score (0.0-1.0) |
|          |         | Default token budget for context |
MCP Tool Examples
memory_store_tool
Store a new memory with semantic indexing. Uses fast daemon path when available (<10ms), falls back to sync embedding otherwise.
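For illustration, a call might pass content plus the metadata fields described above (field names are plausible but not verified against the actual schema):

```json
{
  "content": "User prefers dark mode in all UI settings",
  "type": "preference",
  "namespace": "global",
  "importance": 0.8
}
```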
Response (fast path via daemon):
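An illustrative shape only; the real fields may differ:

```json
{
  "status": "queued",
  "memory_id": "mem_abc123",
  "path": "daemon",
  "latency_ms": 4
}
```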
Response (sync path fallback):
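Again illustrative; the sync path would plausibly report which embedding backend ran:

```json
{
  "status": "stored",
  "memory_id": "mem_abc123",
  "path": "sync",
  "embedding_backend": "mlx"
}
```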
daemon_status_tool
Check if the recall daemon is running:
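In this sketch the tool needs no arguments:

```json
{}
```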
Response:
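An illustrative response shape (field names are assumptions):

```json
{
  "running": true,
  "queue_depth": 0,
  "uptime_seconds": 5231
}
```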
memory_recall_tool
Search memories by semantic similarity:
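An illustrative query using the filters described earlier (parameter names unverified):

```json
{
  "query": "user interface preferences",
  "namespace": "global",
  "limit": 5
}
```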
Response:
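A sketch of the result shape, not the exact schema:

```json
{
  "results": [
    {
      "memory_id": "mem_abc123",
      "content": "User prefers dark mode in all UI settings",
      "similarity": 0.91,
      "importance": 0.8
    }
  ]
}
```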
memory_relate_tool
Create a relationship between memories:
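For example, marking one memory as superseding another (IDs and parameter names are illustrative):

```json
{
  "from_id": "mem_abc123",
  "to_id": "mem_def456",
  "relation": "supersedes"
}
```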
Response:
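An illustrative confirmation shape:

```json
{
  "status": "created",
  "from_id": "mem_abc123",
  "to_id": "mem_def456",
  "relation": "supersedes"
}
```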
memory_context_tool
Generate formatted context for session injection:
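An illustrative call respecting a token budget (parameter names are assumptions):

```json
{
  "query": "current project conventions",
  "token_budget": 1000
}
```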
Response:
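A sketch of the markdown-formatted context this might return:

```json
{
  "context": "## Relevant Memories\n\n- User prefers dark mode in all UI settings\n- ...",
  "tokens_used": 212
}
```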
memory_forget_tool
Delete memories by ID or semantic search:
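Deleting by ID, with an illustrative payload:

```json
{
  "memory_id": "mem_abc123"
}
```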
Or delete by search:
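Here the query selects the memories to remove (parameter name is an assumption):

```json
{
  "query": "outdated notes about the legacy build system"
}
```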
Response:
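An illustrative shape, including the golden-rule protection mentioned above:

```json
{
  "deleted": 1,
  "protected_skipped": 0
}
```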
Architecture
The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding via MLX (~100ms on Apple Silicon) or Ollama (10-60s on other platforms).
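A rough sketch of that flow:

```
MCP client --> Recall MCP server --> daemon queue --> async embedding --> ChromaDB
                      |
                      +--(daemon unavailable)--> sync embedding (MLX ~100ms / Ollama 10-60s) --> ChromaDB
```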
Daemon Setup (macOS)
The recall daemon provides fast (<10ms) memory storage by processing embeddings asynchronously. Without the daemon, each store operation blocks for 10-60 seconds waiting for Ollama embeddings.
Quick Install
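The installer is typically a single script; the name below is a placeholder for whatever the repository ships:

```bash
# Placeholder script name; use the installer included in the repository
./install.sh
```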
This will:
Copy hook scripts to ~/.claude/hooks/
Install the launchd plist to ~/Library/LaunchAgents/
Start the daemon automatically
Manual Install
Daemon Commands
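launchd-managed daemons are normally controlled via launchctl; the plist label below is a guess, so check ~/Library/LaunchAgents/ for the actual filename:

```bash
# Hypothetical plist name; adjust to the file the installer created
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist     # start
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist   # stop
launchctl list | grep -i recall                                   # check status
```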
Hooks Configuration
Add recall hooks to your Claude Code settings (~/.claude/settings.json). See hooks/settings.example.json for the full configuration.
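A minimal sketch of what such an entry can look like in Claude Code's settings (the event name and script path are assumptions; defer to hooks/settings.example.json):

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/recall-session-start.sh" }
        ]
      }
    ]
  }
}
```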
Development
Requirements
Python 3.13+
For Apple Silicon (recommended): MLX embeddings work automatically with the mlx-embeddings package
For other platforms: Ollama with the mxbai-embed-large model (required for semantic search) and the llama3.2 model (optional, for the session auto-capture hook)
~500MB disk space for ChromaDB indices
License
MIT