Recall is a long-term memory system for AI assistants that provides persistent storage, semantic search, and relationship tracking for memories.
Core Memory Operations
Store memories with automatic semantic indexing, content-hash deduplication, and optional auto-linking to related memories
Search memories using natural language queries with semantic similarity, filters (namespace, type, importance), and optional multi-hop graph expansion
Delete memories by ID or semantic search, with protection for high-confidence "golden rule" memories
Count and list memories with filtering, sorting, and pagination for auditing and exploration
Generate context by fetching relevant memories formatted as markdown for session injection, respecting token budgets
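The token-budget behavior of context generation can be sketched as a greedy packing loop. This is an illustration only: the ~4-characters-per-token estimate and the packing order are assumptions, not Recall's actual implementation.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption for illustration).
    return max(1, len(text) // 4)

def build_context(memories: list[str], token_budget: int) -> str:
    """Greedily pack memories (assumed pre-sorted by relevance) into the budget."""
    lines, used = ["# Memory Context", ""], 0
    for mem in memories:
        cost = estimate_tokens(mem)
        if used + cost > token_budget:
            break                      # budget exhausted, drop the rest
        lines.append(f"- {mem}")
        used += cost
    return "\n".join(lines)

ctx = build_context(
    ["User prefers dark mode", "Use 2-space indentation", "Decided on FastAPI"],
    token_budget=12,
)
```

With a budget of 12 tokens, only the first two memories fit; the third is dropped rather than truncated mid-entry.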
Memory Relationships & Graph
Create relationships between memories (relates_to, supersedes, caused_by, contradicts)
Inspect graph structure with BFS traversal, configurable depth/direction, and Mermaid diagram generation
Delete edges between memories by ID, memory connection, or specific pairs
Auto-infer relationships using embedding similarity with optional LLM refinement
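Auto-inference can be pictured as a cosine-similarity pass over stored embeddings. The 0.8 threshold and the toy 3-dimensional vectors below are assumptions for illustration; Recall's real embeddings come from MLX or Ollama and have far more dimensions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

SIM_THRESHOLD = 0.8  # assumed cutoff, not Recall's actual value

# Toy embeddings: mem_a and mem_b point the same way, mem_c is unrelated.
memories = {
    "mem_a": [0.9, 0.1, 0.0],
    "mem_b": [0.85, 0.15, 0.05],
    "mem_c": [0.0, 0.1, 0.95],
}

def infer_edges(memories: dict, threshold: float = SIM_THRESHOLD) -> list:
    ids = sorted(memories)
    edges = []
    for i, src in enumerate(ids):
        for dst in ids[i + 1:]:
            if cosine(memories[src], memories[dst]) >= threshold:
                edges.append((src, dst, "relates_to"))
    return edges

edges = infer_edges(memories)
```

Only the `mem_a`/`mem_b` pair clears the threshold, so a single `relates_to` edge is proposed; an optional LLM pass could then refine the relation type (e.g. to `supersedes` or `contradicts`).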
Validation & Quality
Validate memories by recording application success/failure to adjust confidence scores automatically
Detect contradictions between memories using semantic search and LLM reasoning
Check for superseding memories based on validation history to identify outdated information
Analyze memory health to detect contradictory, low-confidence, and stale memories
View validation history showing applied/succeeded/failed events and confidence score evolution
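A minimal sketch of the health analysis, flagging low-confidence and stale memories. The 90-day and 0.3 cutoffs are invented for illustration and are not Recall's actual thresholds.

```python
from datetime import datetime, timedelta

def memory_health(memories: list[dict], now: datetime,
                  stale_days: int = 90, low_conf: float = 0.3) -> dict:
    """Toy health report over a list of {'id', 'confidence', 'accessed_at'} dicts."""
    report = {"low_confidence": [], "stale": []}
    for m in memories:
        if m["confidence"] < low_conf:
            report["low_confidence"].append(m["id"])
        if now - m["accessed_at"] > timedelta(days=stale_days):
            report["stale"].append(m["id"])
    return report

mems = [
    {"id": "a", "confidence": 0.2, "accessed_at": datetime(2024, 5, 20)},
    {"id": "b", "confidence": 0.8, "accessed_at": datetime(2023, 12, 1)},
]
report = memory_health(mems, now=datetime(2024, 6, 1))
```

Memory `a` is flagged for low confidence, `b` for staleness; contradiction detection would additionally need the semantic-search and LLM steps described above.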
Performance & Monitoring
Check daemon status to monitor the async embedding service for fast storage (<10ms)
Track file activity to record file access events (read, write, edit) and view recent activity statistics
Key Features
Namespace isolation (global vs project-scoped)
Importance scoring (0.0-1.0) for memory prioritization
Confidence-based promotion to "golden rule" status (auto-promoted at 0.9)
Fast path via daemon (<10ms) or sync fallback (MLX ~100ms, Ollama 10-60s)
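The confidence-based promotion above can be sketched as a simple update rule. Only the 0.9 promotion threshold comes from the feature list; the step sizes are assumptions for illustration.

```python
GOLDEN_RULE_THRESHOLD = 0.9  # auto-promotion threshold from the feature list

def update_confidence(confidence: float, succeeded: bool,
                      step_up: float = 0.1, step_down: float = 0.2) -> float:
    """Reward successful applications, penalize failures (illustrative step sizes)."""
    if succeeded:
        return min(1.0, confidence + step_up)
    return max(0.0, confidence - step_down)

conf = 0.75
for outcome in (True, True):      # two successful applications of the memory
    conf = update_confidence(conf, outcome)
is_golden = conf >= GOLDEN_RULE_THRESHOLD
```

After two validated successes the memory crosses 0.9 and would be promoted to "golden rule" status, gaining deletion protection.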
Recall
Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.
Features
Persistent Memory Storage: Store preferences, decisions, patterns, and session context
Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors
MLX Hybrid Embeddings: Native Apple Silicon support via MLX for ~5-10x faster embeddings (automatic fallback to Ollama)
Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)
Namespace Isolation: Global memories vs project-scoped memories
Context Generation: Auto-format memories for session context injection
Deduplication: Content-hash based duplicate detection
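Content-hash deduplication can be sketched as follows. The whitespace/case normalization and the 16-character hash prefix are assumptions about Recall's actual scheme, shown here only to illustrate the idea.

```python
import hashlib

def content_hash(text: str) -> str:
    # Assumed normalization: collapse whitespace and lowercase before hashing.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

seen: set[str] = set()

def store_if_new(text: str) -> bool:
    """Return True if stored, False if skipped as a duplicate."""
    h = content_hash(text)
    if h in seen:
        return False
    seen.add(h)
    return True

first = store_if_new("User prefers dark mode")
dup = store_if_new("user prefers  dark mode")  # same content, different spacing/case
```

The second call is rejected as a duplicate even though the raw strings differ, which is the point of hashing a normalized form rather than the raw text.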
Installation
# Clone the repository
git clone https://github.com/yourorg/recall.git
cd recall
# Install with uv
uv sync
# On Apple Silicon: MLX embeddings work automatically (fastest option)
# On other platforms or as fallback: ensure Ollama is running
ollama pull mxbai-embed-large # Required if not using MLX
ollama pull llama3.2 # Optional: session summarization for auto-capture hook
ollama serve
Usage
Run as MCP Server
uv run python -m recall
CLI Options
uv run python -m recall --help
Options:
--sqlite-path PATH SQLite database path (default: ~/.recall/recall.db)
--chroma-path PATH ChromaDB storage path (default: ~/.recall/chroma_db)
--collection NAME ChromaDB collection name (default: memories)
--ollama-host HOST Ollama server URL (default: http://localhost:11434)
--ollama-model MODEL Embedding model (default: mxbai-embed-large)
--ollama-timeout SECS Request timeout (default: 30)
--log-level LEVEL DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
meta-mcp Configuration
Add Recall to your meta-mcp servers.json:
{
"recall": {
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/recall",
"python",
"-m",
"recall"
],
"env": {
"RECALL_LOG_LEVEL": "INFO",
"RECALL_OLLAMA_HOST": "http://localhost:11434",
"RECALL_OLLAMA_MODEL": "mxbai-embed-large"
},
"description": "Long-term memory system with semantic search",
"tags": ["memory", "context", "semantic-search"]
}
}
Or for Claude Code / other MCP clients (claude.json):
{
"mcpServers": {
"recall": {
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/recall",
"python",
"-m",
"recall"
],
"env": {
"RECALL_LOG_LEVEL": "INFO"
}
}
}
}
Environment Variables
| Variable | Default | Description |
|---|---|---|
| RECALL_SQLITE_PATH | ~/.recall/recall.db | SQLite database file path |
| RECALL_CHROMA_PATH | ~/.recall/chroma_db | ChromaDB persistent storage directory |
| RECALL_COLLECTION | memories | ChromaDB collection name |
| | | Embedding backend |
| | | MLX embedding model identifier |
| RECALL_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| RECALL_OLLAMA_MODEL | mxbai-embed-large | Ollama embedding model name |
| RECALL_OLLAMA_TIMEOUT | 30 | Ollama request timeout in seconds |
| RECALL_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| | | Default namespace for memories |
| | | Default importance score (0.0-1.0) |
| | | Default token budget for context |
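Reading this configuration from the environment might look like the sketch below. Only RECALL_LOG_LEVEL, RECALL_OLLAMA_HOST, and RECALL_OLLAMA_MODEL appear verbatim in the configuration examples above; the defaults are taken from the CLI options section.

```python
import os

def load_settings() -> dict:
    """Fall back to the documented CLI defaults when a variable is unset."""
    return {
        "log_level": os.environ.get("RECALL_LOG_LEVEL", "INFO"),
        "ollama_host": os.environ.get("RECALL_OLLAMA_HOST", "http://localhost:11434"),
        "ollama_model": os.environ.get("RECALL_OLLAMA_MODEL", "mxbai-embed-large"),
    }

settings = load_settings()
```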
MCP Tool Examples
memory_store_tool
Store a new memory with semantic indexing. Uses fast daemon path when available (<10ms), falls back to sync embedding otherwise.
{
"content": "User prefers dark mode in all applications",
"memory_type": "preference",
"namespace": "global",
"importance": 0.8,
"metadata": {"source": "explicit_request"}
}
Response (fast path via daemon):
{
"success": true,
"queued": true,
"queue_id": 42,
"namespace": "global"
}
Response (sync path fallback):
{
"success": true,
"queued": false,
"id": "550e8400-e29b-41d4-a716-446655440000",
"content_hash": "a1b2c3d4e5f67890"
}
daemon_status_tool
Check if the recall daemon is running:
{}
Response:
{
"running": true,
"status": {
"pid": 12345,
"store_queue": {"pending_count": 5},
"embed_worker_running": true
}
}
memory_recall_tool
Search memories by semantic similarity:
{
"query": "user interface preferences",
"n_results": 5,
"namespace": "global",
"memory_type": "preference",
"min_importance": 0.5,
"include_related": true
}
Response:
{
"success": true,
"memories": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"content": "User prefers dark mode in all applications",
"type": "preference",
"namespace": "global",
"importance": 0.8,
"created_at": "2024-01-15T10:30:00",
"accessed_at": "2024-01-15T14:22:00",
"access_count": 3
}
],
"total": 1,
"score": 0.92
}
memory_relate_tool
Create a relationship between memories:
{
"source_id": "mem_new_123",
"target_id": "mem_old_456",
"relation": "supersedes",
"weight": 1.0
}
Response:
{
"success": true,
"edge_id": 42
}
memory_context_tool
Generate formatted context for session injection:
{
"query": "coding style preferences",
"project": "myproject",
"token_budget": 4000
}
Response:
{
"success": true,
"context": "# Memory Context\n\n## Preferences\n\n- User prefers dark mode [global]\n- Use 2-space indentation [project:myproject]\n\n## Recent Decisions\n\n- Decided to use FastAPI for the backend [project:myproject]\n",
"token_estimate": 125
}
memory_forget_tool
Delete memories by ID or semantic search:
{
"memory_id": "550e8400-e29b-41d4-a716-446655440000",
"confirm": true
}
Or delete by search:
{
"query": "outdated preferences",
"namespace": "project:oldproject",
"n_results": 10,
"confirm": true
}
Response:
{
"success": true,
"deleted_ids": ["550e8400-e29b-41d4-a716-446655440000"],
"deleted_count": 1
}
Architecture
┌───────────────────────────────────────────────────────────────┐
│                     MCP Server (FastMCP)                      │
│ memory_store │ memory_recall │ memory_relate │ memory_forget  │
└───────────────────────────────┬───────────────────────────────┘
                                │
                  ┌─────────────┴─────────────┐
                  │                           │
        ┌─────────▼─────────┐       ┌─────────▼─────────┐
        │     FAST PATH     │       │     SYNC PATH     │
        │       <10ms       │       │    MLX: <100ms    │
        └─────────┬─────────┘       │  Ollama: 10-60s   │
                  │                 └─────────┬─────────┘
        ┌─────────▼─────────┐                 │
        │   recall-daemon   │       ┌─────────▼─────────┐
        │   (Unix socket)   │       │    HybridStore    │
        │                   │       └─────────┬─────────┘
        │  ┌─────────────┐  │                 │
        │  │ StoreQueue  │  │     ┌───────────┼───────────┐
        │  │ EmbedWorker │  │     │           │           │
        │  └─────────────┘  │     │           │           │
        └─────────┬─────────┘ ┌───▼───┐   ┌───▼───┐ ┌─────▼─────┐
                  │           │SQLite │   │Chroma │ │ Embedding │
                  └──────────►│ Store │   │ Store │ │  Factory  │
                              └───────┘   └───────┘ └─────┬─────┘
                                                          │
                                              ┌───────────┴───────────┐
                                              │                       │
                                        ┌─────▼─────┐           ┌─────▼─────┐
                                        │    MLX    │           │  Ollama   │
                                        │  (Apple)  │           │(Fallback) │
                                        └───────────┘           └───────────┘
The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding via MLX (~100ms on Apple Silicon) or Ollama (10-60s on other platforms).
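The fallback decision can be sketched as follows. The {"cmd": "status"} probe mirrors the daemon command shown under Daemon Commands below; the embedding and store bodies are stand-ins, not the real implementation.

```python
import hashlib
import json
import socket

DAEMON_SOCKET = "/tmp/recall-daemon.sock"  # socket path from the daemon commands

def daemon_running(path: str = DAEMON_SOCKET) -> bool:
    """Probe the daemon with the documented {"cmd": "status"} message."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(0.5)
            s.connect(path)
            s.sendall(json.dumps({"cmd": "status"}).encode())
            s.recv(4096)
        return True
    except OSError:
        return False

def sync_embed(text: str) -> list[float]:
    # Stand-in for the real (slow) MLX/Ollama embedding call.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def store_memory(content: str) -> dict:
    """Queue via the daemon when it answers; otherwise embed synchronously."""
    if daemon_running():
        return {"success": True, "queued": True}
    embedding = sync_embed(content)      # this is the blocking step the daemon avoids
    return {"success": True, "queued": False, "dims": len(embedding)}

result = store_memory("User prefers dark mode")
```

The `queued` flag in the result matches the two response shapes shown for memory_store_tool above: `true` on the fast daemon path, `false` on the synchronous fallback.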
Daemon Setup (macOS)
The recall daemon provides fast (<10ms) memory storage by processing embeddings asynchronously. Without the daemon, each store operation blocks for 10-60 seconds waiting for Ollama embeddings.
Quick Install
# From the recall directory
./hooks/install-daemon.sh
This will:
Copy hook scripts to ~/.claude/hooks/
Install the launchd plist to ~/Library/LaunchAgents/
Start the daemon automatically
Manual Install
# 1. Copy hook scripts
cp hooks/recall*.py ~/.claude/hooks/
chmod +x ~/.claude/hooks/recall*.py
# 2. Create logs directory
mkdir -p ~/.claude/hooks/logs
# 3. Install plist with path substitution
sed "s|{{HOME}}|$HOME|g; s|{{RECALL_DIR}}|$(pwd)|g" \
hooks/com.recall.daemon.plist.template > ~/Library/LaunchAgents/com.recall.daemon.plist
# 4. Load the daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist
Daemon Commands
# Check status
echo '{"cmd": "status"}' | nc -U /tmp/recall-daemon.sock | jq
# Stop daemon
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist
# Start daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist
# View logs
tail -f ~/.claude/hooks/logs/recall-daemon.log
Hooks Configuration
Add recall hooks to your Claude Code settings (~/.claude/settings.json). See hooks/settings.example.json for the full configuration.
Development
# Install dev dependencies
uv sync --dev
# Run tests
uv run pytest tests/
# Run tests with coverage
uv run pytest tests/ --cov=recall --cov-report=html
# Type checking
uv run mypy src/recall
# Run specific integration tests
uv run pytest tests/integration/test_mcp_server.py -v
Requirements
Python 3.13+
For Apple Silicon (recommended): MLX embeddings work automatically with the mlx-embeddings package
For other platforms: Ollama with the mxbai-embed-large model (required for semantic search) and the llama3.2 model (optional, for the session auto-capture hook)
~500MB disk space for ChromaDB indices
License
MIT