Recall is a long-term memory system for AI assistants that provides persistent storage, semantic search, and relationship tracking for memories.
Core Memory Operations
Store memories with automatic semantic indexing, content-hash deduplication, and optional auto-linking to related memories
Search memories using natural language queries with semantic similarity, filters (namespace, type, importance), and optional multi-hop graph expansion
Delete memories by ID or semantic search, with protection for high-confidence "golden rule" memories
Count and list memories with filtering, sorting, and pagination for auditing and exploration
Generate context by fetching relevant memories formatted as markdown for session injection, respecting token budgets
Memory Relationships & Graph
Create relationships between memories (relates_to, supersedes, caused_by, contradicts)
Inspect graph structure with BFS traversal, configurable depth/direction, and Mermaid diagram generation
Delete edges between memories by ID, memory connection, or specific pairs
Auto-infer relationships using embedding similarity with optional LLM refinement
Validation & Quality
Validate memories by recording application success/failure to adjust confidence scores automatically
Detect contradictions between memories using semantic search and LLM reasoning
Check for superseding memories based on validation history to identify outdated information
Analyze memory health to surface contradictions, low-confidence memories, and stale memories
View validation history showing applied/succeeded/failed events and confidence score evolution
Performance & Monitoring
Check daemon status to monitor the async embedding service for fast storage (<10ms)
Track file activity to record file access events (read, write, edit) and view recent activity statistics
Key Features
Namespace isolation (global vs project-scoped)
Importance scoring (0.0-1.0) for memory prioritization
Confidence-based promotion to "golden rule" status (auto-promoted at 0.9)
Fast path via daemon (<10ms) or sync fallback (MLX ~100ms, Ollama 10-60s)
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Recall search for my preferences about user interface design".
That's it! The server will respond to your query, and you can continue using it as needed.
Recall
Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.
Features
Persistent Memory Storage: Store preferences, decisions, patterns, and session context
Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors
MLX Hybrid Embeddings: Native Apple Silicon support via MLX for ~5-10x faster embeddings (automatic fallback to Ollama)
Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)
Namespace Isolation: Global memories vs project-scoped memories
Context Generation: Auto-format memories for session context injection
Deduplication: Content-hash based duplicate detection
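The deduplication idea is simple enough to sketch: hash normalized content and skip inserts whose hash already exists. A minimal illustration of the technique (the helper names here are hypothetical, not Recall's internals):

```python
import hashlib

# Hypothetical sketch of content-hash deduplication; not Recall's actual code.
def content_hash(text: str) -> str:
    """Normalize whitespace and case, then hash, so trivially reformatted duplicates collide."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

_seen: set[str] = set()

def store_if_new(text: str) -> bool:
    """Return True if the memory is new, False if its hash was already stored."""
    digest = content_hash(text)
    if digest in _seen:
        return False
    _seen.add(digest)
    return True
```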
Installation
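The exact commands depend on where you obtained the code; a typical from-source install might look like this (the repository URL is a placeholder):

```bash
# Placeholder URL; substitute the actual Recall repository
git clone https://github.com/example/recall.git
cd recall
pip install -e .
```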
Usage
Run as MCP Server
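Assuming a standard Python entry point (the module name below is a guess, not the documented command):

```bash
# Hypothetical module path; check the project's pyproject.toml for the real entry point
python -m recall.server
```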
CLI Options
meta-mcp Configuration
Add Recall to your meta-mcp servers.json:
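A sketch of what the entry might look like; the server name, command, and args are assumptions to adapt to your setup:

```json
{
  "recall": {
    "command": "python",
    "args": ["-m", "recall.server"]
  }
}
```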
Or for Claude Code / other MCP clients (claude.json):
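Claude Code-style clients typically register servers under an mcpServers key; the command shown reuses the same assumed entry point:

```json
{
  "mcpServers": {
    "recall": {
      "command": "python",
      "args": ["-m", "recall.server"]
    }
  }
}
```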
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
|          |         | SQLite database file path |
|          |         | ChromaDB persistent storage directory |
|          |         | ChromaDB collection name |
|          |         | Embedding backend (MLX or Ollama) |
|          |         | MLX embedding model identifier |
|          |         | Ollama server URL |
|          |         | Ollama embedding model name |
|          |         | Ollama request timeout in seconds |
|          |         | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
|          |         | Default namespace for memories |
|          |         | Default importance score (0.0-1.0) |
|          |         | Default token budget for context |
MCP Tool Examples
memory_store_tool
Store a new memory with semantic indexing. Uses fast daemon path when available (<10ms), falls back to sync embedding otherwise.
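For illustration, a call might pass content plus the metadata fields described above (field names are plausible but not verified against the actual schema):

```json
{
  "content": "User prefers dark mode in all UI settings",
  "type": "preference",
  "namespace": "global",
  "importance": 0.8
}
```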
Response (fast path via daemon):
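An illustrative shape only; the real fields may differ:

```json
{
  "status": "queued",
  "memory_id": "mem_abc123",
  "path": "daemon",
  "latency_ms": 4
}
```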
Response (sync path fallback):
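Again illustrative; the sync path would plausibly report which embedding backend ran:

```json
{
  "status": "stored",
  "memory_id": "mem_abc123",
  "path": "sync",
  "embedding_backend": "mlx"
}
```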
daemon_status_tool
Check if the recall daemon is running:
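In this sketch the tool needs no arguments:

```json
{}
```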
Response:
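An illustrative response shape (field names are assumptions):

```json
{
  "running": true,
  "queue_depth": 0,
  "uptime_seconds": 5231
}
```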
memory_recall_tool
Search memories by semantic similarity:
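An illustrative query using the filters described earlier (parameter names unverified):

```json
{
  "query": "user interface preferences",
  "namespace": "global",
  "limit": 5
}
```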
Response:
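A sketch of the result shape, not the exact schema:

```json
{
  "results": [
    {
      "memory_id": "mem_abc123",
      "content": "User prefers dark mode in all UI settings",
      "similarity": 0.91,
      "importance": 0.8
    }
  ]
}
```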
memory_relate_tool
Create a relationship between memories:
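For example, marking one memory as superseding another (IDs and parameter names are illustrative):

```json
{
  "from_id": "mem_abc123",
  "to_id": "mem_def456",
  "relation": "supersedes"
}
```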
Response:
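An illustrative confirmation shape:

```json
{
  "status": "created",
  "from_id": "mem_abc123",
  "to_id": "mem_def456",
  "relation": "supersedes"
}
```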
memory_context_tool
Generate formatted context for session injection:
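An illustrative call respecting a token budget (parameter names are assumptions):

```json
{
  "query": "current project conventions",
  "token_budget": 1000
}
```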
Response:
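A sketch of the markdown-formatted context this might return:

```json
{
  "context": "## Relevant Memories\n\n- User prefers dark mode in all UI settings\n- ...",
  "tokens_used": 212
}
```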
memory_forget_tool
Delete memories by ID or semantic search:
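Deleting by ID, with an illustrative payload:

```json
{
  "memory_id": "mem_abc123"
}
```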
Or delete by search:
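Here the query selects the memories to remove (parameter name is an assumption):

```json
{
  "query": "outdated notes about the legacy build system"
}
```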
Response:
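An illustrative shape, including the golden-rule protection mentioned above:

```json
{
  "deleted": 1,
  "protected_skipped": 0
}
```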
Architecture
The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding via MLX (~100ms on Apple Silicon) or Ollama (10-60s on other platforms).
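A rough sketch of that flow:

```
MCP client --> Recall MCP server --> daemon queue --> async embedding --> ChromaDB
                      |
                      +--(daemon unavailable)--> sync embedding (MLX ~100ms / Ollama 10-60s) --> ChromaDB
```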
Daemon Setup (macOS)
The recall daemon provides fast (<10ms) memory storage by processing embeddings asynchronously. Without the daemon, each store operation blocks for 10-60 seconds waiting for Ollama embeddings.
Quick Install
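The installer is typically a single script; the name below is a placeholder for whatever the repository ships:

```bash
# Placeholder script name; use the installer included in the repository
./install.sh
```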
This will:
Copy hook scripts to ~/.claude/hooks/
Install the launchd plist to ~/Library/LaunchAgents/
Start the daemon automatically
Manual Install
Daemon Commands
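launchd-managed daemons are normally controlled via launchctl; the plist label below is a guess, so check ~/Library/LaunchAgents/ for the actual filename:

```bash
# Hypothetical plist name; adjust to the file the installer created
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist     # start
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist   # stop
launchctl list | grep -i recall                                   # check status
```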
Hooks Configuration
Add recall hooks to your Claude Code settings (~/.claude/settings.json). See hooks/settings.example.json for the full configuration.
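A minimal sketch of what such an entry can look like in Claude Code's settings (the event name and script path are assumptions; defer to hooks/settings.example.json):

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/recall-session-start.sh" }
        ]
      }
    ]
  }
}
```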
Development
Requirements
Python 3.13+
For Apple Silicon (recommended): MLX embeddings work automatically with the mlx-embeddings package
For other platforms: Ollama with the mxbai-embed-large model (required for semantic search) and the llama3.2 model (optional, for the session auto-capture hook)
~500MB disk space for ChromaDB indices
License
MIT