Recall

Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.

Features

  • Persistent Memory Storage: Store preferences, decisions, patterns, and session context

  • Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors

  • MLX Hybrid Embeddings: Native Apple Silicon support via MLX for ~5-10x faster embeddings (automatic fallback to Ollama)

  • Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)

  • Namespace Isolation: Global memories vs project-scoped memories

  • Context Generation: Auto-format memories for session context injection

  • Deduplication: Content-hash-based duplicate detection (see the sketch after this list)
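
As a rough illustration of the deduplication feature: hash normalized content and refuse a second store of the same hash. This is a minimal sketch in Python, assuming SHA-256 over whitespace-normalized text; Recall's actual normalization and hash scheme are implementation details not documented here (the store responses below expose a 16-character content_hash).

import hashlib

def content_hash(content: str) -> str:
    # Assumed normalization: collapse whitespace so trivially
    # different copies of the same text collide.
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

def is_duplicate(content: str, seen: set[str]) -> bool:
    # A repeated store is detected by a hash lookup.
    h = content_hash(content)
    if h in seen:
        return True
    seen.add(h)
    return False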

Installation

# Clone the repository
git clone https://github.com/yourorg/recall.git
cd recall

# Install with uv
uv sync

# On Apple Silicon: MLX embeddings work automatically (fastest option)
# On other platforms or as fallback: ensure Ollama is running
ollama pull mxbai-embed-large   # Required if not using MLX
ollama pull llama3.2            # Optional: session summarization for auto-capture hook
ollama serve

Usage

Run as MCP Server

uv run python -m recall

CLI Options

uv run python -m recall --help

Options:
  --sqlite-path PATH      SQLite database path (default: ~/.recall/recall.db)
  --chroma-path PATH      ChromaDB storage path (default: ~/.recall/chroma_db)
  --collection NAME       ChromaDB collection name (default: memories)
  --ollama-host HOST      Ollama server URL (default: http://localhost:11434)
  --ollama-model MODEL    Embedding model (default: mxbai-embed-large)
  --ollama-timeout SECS   Request timeout (default: 30)
  --log-level LEVEL       DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)

meta-mcp Configuration

Add Recall to your meta-mcp servers.json:

{ "recall": { "command": "uv", "args": [ "run", "--directory", "/path/to/recall", "python", "-m", "recall" ], "env": { "RECALL_LOG_LEVEL": "INFO", "RECALL_OLLAMA_HOST": "http://localhost:11434", "RECALL_OLLAMA_MODEL": "mxbai-embed-large" }, "description": "Long-term memory system with semantic search", "tags": ["memory", "context", "semantic-search"] } }

Or for Claude Code / other MCP clients (claude.json):

{ "mcpServers": { "recall": { "command": "uv", "args": [ "run", "--directory", "/path/to/recall", "python", "-m", "recall" ], "env": { "RECALL_LOG_LEVEL": "INFO" } } } }

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| RECALL_SQLITE_PATH | ~/.recall/recall.db | SQLite database file path |
| RECALL_CHROMA_PATH | ~/.recall/chroma_db | ChromaDB persistent storage directory |
| RECALL_COLLECTION_NAME | memories | ChromaDB collection name |
| RECALL_EMBEDDING_BACKEND | ollama | Embedding backend: mlx (Apple Silicon) or ollama |
| RECALL_MLX_MODEL | mlx-community/mxbai-embed-large-v1 | MLX embedding model identifier |
| RECALL_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| RECALL_OLLAMA_MODEL | mxbai-embed-large | Ollama embedding model name |
| RECALL_OLLAMA_TIMEOUT | 30 | Ollama request timeout in seconds |
| RECALL_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| RECALL_DEFAULT_NAMESPACE | global | Default namespace for memories |
| RECALL_DEFAULT_IMPORTANCE | 0.5 | Default importance score (0.0-1.0) |
| RECALL_DEFAULT_TOKEN_BUDGET | 4000 | Default token budget for context |
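
As a rough illustration of how these variables could map onto runtime settings, here is a hypothetical loader; the function name and dict layout are illustrative, not Recall's actual internals:

import os
from pathlib import Path

def load_config() -> dict:
    # Each RECALL_* variable falls back to its documented default when unset.
    home = Path.home()
    return {
        "sqlite_path": os.environ.get("RECALL_SQLITE_PATH", str(home / ".recall" / "recall.db")),
        "chroma_path": os.environ.get("RECALL_CHROMA_PATH", str(home / ".recall" / "chroma_db")),
        "collection": os.environ.get("RECALL_COLLECTION_NAME", "memories"),
        "embedding_backend": os.environ.get("RECALL_EMBEDDING_BACKEND", "ollama"),
        "ollama_host": os.environ.get("RECALL_OLLAMA_HOST", "http://localhost:11434"),
        "ollama_timeout": int(os.environ.get("RECALL_OLLAMA_TIMEOUT", "30")),
        # ...the remaining variables follow the same pattern.
    }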

MCP Tool Examples

memory_store_tool

Store a new memory with semantic indexing. Uses the fast daemon path when available (<10ms) and falls back to synchronous embedding otherwise.

{ "content": "User prefers dark mode in all applications", "memory_type": "preference", "namespace": "global", "importance": 0.8, "metadata": {"source": "explicit_request"} }

Response (fast path via daemon):

{ "success": true, "queued": true, "queue_id": 42, "namespace": "global" }

Response (sync path fallback):

{ "success": true, "queued": false, "id": "550e8400-e29b-41d4-a716-446655440000", "content_hash": "a1b2c3d4e5f67890" }

daemon_status_tool

Check if the recall daemon is running:

{}

Response:

{ "running": true, "status": { "pid": 12345, "store_queue": {"pending_count": 5}, "embed_worker_running": true } }

memory_recall_tool

Search memories by semantic similarity:

{ "query": "user interface preferences", "n_results": 5, "namespace": "global", "memory_type": "preference", "min_importance": 0.5, "include_related": true }

Response:

{ "success": true, "memories": [ { "id": "550e8400-e29b-41d4-a716-446655440000", "content": "User prefers dark mode in all applications", "type": "preference", "namespace": "global", "importance": 0.8, "created_at": "2024-01-15T10:30:00", "accessed_at": "2024-01-15T14:22:00", "access_count": 3 } ], "total": 1, "score": 0.92 }

memory_relate_tool

Create a relationship between memories:

{ "source_id": "mem_new_123", "target_id": "mem_old_456", "relation": "supersedes", "weight": 1.0 }

Response:

{ "success": true, "edge_id": 42 }

memory_context_tool

Generate formatted context for session injection:

{ "query": "coding style preferences", "project": "myproject", "token_budget": 4000 }

Response:

{ "success": true, "context": "# Memory Context\n\n## Preferences\n\n- User prefers dark mode [global]\n- Use 2-space indentation [project:myproject]\n\n## Recent Decisions\n\n- Decided to use FastAPI for the backend [project:myproject]\n", "token_estimate": 125 }

memory_forget_tool

Delete memories by ID or semantic search:

{ "memory_id": "550e8400-e29b-41d4-a716-446655440000", "confirm": true }

Or delete by search:

{ "query": "outdated preferences", "namespace": "project:oldproject", "n_results": 10, "confirm": true }

Response:

{ "success": true, "deleted_ids": ["550e8400-e29b-41d4-a716-446655440000"], "deleted_count": 1 }

Architecture

┌──────────────────────────────────────────────────────────────┐
│                     MCP Server (FastMCP)                      │
│ memory_store │ memory_recall │ memory_relate │ memory_forget │
└───────────────────────────────┬──────────────────────────────┘
                  ┌─────────────┴─────────────┐
        ┌─────────▼─────────┐       ┌─────────▼─────────┐
        │     FAST PATH     │       │     SYNC PATH     │
        │       <10ms       │       │    MLX: <100ms    │
        └─────────┬─────────┘       │  Ollama: 10-60s   │
                  │                 └─────────┬─────────┘
        ┌─────────▼─────────┐                 │
        │   recall-daemon   │       ┌─────────▼─────────┐
        │   (Unix socket)   │       │    HybridStore    │
        │  ┌─────────────┐  │       └─────────┬─────────┘
        │  │ StoreQueue  │  │                 │
        │  │ EmbedWorker │  │     ┌───────────┼───────────┐
        │  └─────────────┘  │     │           │           │
        └─────────┬─────────┘ ┌───▼───┐   ┌───▼───┐ ┌─────▼─────┐
                  │           │SQLite │   │Chroma │ │ Embedding │
                  └──────────►│ Store │   │ Store │ │  Factory  │
                              └───────┘   └───────┘ └─────┬─────┘
                                                          │
                                                ┌─────────┴─────────┐
                                          ┌─────▼─────┐       ┌─────▼─────┐
                                          │    MLX    │       │  Ollama   │
                                          │  (Apple)  │       │ (Fallback)│
                                          └───────────┘       └───────────┘

The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding via MLX (~100ms on Apple Silicon) or Ollama (10-60s on other platforms).
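
The slow step on the sync path is the blocking embedding request itself. For reference, a synchronous call against Ollama's /api/embeddings endpoint looks roughly like this (a sketch; Recall's actual client wiring and error handling may differ):

import requests

def ollama_embed(
    text: str,
    host: str = "http://localhost:11434",
    model: str = "mxbai-embed-large",
    timeout: int = 30,
) -> list[float]:
    # Blocks until Ollama returns the embedding vector for `text`,
    # which is where the 10-60s sync-path latency comes from.
    resp = requests.post(
        f"{host}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=timeout,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]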

Daemon Setup (macOS)

The recall daemon provides fast (<10ms) memory storage by processing embeddings asynchronously. Without the daemon, each store operation blocks for 10-60 seconds waiting for Ollama embeddings.

Quick Install

# From the recall directory
./hooks/install-daemon.sh

This will:

  1. Copy hook scripts to ~/.claude/hooks/

  2. Install the launchd plist to ~/Library/LaunchAgents/

  3. Start the daemon automatically

Manual Install

# 1. Copy hook scripts
cp hooks/recall*.py ~/.claude/hooks/
chmod +x ~/.claude/hooks/recall*.py

# 2. Create logs directory
mkdir -p ~/.claude/hooks/logs

# 3. Install plist with path substitution
sed "s|{{HOME}}|$HOME|g; s|{{RECALL_DIR}}|$(pwd)|g" \
  hooks/com.recall.daemon.plist.template > ~/Library/LaunchAgents/com.recall.daemon.plist

# 4. Load the daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist

Daemon Commands

# Check status
echo '{"cmd": "status"}' | nc -U /tmp/recall-daemon.sock | jq

# Stop daemon
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist

# Start daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist

# View logs
tail -f ~/.claude/hooks/logs/recall-daemon.log
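
The same status check can be scripted from Python. A small sketch, assuming the daemon answers each connection with a single JSON object (as the nc one-liner above suggests):

import json
import socket

with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
    sock.connect("/tmp/recall-daemon.sock")
    sock.sendall(b'{"cmd": "status"}')
    sock.shutdown(socket.SHUT_WR)  # signal end of request, like nc's EOF
    reply = json.loads(sock.recv(65536).decode())  # assumes one small reply
    print(reply)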

Hooks Configuration

Add recall hooks to your Claude Code settings (~/.claude/settings.json). See hooks/settings.example.json for the full configuration.

Development

# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/

# Run tests with coverage
uv run pytest tests/ --cov=recall --cov-report=html

# Type checking
uv run mypy src/recall

# Run specific integration tests
uv run pytest tests/integration/test_mcp_server.py -v

Requirements

  • Python 3.13+

  • For Apple Silicon (recommended): MLX embeddings work automatically via the mlx-embeddings package

  • For other platforms: Ollama with:

    • mxbai-embed-large model (required for semantic search)

    • llama3.2 model (optional, for session auto-capture hook)

  • ~500MB disk space for ChromaDB indices

License

MIT
