
Recall

Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.

Features

  • Persistent Memory Storage: Store preferences, decisions, patterns, and session context

  • Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors

  • MLX Hybrid Embeddings: Native Apple Silicon support via MLX for ~5-10x faster embeddings (automatic fallback to Ollama)

  • Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)

  • Namespace Isolation: Global memories vs project-scoped memories

  • Context Generation: Auto-format memories for session context injection

  • Deduplication: Content-hash based duplicate detection
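Content-hash deduplication can be illustrated with a short sketch (hypothetical helper names; the real implementation may normalize or hash differently):

```python
import hashlib

def content_hash(content: str) -> str:
    # Collapse whitespace so trivially different copies hash the same,
    # then take a short hex digest as the dedup key.
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

seen: set[str] = set()

def is_duplicate(content: str) -> bool:
    h = content_hash(content)
    if h in seen:
        return True
    seen.add(h)
    return False
```

A second store of the same (or whitespace-variant) content would then be detected before re-embedding it.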

Installation

# Clone the repository
git clone https://github.com/yourorg/recall.git
cd recall

# Install with uv
uv sync

# On Apple Silicon: MLX embeddings work automatically (fastest option)
# On other platforms or as fallback: ensure Ollama is running
ollama pull mxbai-embed-large  # Required if not using MLX
ollama pull llama3.2           # Optional: session summarization for auto-capture hook
ollama serve

Usage

Run as MCP Server

uv run python -m recall

CLI Options

uv run python -m recall --help

Options:
  --sqlite-path PATH      SQLite database path (default: ~/.recall/recall.db)
  --chroma-path PATH      ChromaDB storage path (default: ~/.recall/chroma_db)
  --collection NAME       ChromaDB collection name (default: memories)
  --ollama-host HOST      Ollama server URL (default: http://localhost:11434)
  --ollama-model MODEL    Embedding model (default: mxbai-embed-large)
  --ollama-timeout SECS   Request timeout (default: 30)
  --log-level LEVEL       DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)

meta-mcp Configuration

Add Recall to your meta-mcp servers.json:

{
  "recall": {
    "command": "uv",
    "args": [
      "run",
      "--directory",
      "/path/to/recall",
      "python",
      "-m",
      "recall"
    ],
    "env": {
      "RECALL_LOG_LEVEL": "INFO",
      "RECALL_OLLAMA_HOST": "http://localhost:11434",
      "RECALL_OLLAMA_MODEL": "mxbai-embed-large"
    },
    "description": "Long-term memory system with semantic search",
    "tags": ["memory", "context", "semantic-search"]
  }
}

Or for Claude Code / other MCP clients (claude.json):

{
  "mcpServers": {
    "recall": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/path/to/recall",
        "python",
        "-m",
        "recall"
      ],
      "env": {
        "RECALL_LOG_LEVEL": "INFO"
      }
    }
  }
}

Environment Variables

| Variable | Default | Description |
|---|---|---|
| RECALL_SQLITE_PATH | ~/.recall/recall.db | SQLite database file path |
| RECALL_CHROMA_PATH | ~/.recall/chroma_db | ChromaDB persistent storage directory |
| RECALL_COLLECTION_NAME | memories | ChromaDB collection name |
| RECALL_EMBEDDING_BACKEND | ollama | Embedding backend: mlx (Apple Silicon) or ollama |
| RECALL_MLX_MODEL | mlx-community/mxbai-embed-large-v1 | MLX embedding model identifier |
| RECALL_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| RECALL_OLLAMA_MODEL | mxbai-embed-large | Ollama embedding model name |
| RECALL_OLLAMA_TIMEOUT | 30 | Ollama request timeout in seconds |
| RECALL_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| RECALL_DEFAULT_NAMESPACE | global | Default namespace for memories |
| RECALL_DEFAULT_IMPORTANCE | 0.5 | Default importance score (0.0-1.0) |
| RECALL_DEFAULT_TOKEN_BUDGET | 4000 | Default token budget for context generation |
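Resolving these settings can be sketched as a simple environment lookup with the documented defaults (CLI flags take precedence in the real server; `env_setting` is a hypothetical helper):

```python
import os

def env_setting(name: str, default: str) -> str:
    # Environment-variable layer only; CLI flags would override these.
    return os.environ.get(name, default)

sqlite_path = env_setting("RECALL_SQLITE_PATH", os.path.expanduser("~/.recall/recall.db"))
ollama_host = env_setting("RECALL_OLLAMA_HOST", "http://localhost:11434")
log_level = env_setting("RECALL_LOG_LEVEL", "INFO")
```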

MCP Tool Examples

memory_store_tool

Store a new memory with semantic indexing. When the daemon is available, the call returns via the fast queued path (<10ms); otherwise it falls back to synchronous embedding.

{
  "content": "User prefers dark mode in all applications",
  "memory_type": "preference",
  "namespace": "global",
  "importance": 0.8,
  "metadata": {"source": "explicit_request"}
}

Response (fast path via daemon):

{
  "success": true,
  "queued": true,
  "queue_id": 42,
  "namespace": "global"
}

Response (sync path fallback):

{
  "success": true,
  "queued": false,
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "content_hash": "a1b2c3d4e5f67890"
}

daemon_status_tool

Check if the recall daemon is running:

{}

Response:

{
  "running": true,
  "status": {
    "pid": 12345,
    "store_queue": {"pending_count": 5},
    "embed_worker_running": true
  }
}

memory_recall_tool

Search memories by semantic similarity:

{
  "query": "user interface preferences",
  "n_results": 5,
  "namespace": "global",
  "memory_type": "preference",
  "min_importance": 0.5,
  "include_related": true
}

Response:

{
  "success": true,
  "memories": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "content": "User prefers dark mode in all applications",
      "type": "preference",
      "namespace": "global",
      "importance": 0.8,
      "created_at": "2024-01-15T10:30:00",
      "accessed_at": "2024-01-15T14:22:00",
      "access_count": 3
    }
  ],
  "total": 1,
  "score": 0.92
}
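The optional filter arguments (namespace, memory_type, min_importance) narrow the candidate set; their effect can be sketched as a post-filter over hits (hypothetical helper; the real server applies filters inside the vector query):

```python
def filter_memories(memories, namespace=None, memory_type=None, min_importance=0.0):
    # Keep only hits matching the requested namespace, type, and
    # importance threshold, mirroring memory_recall_tool's arguments.
    out = []
    for m in memories:
        if namespace and m["namespace"] != namespace:
            continue
        if memory_type and m["type"] != memory_type:
            continue
        if m["importance"] < min_importance:
            continue
        out.append(m)
    return out
```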

memory_relate_tool

Create a relationship between memories:

{
  "source_id": "mem_new_123",
  "target_id": "mem_old_456",
  "relation": "supersedes",
  "weight": 1.0
}

Response:

{
  "success": true,
  "edge_id": 42
}

memory_context_tool

Generate formatted context for session injection:

{
  "query": "coding style preferences",
  "project": "myproject",
  "token_budget": 4000
}

Response:

{
  "success": true,
  "context": "# Memory Context\n\n## Preferences\n\n- User prefers dark mode [global]\n- Use 2-space indentation [project:myproject]\n\n## Recent Decisions\n\n- Decided to use FastAPI for the backend [project:myproject]\n",
  "token_estimate": 125
}
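The token_estimate suggests a cheap character-count heuristic; a common approximation (an assumption here, not the server's documented method) is roughly four characters per token:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def fits_budget(context: str, token_budget: int) -> bool:
    # Decide whether a formatted context block fits the caller's budget.
    return estimate_tokens(context) <= token_budget
```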

memory_forget_tool

Delete memories by ID or semantic search:

{
  "memory_id": "550e8400-e29b-41d4-a716-446655440000",
  "confirm": true
}

Or delete by search:

{
  "query": "outdated preferences",
  "namespace": "project:oldproject",
  "n_results": 10,
  "confirm": true
}

Response:

{
  "success": true,
  "deleted_ids": ["550e8400-e29b-41d4-a716-446655440000"],
  "deleted_count": 1
}

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     MCP Server (FastMCP)                     │
│  memory_store │ memory_recall │ memory_relate │ memory_forget │
└───────────────────────────┬─────────────────────────────────┘
                            │
              ┌─────────────┴─────────────┐
              │                           │
    ┌─────────▼─────────┐       ┌─────────▼─────────┐
    │   FAST PATH       │       │   SYNC PATH       │
    │   <10ms           │       │   MLX: <100ms     │
    └─────────┬─────────┘       │   Ollama: 10-60s  │
              │                 └─────────┬─────────┘
    ┌─────────▼─────────┐                 │
    │  recall-daemon    │       ┌─────────▼─────────┐
    │  (Unix socket)    │       │   HybridStore     │
    │                   │       └─────────┬─────────┘
    │  ┌─────────────┐  │                 │
    │  │ StoreQueue  │  │     ┌───────────┼───────────┐
    │  │ EmbedWorker │  │     │           │           │
    │  └─────────────┘  │     │           │           │
    └─────────┬─────────┘   ┌─▼─────┐ ┌───▼───┐ ┌─────▼─────┐
              │             │SQLite │ │Chroma │ │ Embedding │
              └─────────────►Store  │ │ Store │ │  Factory  │
                            └───────┘ └───────┘ └─────┬─────┘
                                                      │
                                          ┌───────────┴───────────┐
                                          │                       │
                                    ┌─────▼─────┐           ┌─────▼─────┐
                                    │    MLX    │           │  Ollama   │
                                    │  (Apple)  │           │ (Fallback)│
                                    └───────────┘           └───────────┘

The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding via MLX (~100ms on Apple Silicon) or Ollama (10-60s on other platforms).
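The fast-path/sync-path decision above can be sketched as "try the daemon socket, fall back on any connection error" (the store wire format and store_sync helper are assumptions for illustration, not the daemon's documented protocol):

```python
import json
import socket

def store_sync(payload: dict) -> dict:
    # Placeholder for the synchronous path (embed via MLX/Ollama, then write).
    return {"success": True, "queued": False}

def store_memory(payload: dict, sock_path: str = "/tmp/recall-daemon.sock") -> dict:
    # Fast path: hand the payload to the daemon over its Unix socket and
    # let it embed asynchronously. Any socket error means the daemon is
    # unavailable, so fall back to the slow synchronous path.
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.connect(sock_path)
            s.sendall(json.dumps({"cmd": "store", **payload}).encode())
            return json.loads(s.recv(65536).decode())
    except OSError:
        return store_sync(payload)
```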

Daemon Setup (macOS)

The recall daemon provides fast (<10ms) memory storage by queueing stores and processing embeddings asynchronously. Without it, each store operation blocks on synchronous embedding: ~100ms with MLX on Apple Silicon, but 10-60 seconds when falling back to Ollama.

Quick Install

# From the recall directory
./hooks/install-daemon.sh

This will:

  1. Copy hook scripts to ~/.claude/hooks/

  2. Install the launchd plist to ~/Library/LaunchAgents/

  3. Start the daemon automatically

Manual Install

# 1. Copy hook scripts
cp hooks/recall*.py ~/.claude/hooks/
chmod +x ~/.claude/hooks/recall*.py

# 2. Create logs directory
mkdir -p ~/.claude/hooks/logs

# 3. Install plist with path substitution
sed "s|{{HOME}}|$HOME|g; s|{{RECALL_DIR}}|$(pwd)|g" \
  hooks/com.recall.daemon.plist.template > ~/Library/LaunchAgents/com.recall.daemon.plist

# 4. Load the daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist

Daemon Commands

# Check status
echo '{"cmd": "status"}' | nc -U /tmp/recall-daemon.sock | jq

# Stop daemon
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist

# Start daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist

# View logs
tail -f ~/.claude/hooks/logs/recall-daemon.log
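The status check above can also be issued from Python instead of nc, using the same documented {"cmd": "status"} message (the offline fallback shape here is an assumption):

```python
import json
import socket

def daemon_status(sock_path: str = "/tmp/recall-daemon.sock") -> dict:
    # Python equivalent of: echo '{"cmd": "status"}' | nc -U /tmp/recall-daemon.sock
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.connect(sock_path)
            s.sendall(b'{"cmd": "status"}')
            return json.loads(s.recv(65536).decode())
    except OSError:
        return {"running": False}
```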

Hooks Configuration

Add recall hooks to your Claude Code settings (~/.claude/settings.json). See hooks/settings.example.json for the full configuration.

Development

# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/

# Run tests with coverage
uv run pytest tests/ --cov=recall --cov-report=html

# Type checking
uv run mypy src/recall

# Run specific integration tests
uv run pytest tests/integration/test_mcp_server.py -v

Requirements

  • Python 3.13+

  • For Apple Silicon (recommended): MLX embeddings work automatically via the mlx-embeddings package

  • For other platforms: Ollama with:

    • mxbai-embed-large model (required for semantic search)

    • llama3.2 model (optional, for session auto-capture hook)

  • ~500MB disk space for ChromaDB indices

License

MIT
