Recall is a long-term memory system for AI assistants that provides persistent storage, semantic search, and relationship tracking for memories.
Core Memory Operations
Store memories with automatic semantic indexing, content-hash deduplication, and optional auto-linking to related memories
Search memories using natural language queries with semantic similarity, filters (namespace, type, importance), and optional multi-hop graph expansion
Delete memories by ID or semantic search, with protection for high-confidence "golden rule" memories
Count and list memories with filtering, sorting, and pagination for auditing and exploration
Generate context by fetching relevant memories formatted as markdown for session injection, respecting token budgets
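The token-budget behavior of context generation can be sketched as a greedy packing loop. This is an illustration only: the ~4-characters-per-token estimate and the packing order are assumptions, not Recall's actual implementation.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (an assumption for illustration).
    return max(1, len(text) // 4)

def build_context(memories: list[str], token_budget: int) -> str:
    """Greedily pack memories (assumed pre-sorted by relevance) into the budget."""
    lines, used = ["# Memory Context", ""], 0
    for mem in memories:
        cost = estimate_tokens(mem)
        if used + cost > token_budget:
            break                      # budget exhausted, drop the rest
        lines.append(f"- {mem}")
        used += cost
    return "\n".join(lines)

ctx = build_context(
    ["User prefers dark mode", "Use 2-space indentation", "Decided on FastAPI"],
    token_budget=12,
)
```

With a budget of 12 tokens, only the first two memories fit; the third is dropped rather than truncated mid-entry.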
Memory Relationships & Graph
Create relationships between memories (relates_to, supersedes, caused_by, contradicts)
Inspect graph structure with BFS traversal, configurable depth/direction, and Mermaid diagram generation
Delete edges between memories by ID, memory connection, or specific pairs
Auto-infer relationships using embedding similarity with optional LLM refinement
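Auto-inference can be pictured as a cosine-similarity pass over stored embeddings. The 0.8 threshold and the toy 3-dimensional vectors below are assumptions for illustration; Recall's real embeddings come from MLX or Ollama and have far more dimensions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

SIM_THRESHOLD = 0.8  # assumed cutoff, not Recall's actual value

# Toy embeddings: mem_a and mem_b point the same way, mem_c is unrelated.
memories = {
    "mem_a": [0.9, 0.1, 0.0],
    "mem_b": [0.85, 0.15, 0.05],
    "mem_c": [0.0, 0.1, 0.95],
}

def infer_edges(memories: dict, threshold: float = SIM_THRESHOLD) -> list:
    ids = sorted(memories)
    edges = []
    for i, src in enumerate(ids):
        for dst in ids[i + 1:]:
            if cosine(memories[src], memories[dst]) >= threshold:
                edges.append((src, dst, "relates_to"))
    return edges

edges = infer_edges(memories)
```

Only the `mem_a`/`mem_b` pair clears the threshold, so a single `relates_to` edge is proposed; an optional LLM pass could then refine the relation type (e.g. to `supersedes` or `contradicts`).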
Validation & Quality
Validate memories by recording application success/failure to adjust confidence scores automatically
Detect contradictions between memories using semantic search and LLM reasoning
Check for superseding memories based on validation history to identify outdated information
Analyze memory health to detect contradictory, low-confidence, and stale memories
View validation history showing applied/succeeded/failed events and confidence score evolution
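A minimal sketch of the health analysis, flagging low-confidence and stale memories. The 90-day and 0.3 cutoffs are invented for illustration and are not Recall's actual thresholds.

```python
from datetime import datetime, timedelta

def memory_health(memories: list[dict], now: datetime,
                  stale_days: int = 90, low_conf: float = 0.3) -> dict:
    """Toy health report over a list of {'id', 'confidence', 'accessed_at'} dicts."""
    report = {"low_confidence": [], "stale": []}
    for m in memories:
        if m["confidence"] < low_conf:
            report["low_confidence"].append(m["id"])
        if now - m["accessed_at"] > timedelta(days=stale_days):
            report["stale"].append(m["id"])
    return report

mems = [
    {"id": "a", "confidence": 0.2, "accessed_at": datetime(2024, 5, 20)},
    {"id": "b", "confidence": 0.8, "accessed_at": datetime(2023, 12, 1)},
]
report = memory_health(mems, now=datetime(2024, 6, 1))
```

Memory `a` is flagged for low confidence, `b` for staleness; contradiction detection would additionally need the semantic-search and LLM steps described above.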
Performance & Monitoring
Check daemon status to monitor the async embedding service for fast storage (<10ms)
Track file activity to record file access events (read, write, edit) and view recent activity statistics
Key Features
Namespace isolation (global vs project-scoped)
Importance scoring (0.0-1.0) for memory prioritization
Confidence-based promotion to "golden rule" status (auto-promoted at 0.9)
Fast path via daemon (<10ms) or sync fallback (MLX ~100ms, Ollama 10-60s)
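The confidence-based promotion above can be sketched as a simple update rule. Only the 0.9 promotion threshold comes from the feature list; the step sizes are assumptions for illustration.

```python
GOLDEN_RULE_THRESHOLD = 0.9  # auto-promotion threshold from the feature list

def update_confidence(confidence: float, succeeded: bool,
                      step_up: float = 0.1, step_down: float = 0.2) -> float:
    """Reward successful applications, penalize failures (illustrative step sizes)."""
    if succeeded:
        return min(1.0, confidence + step_up)
    return max(0.0, confidence - step_down)

conf = 0.75
for outcome in (True, True):      # two successful applications of the memory
    conf = update_confidence(conf, outcome)
is_golden = conf >= GOLDEN_RULE_THRESHOLD
```

After two validated successes the memory crosses 0.9 and would be promoted to "golden rule" status, gaining deletion protection.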
Recall
Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.
Features
Persistent Memory Storage: Store preferences, decisions, patterns, and session context
Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors
MLX Hybrid Embeddings: Native Apple Silicon support via MLX for ~5-10x faster embeddings (automatic fallback to Ollama)
Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)
Namespace Isolation: Global memories vs project-scoped memories
Context Generation: Auto-format memories for session context injection
Deduplication: Content-hash based duplicate detection
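Content-hash deduplication can be sketched as follows. The whitespace/case normalization and the 16-character hash prefix are assumptions about Recall's actual scheme, shown here only to illustrate the idea.

```python
import hashlib

def content_hash(text: str) -> str:
    # Assumed normalization: collapse whitespace and lowercase before hashing.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]

seen: set[str] = set()

def store_if_new(text: str) -> bool:
    """Return True if stored, False if skipped as a duplicate."""
    h = content_hash(text)
    if h in seen:
        return False
    seen.add(h)
    return True

first = store_if_new("User prefers dark mode")
dup = store_if_new("user prefers  dark mode")  # same content, different spacing/case
```

The second call is rejected as a duplicate even though the raw strings differ, which is the point of hashing a normalized form rather than the raw text.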
Installation
# Clone the repository
git clone https://github.com/yourorg/recall.git
cd recall
# Install with uv
uv sync
# On Apple Silicon: MLX embeddings work automatically (fastest option)
# On other platforms or as fallback: ensure Ollama is running
ollama pull mxbai-embed-large # Required if not using MLX
ollama pull llama3.2 # Optional: session summarization for auto-capture hook
ollama serve
Usage
Run as MCP Server
uv run python -m recall
CLI Options
uv run python -m recall --help
Options:
--sqlite-path PATH SQLite database path (default: ~/.recall/recall.db)
--chroma-path PATH ChromaDB storage path (default: ~/.recall/chroma_db)
--collection NAME ChromaDB collection name (default: memories)
--ollama-host HOST Ollama server URL (default: http://localhost:11434)
--ollama-model MODEL Embedding model (default: mxbai-embed-large)
--ollama-timeout SECS Request timeout (default: 30)
--log-level LEVEL DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
meta-mcp Configuration
Add Recall to your meta-mcp servers.json:
{
"recall": {
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/recall",
"python",
"-m",
"recall"
],
"env": {
"RECALL_LOG_LEVEL": "INFO",
"RECALL_OLLAMA_HOST": "http://localhost:11434",
"RECALL_OLLAMA_MODEL": "mxbai-embed-large"
},
"description": "Long-term memory system with semantic search",
"tags": ["memory", "context", "semantic-search"]
}
}
Or for Claude Code / other MCP clients (claude.json):
{
"mcpServers": {
"recall": {
"command": "uv",
"args": [
"run",
"--directory",
"/path/to/recall",
"python",
"-m",
"recall"
],
"env": {
"RECALL_LOG_LEVEL": "INFO"
}
}
}
}
Environment Variables
| Variable | Default | Description |
|---|---|---|
| RECALL_SQLITE_PATH | ~/.recall/recall.db | SQLite database file path |
| RECALL_CHROMA_PATH | ~/.recall/chroma_db | ChromaDB persistent storage directory |
| RECALL_COLLECTION | memories | ChromaDB collection name |
| | | Embedding backend |
| | | MLX embedding model identifier |
| RECALL_OLLAMA_HOST | http://localhost:11434 | Ollama server URL |
| RECALL_OLLAMA_MODEL | mxbai-embed-large | Ollama embedding model name |
| RECALL_OLLAMA_TIMEOUT | 30 | Ollama request timeout in seconds |
| RECALL_LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| | | Default namespace for memories |
| | | Default importance score (0.0-1.0) |
| | | Default token budget for context |
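Reading this configuration from the environment might look like the sketch below. Only RECALL_LOG_LEVEL, RECALL_OLLAMA_HOST, and RECALL_OLLAMA_MODEL appear verbatim in the configuration examples above; the defaults are taken from the CLI options section.

```python
import os

def load_settings() -> dict:
    """Fall back to the documented CLI defaults when a variable is unset."""
    return {
        "log_level": os.environ.get("RECALL_LOG_LEVEL", "INFO"),
        "ollama_host": os.environ.get("RECALL_OLLAMA_HOST", "http://localhost:11434"),
        "ollama_model": os.environ.get("RECALL_OLLAMA_MODEL", "mxbai-embed-large"),
    }

settings = load_settings()
```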
MCP Tool Examples
memory_store_tool
Store a new memory with semantic indexing. Uses fast daemon path when available (<10ms), falls back to sync embedding otherwise.
{
"content": "User prefers dark mode in all applications",
"memory_type": "preference",
"namespace": "global",
"importance": 0.8,
"metadata": {"source": "explicit_request"}
}
Response (fast path via daemon):
{
"success": true,
"queued": true,
"queue_id": 42,
"namespace": "global"
}
Response (sync path fallback):
{
"success": true,
"queued": false,
"id": "550e8400-e29b-41d4-a716-446655440000",
"content_hash": "a1b2c3d4e5f67890"
}
daemon_status_tool
Check if the recall daemon is running:
{}
Response:
{
"running": true,
"status": {
"pid": 12345,
"store_queue": {"pending_count": 5},
"embed_worker_running": true
}
}
memory_recall_tool
Search memories by semantic similarity:
{
"query": "user interface preferences",
"n_results": 5,
"namespace": "global",
"memory_type": "preference",
"min_importance": 0.5,
"include_related": true
}
Response:
{
"success": true,
"memories": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"content": "User prefers dark mode in all applications",
"type": "preference",
"namespace": "global",
"importance": 0.8,
"created_at": "2024-01-15T10:30:00",
"accessed_at": "2024-01-15T14:22:00",
"access_count": 3
}
],
"total": 1,
"score": 0.92
}
memory_relate_tool
Create a relationship between memories:
{
"source_id": "mem_new_123",
"target_id": "mem_old_456",
"relation": "supersedes",
"weight": 1.0
}
Response:
{
"success": true,
"edge_id": 42
}
memory_context_tool
Generate formatted context for session injection:
{
"query": "coding style preferences",
"project": "myproject",
"token_budget": 4000
}
Response:
{
"success": true,
"context": "# Memory Context\n\n## Preferences\n\n- User prefers dark mode [global]\n- Use 2-space indentation [project:myproject]\n\n## Recent Decisions\n\n- Decided to use FastAPI for the backend [project:myproject]\n",
"token_estimate": 125
}
memory_forget_tool
Delete memories by ID or semantic search:
{
"memory_id": "550e8400-e29b-41d4-a716-446655440000",
"confirm": true
}
Or delete by search:
{
"query": "outdated preferences",
"namespace": "project:oldproject",
"n_results": 10,
"confirm": true
}
Response:
{
"success": true,
"deleted_ids": ["550e8400-e29b-41d4-a716-446655440000"],
"deleted_count": 1
}
Architecture
┌───────────────────────────────────────────────────────────────┐
│                     MCP Server (FastMCP)                      │
│ memory_store │ memory_recall │ memory_relate │ memory_forget  │
└───────────────────────────────┬───────────────────────────────┘
                                │
                  ┌─────────────┴─────────────┐
                  │                           │
        ┌─────────▼─────────┐       ┌─────────▼─────────┐
        │     FAST PATH     │       │     SYNC PATH     │
        │       <10ms       │       │    MLX: <100ms    │
        └─────────┬─────────┘       │  Ollama: 10-60s   │
                  │                 └─────────┬─────────┘
        ┌─────────▼─────────┐                 │
        │   recall-daemon   │       ┌─────────▼─────────┐
        │   (Unix socket)   │       │    HybridStore    │
        │                   │       └─────────┬─────────┘
        │  ┌─────────────┐  │                 │
        │  │ StoreQueue  │  │     ┌───────────┼───────────┐
        │  │ EmbedWorker │  │     │           │           │
        │  └─────────────┘  │     │           │           │
        └─────────┬─────────┘ ┌───▼───┐   ┌───▼───┐ ┌─────▼─────┐
                  │           │SQLite │   │Chroma │ │ Embedding │
                  └──────────►│ Store │   │ Store │ │  Factory  │
                              └───────┘   └───────┘ └─────┬─────┘
                                                          │
                                              ┌───────────┴───────────┐
                                              │                       │
                                        ┌─────▼─────┐           ┌─────▼─────┐
                                        │    MLX    │           │  Ollama   │
                                        │  (Apple)  │           │(Fallback) │
                                        └───────────┘           └───────────┘
The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding via MLX (~100ms on Apple Silicon) or Ollama (10-60s on other platforms).
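The fallback decision can be sketched as follows. The {"cmd": "status"} probe mirrors the daemon command shown under Daemon Commands below; the embedding and store bodies are stand-ins, not the real implementation.

```python
import hashlib
import json
import socket

DAEMON_SOCKET = "/tmp/recall-daemon.sock"  # socket path from the daemon commands

def daemon_running(path: str = DAEMON_SOCKET) -> bool:
    """Probe the daemon with the documented {"cmd": "status"} message."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(0.5)
            s.connect(path)
            s.sendall(json.dumps({"cmd": "status"}).encode())
            s.recv(4096)
        return True
    except OSError:
        return False

def sync_embed(text: str) -> list[float]:
    # Stand-in for the real (slow) MLX/Ollama embedding call.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def store_memory(content: str) -> dict:
    """Queue via the daemon when it answers; otherwise embed synchronously."""
    if daemon_running():
        return {"success": True, "queued": True}
    embedding = sync_embed(content)      # this is the blocking step the daemon avoids
    return {"success": True, "queued": False, "dims": len(embedding)}

result = store_memory("User prefers dark mode")
```

The `queued` flag in the result matches the two response shapes shown for memory_store_tool above: `true` on the fast daemon path, `false` on the synchronous fallback.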
Daemon Setup (macOS)
The recall daemon provides fast (<10ms) memory storage by processing embeddings asynchronously. Without the daemon, each store operation blocks for 10-60 seconds waiting for Ollama embeddings.
Quick Install
# From the recall directory
./hooks/install-daemon.sh
This will:
Copy hook scripts to ~/.claude/hooks/
Install the launchd plist to ~/Library/LaunchAgents/
Start the daemon automatically
Manual Install
# 1. Copy hook scripts
cp hooks/recall*.py ~/.claude/hooks/
chmod +x ~/.claude/hooks/recall*.py
# 2. Create logs directory
mkdir -p ~/.claude/hooks/logs
# 3. Install plist with path substitution
sed "s|{{HOME}}|$HOME|g; s|{{RECALL_DIR}}|$(pwd)|g" \
hooks/com.recall.daemon.plist.template > ~/Library/LaunchAgents/com.recall.daemon.plist
# 4. Load the daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist
Daemon Commands
# Check status
echo '{"cmd": "status"}' | nc -U /tmp/recall-daemon.sock | jq
# Stop daemon
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist
# Start daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist
# View logs
tail -f ~/.claude/hooks/logs/recall-daemon.log
Hooks Configuration
Add recall hooks to your Claude Code settings (~/.claude/settings.json). See hooks/settings.example.json for the full configuration.
Development
# Install dev dependencies
uv sync --dev
# Run tests
uv run pytest tests/
# Run tests with coverage
uv run pytest tests/ --cov=recall --cov-report=html
# Type checking
uv run mypy src/recall
# Run specific integration tests
uv run pytest tests/integration/test_mcp_server.py -v
Requirements
Python 3.13+
For Apple Silicon (recommended): MLX embeddings work automatically with the mlx-embeddings package
For other platforms: Ollama with the mxbai-embed-large model (required for semantic search) and the llama3.2 model (optional, for the session auto-capture hook)
~500MB disk space for ChromaDB indices
License
MIT