mcp-recall

by Jensimogit


A self-hosted MCP memory server that gives AI assistants persistent, semantic memory. Store facts, search by meaning, swap embedding models — all running locally on your own hardware.

No cloud APIs. No GPU required. No API costs.

You: "What do we know about the backup configuration?"
Claude: memory_search → finds relevant memories in milliseconds

What it does

mcp-recall stores memories as vector embeddings in PostgreSQL and makes them searchable via the Model Context Protocol. Your AI assistant can remember things across sessions, projects, and devices.

Feature	Details
Semantic search	Find memories by meaning, not keywords
Swappable models	Change embedding models without losing data
Built-in benchmarking	Compare models against your actual data
Dual transport	Streamable HTTP + SSE (legacy)
Local embeddings	ONNX models run in-process, no external calls
Low resource	Runs on a Celeron J1900 with 16 GB RAM

Quick start

1. Clone and configure

git clone https://github.com/Jensimogit/mcp-recall.git
cd mcp-recall
npm install
cp .env.example .env
# Generate a random database password (you'll never need to type it)
echo "POSTGRES_PASSWORD=$(openssl rand -base64 32)" >> .env
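If `openssl` is not available, Node.js (already required for `npm install`) can generate an equivalent value. The sketch below is a hypothetical helper, not a script shipped with mcp-recall. It produces the same shape of secret as `openssl rand -base64 32` and appends it only when `.env` does not already assign a non-empty value, so running it twice does not add a duplicate line.

```javascript
const crypto = require("node:crypto");
const fs = require("node:fs");

// 32 random bytes, base64-encoded: same shape as `openssl rand -base64 32`
// (44 characters, ending in one "=").
function randomSecret(bytes = 32) {
  return crypto.randomBytes(bytes).toString("base64");
}

// True if the .env text already assigns a non-empty value to `key`.
function hasValue(text, key) {
  return text
    .split(/\r?\n/)
    .some((line) => line.startsWith(`${key}=`) && line.length > key.length + 1);
}

// Appends KEY=<random secret> unless a non-empty value is already present.
// Returns true if something was written.
function ensureSecret(envPath, key) {
  const text = fs.existsSync(envPath) ? fs.readFileSync(envPath, "utf8") : "";
  if (hasValue(text, key)) return false;
  const prefix = text === "" || text.endsWith("\n") ? "" : "\n";
  fs.appendFileSync(envPath, `${prefix}${key}=${randomSecret()}\n`);
  return true;
}
```

Usage: `ensureSecret(".env", "POSTGRES_PASSWORD")`. The same helper works for `MCP_API_KEY` later on.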

2. Download an embedding model

Models are not included in the repository (each one is up to ~560 MB). Download one:

node scripts/download-model.js multilingual-e5-large

Verify the model files are in place:

ls models/multilingual-e5-large/
# Expected: config.json  onnx/  tokenizer.json  tokenizer_config.json

If the directory is empty (rare, depends on cache layout), copy manually:

find node_modules/@xenova/transformers/.cache -name "config.json"
# Copy the directory that contains config.json + tokenizer.json:
cp -r node_modules/@xenova/transformers/.cache/Xenova/multilingual-e5-large/* models/multilingual-e5-large/
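The `ls` check above can be automated. In this sketch, the function name is hypothetical. The required entries are taken from the "Expected:" line, plus one extra check: that `onnx/` actually contains a `.onnx` file. The project structure further down shows `onnx/model_quantized.onnx`.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Entries listed under "Expected:" in step 2.
const REQUIRED = ["config.json", "tokenizer.json", "tokenizer_config.json", "onnx"];

// Returns what is missing from <modelsDir>/<name>/; an empty array means complete.
function missingModelFiles(modelsDir, name) {
  const dir = path.join(modelsDir, name);
  const missing = REQUIRED.filter((entry) => !fs.existsSync(path.join(dir, entry)));
  if (!missing.includes("onnx")) {
    // The onnx/ directory exists; make sure it holds actual model weights.
    const hasWeights = fs
      .readdirSync(path.join(dir, "onnx"))
      .some((file) => file.endsWith(".onnx"));
    if (!hasWeights) missing.push("onnx/*.onnx");
  }
  return missing;
}
```

Usage: `missingModelFiles("models", "multilingual-e5-large")` returns `[]` when the model directory is complete.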

3. Start the server

docker compose up -d

That's it. The server starts, runs the database migration, loads the embedding model, and listens on port 3000.

# Verify it's running
curl http://localhost:3000/health
# {"status":"ok","version":"0.2.0","model":"multilingual-e5-large","memories":0,"sessions":0}
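For scripted setups, the same check can be done from Node.js 22 using the built-in `fetch`. The field names come from the sample response above. What counts as "healthy" here (status `ok`, the expected model, and an integer memory count) is an assumption, not a documented contract.

```javascript
// Validates a parsed /health body; returns a list of problems (empty = healthy).
function checkHealth(body, expectedModel) {
  const problems = [];
  if (body.status !== "ok") problems.push(`status is ${JSON.stringify(body.status)}`);
  if (expectedModel && body.model !== expectedModel) {
    problems.push(`model is ${body.model}, expected ${expectedModel}`);
  }
  if (!Number.isInteger(body.memories)) problems.push("memories count missing");
  return problems;
}

// Fetches /health with Node's built-in fetch and validates the JSON body.
async function probe(baseUrl = "http://localhost:3000", expectedModel) {
  const res = await fetch(new URL("/health", baseUrl));
  if (!res.ok) return [`HTTP ${res.status}`];
  return checkHealth(await res.json(), expectedModel);
}
```

Usage: `probe(undefined, "multilingual-e5-large").then(console.log)` prints an empty array when the server is up with the expected model.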

4. Seed example memories (optional)

Load some example memories to verify search works and to run benchmarks:

docker compose run --rm mcp-recall node scripts/seed-examples.js

This stores 10 memories about mcp-recall itself. To confirm they were stored, check the health endpoint again:

# Quick test via the health endpoint — should show memories: 10
curl http://localhost:3000/health

5. Connect your AI assistant

Claude Code:

claude mcp add -s user --transport http mcp-recall http://localhost:3000/mcp

Claude Code (SSE transport):

claude mcp add -s user --transport sse mcp-recall http://localhost:3000/sse

Other MCP clients — point them to http://localhost:3000/mcp (Streamable HTTP) or http://localhost:3000/sse (SSE).
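If you are wiring up a client by hand, it helps to know what goes over the wire to `/mcp`. The sketch below builds the JSON-RPC messages and headers that the MCP Streamable HTTP transport uses: `initialize`, then the `notifications/initialized` notification, then `tools/call`. Several parts are assumptions: the protocol version string, the client name, and the helper names. The `query` argument for `memory_search` matches the transcript in step 6. In practice an MCP SDK client performs this handshake for you.

```javascript
let nextId = 1;

// JSON-RPC 2.0 request envelope used by MCP.
function rpc(method, params) {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// First message of every session.
function initializeRequest() {
  return rpc("initialize", {
    protocolVersion: "2025-03-26", // assumption: a version the server's SDK accepts
    capabilities: {},
    clientInfo: { name: "example-client", version: "0.0.1" }, // hypothetical
  });
}

// Sent after the initialize response; notifications carry no id.
const initializedNotification = { jsonrpc: "2.0", method: "notifications/initialized" };

// Calls one of the server's tools, e.g. memory_search.
function toolCall(name, args) {
  return rpc("tools/call", { name, arguments: args });
}

// Headers for POST /mcp: accept both JSON and SSE responses, echo the
// Mcp-Session-Id returned by initialize, and add a Bearer key if MCP_API_KEY is set.
function mcpHeaders(sessionId, apiKey) {
  const headers = {
    "Content-Type": "application/json",
    Accept: "application/json, text/event-stream",
  };
  if (sessionId) headers["Mcp-Session-Id"] = sessionId;
  if (apiKey) headers.Authorization = `Bearer ${apiKey}`;
  return headers;
}
```

Usage: `toolCall("memory_search", { query: "database backup" })` produces the body for a search request.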

6. Verify it works

Start a new Claude Code session and try the tools:

$ claude

❯ Use memory_stats to check the database

● mcp-recall - memory_stats (MCP)
  ⎿  Total memories: 10
     Unique tags: 14

● The database has 10 memories with 14 unique tags.

❯ Search memories for "backup"

● mcp-recall - memory_search (MCP)(query: "backup")
  ⎿  [1] (79.1% match) Database backup: docker exec mcp-recall-db pg_dump -U mcp
     mcp_recall > backup.sql. Restore: cat backup.sql | docker exec -i
     mcp-recall-db psql -U mcp mcp_recall.
         Tags: operations, backup

❯ Store a new memory: "The deploy key is in 1Password under 'production-deploy'"

● mcp-recall - memory_store (MCP)(content: "The deploy key is in 1Password under
                                 'production-deploy'", tags: ["deployment","credentials"])
  ⎿  Stored memory 33c6f4e8-f0bc-435a-bb46-fd83676698dd:
     The deploy key is in 1Password under 'production-deploy'

❯ Search for "deploy credentials"

● mcp-recall - memory_search (MCP)(query: "deploy credentials")
  ⎿  [1] (85.0% match) The deploy key is in 1Password under 'production-deploy'
         Tags: deployment, credentials

Note how "deploy credentials" matches "deploy key in 1Password" with 85% similarity — that's semantic search in action, matching meaning rather than keywords.
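Under the hood, "similarity" for this kind of search is normally cosine similarity between the query embedding and each stored embedding. The sketch below ranks memories by cosine similarity and prints scores in the transcript's "(85.0% match)" style. It assumes that the percentage is simply cosine similarity × 100, which this README does not state explicitly. For reference, pgvector's `<=>` operator returns cosine distance, which is 1 - cosine similarity. The toy vectors in the test are 3-dimensional, whereas real ones have 1024 dimensions.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). 1 = same direction, 0 = unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Ranks memories against a query vector, best match first.
function rank(queryVec, memories) {
  return memories
    .map((m) => ({ ...m, score: cosineSimilarity(queryVec, m.embedding) }))
    .sort((x, y) => y.score - x.score)
    .map((m) => `(${(m.score * 100).toFixed(1)}% match) ${m.content}`);
}
```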

Architecture

┌─────────────────────────────────────────────┐
│  MCP Client (Claude Code, claude.ai, etc.)  │
└─────────────────┬───────────────────────────┘
                  │  HTTP (Streamable HTTP or SSE)
                  ▼
┌─────────────────────────────────────────────┐
│  mcp-recall-server (Node.js 22)             │
│  ├── MCP Protocol (6 tools)                 │
│  ├── Express HTTP                           │
│  └── @xenova/transformers (ONNX, local)     │
│       └── Embedding model (volume mount)    │
└─────────────────┬───────────────────────────┘
                  │
┌─────────────────▼───────────────────────────┐
│  PostgreSQL 16 + pgvector                   │
│  └── HNSW index (cosine similarity)         │
└─────────────────────────────────────────────┘

All components run in Docker. The embedding model runs directly in the Node.js process using ONNX Runtime — no Ollama, no Python, no separate inference server.
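An ONNX encoder emits one vector per token. Models such as E5 turn these into a single sentence embedding using mean pooling over the real (non-padding) tokens, followed by L2 normalization. In `@xenova/transformers`, this is what the `feature-extraction` pipeline does when called with `{ pooling: "mean", normalize: true }`. Whether `src/embeddings.js` uses exactly those options is an assumption. The sketch below shows the two steps in plain JavaScript.

```javascript
// Mean pooling: average the token vectors, skipping padding (mask value 0).
function meanPool(tokenVectors, attentionMask) {
  const dim = tokenVectors[0].length;
  const sum = new Array(dim).fill(0);
  let count = 0;
  tokenVectors.forEach((vec, t) => {
    if (!attentionMask[t]) return;
    count++;
    for (let i = 0; i < dim; i++) sum[i] += vec[i];
  });
  return sum.map((x) => x / count);
}

// L2 normalization: scale to unit length, so cosine similarity equals the dot product.
function normalize(vec) {
  const norm = Math.sqrt(vec.reduce((s, x) => s + x * x, 0));
  return vec.map((x) => x / norm);
}
```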

MCP Tools

Your AI assistant gets these tools:

Tool	Description
`memory_store`	Store a new memory (auto-generates embedding)
`memory_search`	Search by semantic similarity
`memory_update`	Update content, tags, or metadata (re-embeds if content changes)
`memory_delete`	Delete a memory by ID
`memory_list`	List memories, optionally filtered by tags
`memory_stats`	Show database statistics
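The `memory_update` rule (re-embed only when content changes) can be expressed as a small planning step. This is a hypothetical helper, not mcp-recall's code. The README does not say whether tags and metadata are merged or replaced, so this sketch replaces them wholesale.

```javascript
// Computes the updated record and whether a new embedding is needed.
// Only a change to `content` triggers re-embedding; tags/metadata are replaced as given.
function planUpdate(existing, patch) {
  const next = {
    content: patch.content ?? existing.content,
    tags: patch.tags ?? existing.tags,
    metadata: patch.metadata ?? existing.metadata,
  };
  return { next, reembed: next.content !== existing.content };
}
```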

Embedding models

Recommended: multilingual-e5-large (1024d)

This is the default and recommended model. It's trained specifically for information retrieval (short query → long text), which is exactly how memory search works.
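The E5 model card specifies an input convention for this query/passage asymmetry. Each text is prefixed with its role: `query: ` for searches and `passage: ` for stored documents. This README does not say whether mcp-recall applies these prefixes. The sketch only illustrates the convention itself.

```javascript
// E5 role prefixes: searches are encoded as "query: ...",
// stored texts as "passage: ...".
function e5Input(text, role) {
  if (role !== "query" && role !== "passage") throw new Error(`unknown role: ${role}`);
  return `${role}: ${text.trim()}`;
}
```

Usage: `e5Input("database backup", "query")` returns `"query: database backup"`.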

Available models

Model	Dimensions	Size (quantized)	Best for
`multilingual-e5-large`	1024	~553 MB	General use (recommended)
`bge-m3`	1024	~560 MB	Multi-granular retrieval
`all-MiniLM-L6-v2`	384	~22 MB	Minimal resources, English-only

You can use any ONNX model compatible with @xenova/transformers. Just place it in models/<name>/ with the standard Hugging Face file structure.

Benchmark results

We tested three models against 201 real memories with 8 search queries:

Model	Correct top-1	Avg similarity	Speed
multilingual-e5-large	8/8 (100%)	85.0%	0.1/s*
bge-m3	8/8 (100%)	61.3%	0.1/s*
cross-en-de-roberta	2/8 (25%)	35.3%	0.5/s*

* Embedding throughput, in memories embedded per second, on an Intel Celeron J1900. Much faster on modern CPUs.

Key finding: Models trained for information retrieval (e5, bge) dramatically outperform sentence-similarity models (roberta) for memory search, regardless of language specialization.
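The two quality columns can be computed as follows. This is one plausible sketch, and it assumes that "Avg similarity" is the mean score of each query's top result. `scripts/benchmark-models.js` may define it differently. Absolute similarity values are not directly comparable across models, so top-1 accuracy is the more portable of the two metrics.

```javascript
// Each case: the memory id judged correct for a query, and the ranked
// results ({ id, score }) a model returned for that query.
function evaluate(cases) {
  let correct = 0;
  let scoreSum = 0;
  for (const { expectedId, results } of cases) {
    const top = results[0];
    if (top && top.id === expectedId) correct++;
    scoreSum += top ? top.score : 0;
  }
  return {
    top1: `${correct}/${cases.length}`,
    top1Rate: correct / cases.length,
    avgSimilarity: scoreSum / cases.length,
  };
}
```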

Switching models

# Compare models against your data (read-only, no changes)
docker compose run --rm mcp-recall node scripts/benchmark-models.js multilingual-e5-large

# Switch to a different model (migrates DB, re-embeds everything)
docker compose run --rm mcp-recall node scripts/switch-model.js bge-m3

# Restart the server to use the new model
docker compose restart mcp-recall

The switch script handles everything:

Detects dimension changes and migrates the database
Re-embeds all memories with the new model
Updates the .env file
Verifies the result

Your text data is never lost. Only the vector embeddings are regenerated. Content, tags, and metadata remain untouched in PostgreSQL.
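The "detects dimension changes" step comes down to comparing output dimensions. Switching between e5-large and bge-m3 keeps `vector(1024)`, while switching to all-MiniLM-L6-v2 requires `vector(384)`. The sketch below is hypothetical and only knows the three models from the table above. A custom model would need its dimension supplied separately.

```javascript
// Output dimensions from the "Available models" table.
const DIMENSIONS = {
  "multilingual-e5-large": 1024,
  "bge-m3": 1024,
  "all-MiniLM-L6-v2": 384,
};

// Every switch re-embeds; the column type only changes when the dimension differs.
function switchPlan(from, to) {
  const a = DIMENSIONS[from];
  const b = DIMENSIONS[to];
  if (a === undefined || b === undefined) throw new Error("unknown model dimension");
  return { reembed: true, alterColumn: a !== b, column: `vector(${b})` };
}
```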

Configuration

Environment variables

Variable	Default	Description
`POSTGRES_PASSWORD`	(required)	Database password
`EMBEDDINGS_MODEL`	`multilingual-e5-large`	Model directory name in `./models/`
`MCP_PORT`	`3000`	Server port
`TRUST_PROXY`	`0`	Proxy trust level (set to `1` behind nginx/Caddy)
`MCP_API_KEY`	(none)	Static Bearer token for CLI clients (Claude Code)
`MCP_AUTH_PIN`	(none)	PIN for OAuth 2.1 consent flow (claude.ai, mobile)
`MCP_BASE_URL`	(none)	Public URL of the server (required for OAuth)
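The following sketch reads these variables with the documented defaults and flags the combinations the table marks as required. It covers a missing `POSTGRES_PASSWORD` and an `MCP_AUTH_PIN` without `MCP_BASE_URL`. It is a hypothetical illustration of the rules, not the server's actual startup code. The port range check is an added assumption.

```javascript
// Reads the variables from the table, applying the documented defaults.
function loadConfig(env) {
  const config = {
    postgresPassword: env.POSTGRES_PASSWORD,
    model: env.EMBEDDINGS_MODEL || "multilingual-e5-large",
    port: Number(env.MCP_PORT || 3000),
    trustProxy: Number(env.TRUST_PROXY || 0),
    apiKey: env.MCP_API_KEY || null,
    authPin: env.MCP_AUTH_PIN || null,
    baseUrl: env.MCP_BASE_URL || null,
  };
  const errors = [];
  if (!config.postgresPassword) errors.push("POSTGRES_PASSWORD is required");
  if (!Number.isInteger(config.port) || config.port < 1 || config.port > 65535) {
    errors.push("MCP_PORT must be a valid port");
  }
  if (config.authPin && !config.baseUrl) {
    errors.push("MCP_BASE_URL is required when MCP_AUTH_PIN is set");
  }
  return { config, errors };
}
```

Usage: `loadConfig(process.env)` after the `.env` file has been loaded into the environment.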

Authentication

mcp-recall supports two optional authentication methods. If neither is configured, all requests are allowed (suitable for local-only use).

API key (for Claude Code and other CLI clients):

# Generate a key and add to .env
echo "MCP_API_KEY=$(openssl rand -base64 32)" >> .env

# Configure Claude Code with the key
claude mcp add -s user --transport http \
  --header "Authorization: Bearer YOUR_API_KEY" \
  mcp-recall http://localhost:3000/mcp
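On the server side, an API key check only has to compare the `Authorization: Bearer …` header against `MCP_API_KEY`. The helper below is hypothetical and not mcp-recall's actual middleware. Hashing both values before `crypto.timingSafeEqual` is a common hardening choice: it gives equal-length buffers, as that function requires, and keeps the comparison time independent of where the strings differ.

```javascript
const crypto = require("node:crypto");

// Returns true if the Authorization header carries the configured key.
function bearerMatches(header, apiKey) {
  if (!apiKey) return false; // this check only applies when MCP_API_KEY is set
  const match = /^Bearer (.+)$/.exec(header || "");
  if (!match) return false;
  const digest = (s) => crypto.createHash("sha256").update(s).digest();
  return crypto.timingSafeEqual(digest(match[1]), digest(apiKey));
}
```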

OAuth 2.1 with PIN (for claude.ai, mobile clients):

# Add to .env
MCP_AUTH_PIN=123456          # choose a secure PIN
MCP_BASE_URL=https://your-server.example.com  # public URL

When a web client connects, it is redirected to a PIN entry page. After the correct PIN is entered, the client receives an OAuth access token valid for 24 hours and a refresh token valid for 30 days. Failed PIN attempts are rate-limited with increasing delays.
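"Increasing delays" usually means exponential backoff per client. The sketch below illustrates the idea. The thresholds (3 free attempts, 1 s base delay, 5 min cap) and the class name are hypothetical, not mcp-recall's values. Only the two token lifetimes come from the paragraph above.

```javascript
const ACCESS_TOKEN_TTL_MS = 24 * 60 * 60 * 1000;       // 24 hours
const REFRESH_TOKEN_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// No delay for the first few failures, then doubling, capped at maxMs.
function pinDelayMs(failures, { free = 3, baseMs = 1000, maxMs = 5 * 60 * 1000 } = {}) {
  if (failures < free) return 0;
  return Math.min(maxMs, baseMs * 2 ** (failures - free));
}

// Tracks failures per client key (e.g. client IP) and when the next attempt is allowed.
class PinLimiter {
  constructor() {
    this.state = new Map();
  }
  allowed(client, now) {
    const s = this.state.get(client);
    return !s || now >= s.nextAllowedAt;
  }
  fail(client, now) {
    const failures = (this.state.get(client)?.failures ?? 0) + 1;
    this.state.set(client, { failures, nextAllowedAt: now + pinDelayMs(failures) });
  }
  succeed(client) {
    this.state.delete(client);
  }
}
```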

Behind a reverse proxy

If you run mcp-recall behind a reverse proxy (nginx, Caddy, Traefik):

Set TRUST_PROXY=1 in .env
Proxy to http://localhost:3000
For Streamable HTTP: proxy POST/GET/DELETE /mcp
For SSE: proxy GET /sse and POST /messages
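Behind a proxy, every request reaches Node.js from the proxy's own address. Without `TRUST_PROXY`, anything keyed on the client address would see a single client. The assumption here is that mcp-recall passes `TRUST_PROXY` to Express's `trust proxy` setting; the README only calls it a "proxy trust level". With a hop count of 1, Express takes the right-most `X-Forwarded-For` entry as the client address, which is the entry appended by your nginx/Caddy/Traefik. The function below mirrors that hop-count rule for illustration. It is not Express's own code.

```javascript
// Resolves the client address the way a hop-count "trust proxy" setting does:
// walk from the socket address back through X-Forwarded-For (right to left),
// trusting `trustedHops` hops.
function clientIp(socketAddress, xForwardedFor, trustedHops) {
  const chain = [socketAddress];
  if (xForwardedFor) {
    chain.push(...xForwardedFor.split(",").map((s) => s.trim()).filter(Boolean).reverse());
  }
  return chain[Math.min(trustedHops, chain.length - 1)];
}
```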

Resource usage

Measured on an Intel Celeron J1900 (4 cores @ 2.0 GHz) with 16 GB RAM:

Component	RAM	CPU (idle)	Disk
mcp-recall-server	~1.1 GB	0%	~50 MB (image)
PostgreSQL + pgvector	~26 MB	0%	~20 MB (200 memories)
Total	~1.1 GB	0%	-
Embedding model (on disk)	-	-	553 MB (e5-large)

Memory usage is dominated by the ONNX model loaded into RAM
CPU spikes only during embedding generation (~100–200ms per query)
Re-embedding 200 memories takes ~30 minutes on the Celeron, much less on modern hardware
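The two Celeron figures agree if the benchmark's 0.1/s is read as memories embedded per second. At that rate, 200 memories take 200 / 0.1 = 2,000 s, or about 33 minutes. A one-line helper (hypothetical, not part of mcp-recall) gives the same estimate for your own memory count and measured throughput.

```javascript
// Rough re-embedding time in minutes, from a measured throughput in memories/second.
function reembedMinutes(memoryCount, memoriesPerSecond) {
  return memoryCount / memoriesPerSecond / 60;
}
```

For example, `reembedMinutes(200, 0.1)` is about 33.3.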

Project structure

mcp-recall/
├── compose.yml             # Docker Compose (2 services)
├── Dockerfile              # Server image (node:22-slim)
├── .env.example            # Configuration template
├── package.json            # 5 dependencies
├── models/                 # Embedding models (git-ignored, volume-mounted)
│   └── multilingual-e5-large/
│       ├── config.json
│       ├── tokenizer.json
│       ├── tokenizer_config.json
│       └── onnx/model_quantized.onnx
├── migrations/
│   └── 001_init.sql        # Schema: memories table + HNSW index
├── scripts/
│   ├── switch-model.js     # Switch models with DB migration + re-embedding
│   ├── benchmark-models.js # A/B compare models against your data
│   ├── download-model.js   # Download models from Hugging Face
│   └── seed-examples.js    # Load example memories for testing
└── src/
    ├── index.js            # MCP server, Express, dual transport (308 lines)
    ├── database.js         # PostgreSQL CRUD operations (158 lines)
    ├── embeddings.js       # Model-agnostic embedding engine (42 lines)
    └── migrate.js          # Standalone migration runner (16 lines)

~970 lines of code total. No framework overhead, no unnecessary abstractions.

Database

The schema is simple — one table:

CREATE TABLE memories (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content     TEXT NOT NULL,
    metadata    JSONB DEFAULT '{}',
    tags        TEXT[] DEFAULT '{}',
    embedding   vector(1024) NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    updated_at  TIMESTAMPTZ DEFAULT NOW()
);

Indexes: HNSW (cosine similarity on embeddings), GIN (tag filtering), B-tree (created_at).
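A search against this table orders rows by pgvector's cosine distance operator `<=>`. Ordering by that operator is what lets PostgreSQL use an HNSW index built with `vector_cosine_ops`, and `1 - distance` turns the distance back into a similarity. The builder below returns the `{ text, values }` object that the `pg` client's `query()` accepts. It is a sketch under stated assumptions: the actual query lives in `src/database.js` and may differ. For example, the README does not say whether tag filtering matches any tag (`&&`, used here) or all tags (`@>`).

```javascript
// pgvector accepts vectors as text literals such as "[0.1,0.2,0.3]".
function toVectorLiteral(values) {
  if (!values.every(Number.isFinite)) throw new Error("embedding contains non-finite values");
  return `[${values.join(",")}]`;
}

// Nearest-neighbour search with an optional tag filter (array overlap, GIN-indexable).
function buildSearch(embedding, { tags = [], limit = 5 } = {}) {
  const values = [toVectorLiteral(embedding)];
  let where = "";
  if (tags.length > 0) {
    values.push(tags);
    where = `WHERE tags && $${values.length}::text[]`;
  }
  values.push(limit);
  const text = `SELECT id, content, tags, 1 - (embedding <=> $1::vector) AS similarity
FROM memories ${where}
ORDER BY embedding <=> $1::vector
LIMIT $${values.length}`;
  return { text, values };
}
```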

Backup

# Dump the database
docker exec mcp-recall-db pg_dump -U mcp mcp_recall > backup.sql

# Restore
cat backup.sql | docker exec -i mcp-recall-db psql -U mcp mcp_recall

FAQ

Q: Do I need a GPU? No. The embedding model runs on CPU via ONNX Runtime. It works fine on low-power hardware like a Celeron J1900. Embedding generation takes ~100–200ms per query — imperceptible during normal use.

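Q: How many memories can it handle? The HNSW index works efficiently up to tens of thousands of entries. For much larger collections, IVFFlat indexing is an alternative worth evaluating: it builds faster and uses less memory than HNSW, at some cost in query speed and recall.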

Q: Can I use it with ChatGPT / other LLMs? Yes — any MCP-compatible client works. The server implements the standard Model Context Protocol.

Q: What happens if I switch models? Your text data (content, tags, metadata) is preserved. Only the vector embeddings are regenerated. The switch-model.js script handles the entire process, including database dimension changes.

Q: Is my data sent anywhere? No. Embeddings are generated locally. The server has no outbound connections. Your data stays on your hardware. However, when an MCP client retrieves memories, the content flows to whatever LLM provider the client uses.

Dependencies and licenses

Package	License	Purpose
@modelcontextprotocol/sdk	MIT	MCP protocol implementation
@xenova/transformers	Apache-2.0	ONNX Runtime for embeddings
express	MIT	HTTP server
pg	MIT	PostgreSQL client
zod	MIT	Schema validation
pgvector	PostgreSQL License	Vector similarity search

All dependencies are permissively licensed (MIT, Apache-2.0, or the PostgreSQL License).

Contributing

Contributions are welcome! This project values simplicity — please keep changes focused and minimal.

License

MIT
