# Mnemo MCP Server
mcp-name: io.github.n24q02m/mnemo-mcp
Persistent AI memory with hybrid search and embedded sync. Open, free, unlimited.
## Features

- **Hybrid search**: FTS5 full-text + sqlite-vec semantic + Qwen3-Embedding-0.6B (built-in)
- **Zero config mode**: Works out of the box — local embedding, no API keys needed
- **Auto-detect embedding**: Set `API_KEYS` for cloud embedding, auto-fallback to local
- **Embedded sync**: rclone auto-downloaded and managed as a subprocess
- **Multi-machine**: JSONL-based merge sync via rclone (Google Drive, S3, etc.)
- **Proactive memory**: Tool descriptions guide the AI to save preferences, decisions, and facts
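Hybrid search means merging two ranked result lists — one from FTS5 keyword matching, one from sqlite-vec similarity. The fusion method mnemo uses is not documented here; reciprocal rank fusion (RRF) is a common approach and makes a useful mental model. A minimal illustrative sketch (not mnemo's actual code):

```python
def rrf_merge(fts_ranked, vec_ranked, k=60):
    """Merge two ranked ID lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    k=60 is the constant commonly used with RRF.
    """
    scores = {}
    for ranked in (fts_ranked, vec_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Doc "b" ranks high in both lists, so it wins overall.
print(rrf_merge(["a", "b", "c"], ["b", "d", "a"]))  # → ['b', 'a', 'd', 'c']
```

Documents appearing in both lists accumulate score from each, which is why a result that is merely decent in both rankings can beat one that tops only a single list.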
## Quick Start

The recommended way to run this server is via `uvx`:

```shell
uvx mnemo-mcp@latest
```

Alternatively, you can use `pipx run mnemo-mcp`.
### Option 1: uvx (Recommended)

```jsonc
{
  "mcpServers": {
    "mnemo": {
      "command": "uvx",
      "args": ["mnemo-mcp@latest"],
      "env": {
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
        // -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
        // -- first run downloads ~570MB model, cached for subsequent runs
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
        // "EMBEDDING_API_BASE": "https://your-worker.modal.run",
        // "EMBEDDING_API_KEY": "your-key",
        // -- optional: sync memories across machines via rclone
        "SYNC_ENABLED": "true",                  // optional, default: false
        "SYNC_REMOTE": "gdrive",                 // required when SYNC_ENABLED=true
        "SYNC_INTERVAL": "300",                  // optional, auto-sync every 5min (0 = manual only)
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",    // required when SYNC_ENABLED=true
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<base64>" // required when SYNC_ENABLED=true, from: uvx mnemo-mcp setup-sync drive
      }
    }
  }
}
```

### Option 2: Docker
```jsonc
{
  "mcpServers": {
    "mnemo": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-mnemo",
        "-v", "mnemo-data:/data",            // persists memories across restarts
        "-e", "LITELLM_PROXY_URL",           // optional: pass-through from env below
        "-e", "LITELLM_PROXY_KEY",           // optional: pass-through from env below
        "-e", "API_KEYS",                    // optional: pass-through from env below
        "-e", "EMBEDDING_API_BASE",          // optional: pass-through from env below
        "-e", "EMBEDDING_API_KEY",           // optional: pass-through from env below
        "-e", "SYNC_ENABLED",                // optional: pass-through from env below
        "-e", "SYNC_REMOTE",                 // required when SYNC_ENABLED=true: pass-through
        "-e", "SYNC_INTERVAL",               // optional: pass-through from env below
        "-e", "RCLONE_CONFIG_GDRIVE_TYPE",   // required when SYNC_ENABLED=true: pass-through
        "-e", "RCLONE_CONFIG_GDRIVE_TOKEN",  // required when SYNC_ENABLED=true: pass-through
        "n24q02m/mnemo-mcp:latest"
      ],
      "env": {
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
        // -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
        // "EMBEDDING_API_BASE": "https://your-worker.modal.run",
        // "EMBEDDING_API_KEY": "your-key",
        // -- optional: sync memories across machines via rclone
        "SYNC_ENABLED": "true",                  // optional, default: false
        "SYNC_REMOTE": "gdrive",                 // required when SYNC_ENABLED=true
        "SYNC_INTERVAL": "300",                  // optional, auto-sync every 5min (0 = manual only)
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",    // required when SYNC_ENABLED=true
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<base64>" // required when SYNC_ENABLED=true, from: uvx mnemo-mcp setup-sync drive
      }
    }
  }
}
```

### Pre-install (optional)
Pre-download dependencies before adding to your MCP client config. This avoids slow first-run startup:
```shell
# Pre-download embedding model (~570MB) and validate API keys
uvx mnemo-mcp warmup

# With cloud embedding (validates API key, skips local download if cloud works)
API_KEYS="GOOGLE_API_KEY:AIza..." uvx mnemo-mcp warmup
```

### Sync setup (one-time)
```shell
# Google Drive
uvx mnemo-mcp setup-sync drive

# Other providers (any rclone remote type)
uvx mnemo-mcp setup-sync dropbox
uvx mnemo-mcp setup-sync onedrive
uvx mnemo-mcp setup-sync s3
```

This opens a browser for OAuth and prints the env vars (`RCLONE_CONFIG_*`) to set. Both raw JSON and base64 tokens are supported.
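The `setup-sync` command prints the token for you, but since both raw JSON and base64 tokens are accepted, you can also encode an existing rclone token yourself. A sketch (the token fields shown are hypothetical; real values come from your rclone OAuth flow):

```python
import base64
import json

# Hypothetical rclone OAuth token -- real values come from `setup-sync`.
token = {"access_token": "ya29.example", "token_type": "Bearer"}

# RCLONE_CONFIG_GDRIVE_TOKEN accepts either the raw JSON or this base64 form.
encoded = base64.b64encode(json.dumps(token).encode()).decode()

# Round-trip check: decoding recovers the original token.
decoded = json.loads(base64.b64decode(encoded))
assert decoded == token
```

The base64 form avoids quoting problems when the JSON token is embedded in an MCP client config file.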
## Configuration

| Variable | Default | Description |
|---|---|---|
| — | — | Database location |
| `LITELLM_PROXY_URL` | — | LiteLLM Proxy URL (e.g. `http://10.0.0.20:4000`) |
| `LITELLM_PROXY_KEY` | — | LiteLLM Proxy virtual key (e.g. `sk-your-virtual-key`) |
| `API_KEYS` | — | API keys (`NAME:key`, comma-separated) |
| `EMBEDDING_API_BASE` | — | Custom embedding endpoint URL (optional, for SDK mode) |
| `EMBEDDING_API_KEY` | — | Custom embedding endpoint key (optional) |
| `EMBEDDING_BACKEND` | (auto-detect) | Embedding backend (`local` forces the built-in model) |
| `EMBEDDING_MODEL` | auto-detect | LiteLLM model name (optional) |
| `EMBEDDING_DIMS` | `768` | Embedding dimensions (0 = auto-detect, default 768) |
| `SYNC_ENABLED` | `false` | Enable rclone sync |
| `SYNC_REMOTE` | — | rclone remote name (required when sync enabled) |
| — | — | Remote folder (optional) |
| `SYNC_INTERVAL` | — | Auto-sync seconds (optional, 0 = manual) |
| — | — | Log level (optional) |
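For reference, the sync-related settings above resolve roughly like this. This is an illustrative sketch, not mnemo's actual settings code; the fallback interval of 300 is an assumption taken from the Quick Start example:

```python
import os

def load_sync_settings(env=None):
    """Resolve sync settings per the documented behavior (sketch)."""
    env = os.environ if env is None else env
    enabled = env.get("SYNC_ENABLED", "false").lower() == "true"
    remote = env.get("SYNC_REMOTE")  # required when sync is enabled
    # 0 = manual only; 300 (5 min) assumed as fallback from the example config
    interval = int(env.get("SYNC_INTERVAL", "300"))
    if enabled and not remote:
        raise ValueError("SYNC_REMOTE is required when SYNC_ENABLED=true")
    return {"enabled": enabled, "remote": remote, "interval": interval}

settings = load_sync_settings({"SYNC_ENABLED": "true", "SYNC_REMOTE": "gdrive"})
assert settings == {"enabled": True, "remote": "gdrive", "interval": 300}
```

Note that sync is off by default, so omitting all `SYNC_*` variables leaves the server fully local.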
## Embedding (3-Mode Architecture)
Embedding is always available — a local model is built-in and requires no configuration.
Embedding access supports 3 modes, resolved by priority:
| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | `LITELLM_PROXY_URL` | Production (OCI VM, self-hosted gateway) |
| 2 | SDK | `API_KEYS` | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline, always available as fallback |
No cross-mode fallback — if proxy is configured but unreachable, calls fail (no silent fallback to direct API).
**Local mode**: Qwen3-Embedding-0.6B, always available with zero config.

**GPU auto-detection**: If a GPU is available (CUDA/DirectML) and `llama-cpp-python` is installed, the GGUF model (~480MB) is used automatically instead of ONNX (~570MB) for better performance.

All embeddings are stored at 768 dims (default), so switching providers never breaks the vector table.

Override with `EMBEDDING_BACKEND=local` to force local even when API keys are set.
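The priority resolution described above can be sketched as follows. The function name is hypothetical, not mnemo's internals; the rules (explicit override first, then proxy, then SDK, then local, with no cross-mode fallback at call time) come from this section:

```python
def resolve_embedding_mode(env):
    """Pick the embedding mode by documented priority (sketch)."""
    if env.get("EMBEDDING_BACKEND") == "local":
        return "local"              # explicit override wins
    if env.get("LITELLM_PROXY_URL"):
        return "proxy"              # priority 1: LiteLLM Proxy gateway
    if env.get("API_KEYS"):
        return "sdk"                # priority 2: direct cloud API via SDK
    return "local"                  # priority 3: built-in Qwen3 model

assert resolve_embedding_mode({}) == "local"
assert resolve_embedding_mode({"API_KEYS": "GOOGLE_API_KEY:x"}) == "sdk"
assert resolve_embedding_mode({"LITELLM_PROXY_URL": "http://gw:4000"}) == "proxy"
assert resolve_embedding_mode({"API_KEYS": "x", "EMBEDDING_BACKEND": "local"}) == "local"
```

Resolution happens once at startup; if the chosen mode then fails at call time, the call fails rather than silently falling through to a lower-priority mode.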
`API_KEYS` supports multiple providers in a single string:

```shell
API_KEYS=GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...,COHERE_API_KEY:co-...
```

Cloud embedding providers (auto-detected from `API_KEYS`, priority order):
| Priority | Env Var (LiteLLM) | Model | Native Dims | Stored |
|---|---|---|---|---|
| 1 | `GEMINI_API_KEY` | — | 3072 | 768 |
| 2 | `OPENAI_API_KEY` | — | 3072 | 768 |
| 3 | `COHERE_API_KEY` | — | 1024 | 768 |
All embeddings are truncated to 768 dims (default) for storage. This ensures switching models never breaks the vector table. Override with `EMBEDDING_DIMS` if needed.
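Truncation to the stored dimension is straightforward to picture. A sketch (the function name is hypothetical; whether mnemo re-normalizes after truncation is not stated here, so this simply slices):

```python
def truncate_dims(vector, dims=768):
    """Keep only the first `dims` components so all providers share one vector table."""
    if len(vector) < dims:
        raise ValueError(f"embedding has {len(vector)} dims, expected >= {dims}")
    return vector[:dims]

native = [0.01 * i for i in range(3072)]  # e.g. a 3072-dim cloud embedding
stored = truncate_dims(native)
assert len(stored) == 768
```

Because every provider's output is cut to the same width, a 3072-dim Gemini vector and a 1024-dim Cohere vector end up row-compatible in sqlite-vec.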
The `API_KEYS` format maps your env var to LiteLLM's expected var (e.g., `GOOGLE_API_KEY:key` auto-sets `GEMINI_API_KEY`). Set `EMBEDDING_MODEL` explicitly for other providers.
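Parsing the `API_KEYS` string and applying that mapping can be sketched as follows. Illustrative only: of the aliases, only `GOOGLE_API_KEY` → `GEMINI_API_KEY` is documented above, and the function name is hypothetical:

```python
# Documented alias: GOOGLE_API_KEY auto-sets LiteLLM's GEMINI_API_KEY.
ALIASES = {"GOOGLE_API_KEY": "GEMINI_API_KEY"}

def parse_api_keys(raw):
    """Split comma-separated `NAME:key` pairs and apply LiteLLM env-var aliases."""
    keys = {}
    for pair in raw.split(","):
        name, _, value = pair.partition(":")  # value may itself contain ':'
        keys[ALIASES.get(name, name)] = value
    return keys

parsed = parse_api_keys("GOOGLE_API_KEY:AIza-x,OPENAI_API_KEY:sk-y")
assert parsed == {"GEMINI_API_KEY": "AIza-x", "OPENAI_API_KEY": "sk-y"}
```

Using `partition` rather than `split(":")` keeps keys intact even if a provider key contains a colon.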
## MCP Tools

### memory — Core memory operations
| Action | Required | Optional |
|---|---|---|
### config — Server configuration

| Action | Required | Optional |
|---|---|---|
### help — Full documentation

```python
help(topic="memory")  # or "config"
```

## MCP Resources
| URI | Description |
|---|---|
| — | Database statistics and server status |
| — | 10 most recently updated memories |
## MCP Prompts

| Prompt | Parameters | Description |
|---|---|---|
| — | — | Generate prompt to save a conversation summary as memory |
| — | — | Generate prompt to recall relevant memories about a topic |
## Architecture

```
MCP Client (Claude, Cursor, etc.)
        |
  FastMCP Server
   /    |    \
memory config help
  |     |     |
MemoryDB Settings docs/
  /  \
FTS5 sqlite-vec
    |
EmbeddingBackend
  /        \
LiteLLM   Qwen3 ONNX
  |       (local CPU)
Gemini / OpenAI / Cohere

Sync: rclone (embedded) -> Google Drive / S3 / ...
```

## Development
```shell
# Install
uv sync

# Run
uv run mnemo-mcp

# Lint
uv run ruff check src/
uv run ty check src/

# Test
uv run pytest
```

## Compatible With
## Also by n24q02m

| Server | Description | Install |
|---|---|---|
| — | Notion API for AI agents | — |
| — | Web search, content extraction, library docs | — |
| — | Email (IMAP/SMTP) for AI agents | — |
| — | Godot Engine for AI agents | — |
## Related Projects

- **modalcom-ai-workers** — GPU-accelerated AI workers on Modal.com (embedding, reranking)
- **qwen3-embed** — Local embedding/reranking library used by mnemo-mcp
## Contributing

See CONTRIBUTING.md

## License

MIT - See LICENSE