
Mnemo MCP Server

mcp-name: io.github.n24q02m/mnemo-mcp

Persistent AI memory with hybrid search and embedded sync. Open, free, unlimited.


Features

  • Hybrid search: FTS5 full-text + sqlite-vec semantic + Qwen3-Embedding-0.6B (built-in)

  • Zero config mode: Works out of the box — local embedding, no API keys needed

  • Auto-detect embedding: Set API_KEYS for cloud embedding, auto-fallback to local

  • Embedded sync: rclone auto-downloaded and managed as subprocess

  • Multi-machine: JSONL-based merge sync via rclone (Google Drive, S3, etc.)

  • Proactive memory: Tool descriptions guide AI to save preferences, decisions, facts

Related MCP server: WET - Web Extended Toolkit

Quick Start

Option 1: uvx (recommended)

uvx mnemo-mcp@latest

Alternatively, you can use pipx run mnemo-mcp. Add the server to your MCP client configuration:

{
  "mcpServers": {
    "mnemo": {
      "command": "uvx",
      "args": ["mnemo-mcp@latest"],
      "env": {
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
        // -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
        // -- first run downloads ~570MB model, cached for subsequent runs
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
        // "EMBEDDING_API_BASE": "https://your-worker.modal.run",
        // "EMBEDDING_API_KEY": "your-key",
        // -- optional: sync memories across machines via rclone
        // -- on first sync, a browser opens for OAuth (auto, no manual setup)
        "SYNC_ENABLED": "true",                    // optional, default: false
        "SYNC_INTERVAL": "300"                     // optional, auto-sync every 5min (0 = manual only)
        // "SYNC_REMOTE": "gdrive",                 // optional, default: gdrive
        // "SYNC_PROVIDER": "drive",                // optional, default: drive (Google Drive)
      }
    }
  }
}
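
Zero-config note: with no env block at all, the server falls back to the built-in local Qwen3 embedding model, so a minimal config is just:

{
  "mcpServers": {
    "mnemo": {
      "command": "uvx",
      "args": ["mnemo-mcp@latest"]
    }
  }
}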

Option 2: Docker

{
  "mcpServers": {
    "mnemo": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-mnemo",
        "-v", "mnemo-data:/data",                  // persists memories across restarts
        "-e", "LITELLM_PROXY_URL",                 // optional: pass-through from env below
        "-e", "LITELLM_PROXY_KEY",                 // optional: pass-through from env below
        "-e", "API_KEYS",                          // optional: pass-through from env below
        "-e", "EMBEDDING_API_BASE",                // optional: pass-through from env below
        "-e", "EMBEDDING_API_KEY",                 // optional: pass-through from env below
        "-e", "SYNC_ENABLED",                      // optional: pass-through from env below
        "-e", "SYNC_INTERVAL",                     // optional: pass-through from env below
        "n24q02m/mnemo-mcp:latest"
      ],
      "env": {
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
        // -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
        // "EMBEDDING_API_BASE": "https://your-worker.modal.run",
        // "EMBEDDING_API_KEY": "your-key",
        // -- optional: sync memories across machines via rclone
        "SYNC_ENABLED": "true",                    // optional, default: false
        "SYNC_INTERVAL": "300"                     // optional, auto-sync every 5min (0 = manual only)
      }
    }
  }
}

Pre-install (optional)

Pre-download dependencies before adding to your MCP client config. This avoids slow first-run startup:

# Pre-download embedding model (~570MB) and validate API keys
uvx mnemo-mcp warmup

# With cloud embedding (validates API key, skips local download if cloud works)
API_KEYS="GOOGLE_API_KEY:AIza..." uvx mnemo-mcp warmup

Sync setup

Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:

  1. First sync: rclone is auto-downloaded, a browser opens for OAuth authentication

  2. Token saved: OAuth token is stored locally at ~/.mnemo-mcp/tokens/ (600 permissions)

  3. Subsequent runs: Token is loaded automatically — no manual steps needed

For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:

{
  "SYNC_ENABLED": "true",
  "SYNC_PROVIDER": "dropbox",        // rclone provider type
  "SYNC_REMOTE": "dropbox"           // rclone remote name
}

Advanced: You can also run uvx mnemo-mcp setup-sync drive to pre-authenticate before first use, but this is optional.

Configuration

| Variable | Default | Description |
|---|---|---|
| DB_PATH | ~/.mnemo-mcp/memories.db | Database location |
| LITELLM_PROXY_URL | - | LiteLLM Proxy URL (e.g. http://10.0.0.20:4000); enables proxy mode |
| LITELLM_PROXY_KEY | - | LiteLLM Proxy virtual key (e.g. sk-...) |
| API_KEYS | - | API keys (ENV:key,ENV:key); optional, enables semantic search via cloud embedding (SDK mode) |
| EMBEDDING_API_BASE | - | Custom embedding endpoint URL (optional, for SDK mode) |
| EMBEDDING_API_KEY | - | Custom embedding endpoint key (optional) |
| EMBEDDING_BACKEND | (auto-detect) | litellm (cloud API) or local (Qwen3); auto: API_KEYS -> litellm, else local (always available) |
| EMBEDDING_MODEL | (auto-detect) | LiteLLM model name (optional) |
| EMBEDDING_DIMS | 0 (auto=768) | Embedding dimensions (0 = auto-detect, default 768) |
| SYNC_ENABLED | false | Enable rclone sync |
| SYNC_PROVIDER | drive | rclone provider type (drive, dropbox, s3, etc.) |
| SYNC_REMOTE | gdrive | rclone remote name |
| SYNC_FOLDER | mnemo-mcp | Remote folder |
| SYNC_INTERVAL | 300 | Auto-sync interval in seconds (0 = manual only) |
| LOG_LEVEL | INFO | Log level |

Embedding (3-Mode Architecture)

Embedding is always available — a local model is built-in and requires no configuration.

Embedding access supports three modes, resolved in priority order:

| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | LITELLM_PROXY_URL + LITELLM_PROXY_KEY | Production (OCI VM, self-hosted gateway) |
| 2 | SDK | API_KEYS or EMBEDDING_API_BASE | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline; always available as fallback |

No cross-mode fallback — if proxy is configured but unreachable, calls fail (no silent fallback to direct API).

  • Local mode: Qwen3-Embedding-0.6B, always available with zero config.

  • GPU auto-detection: If GPU is available (CUDA/DirectML) and llama-cpp-python is installed, automatically uses GGUF model (~480MB) instead of ONNX (~570MB) for better performance.

  • All embeddings stored at 768 dims (default). Switching providers never breaks the vector table.

  • Override with EMBEDDING_BACKEND=local to force local even with API keys.
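
The resolution order above can be sketched in a few lines (illustrative Python; the function name and exact checks are assumptions, not the server's actual API):

import os

def resolve_embedding_mode() -> str:
    # Explicit override: EMBEDDING_BACKEND=local forces local mode even with API keys.
    if os.getenv("EMBEDDING_BACKEND") == "local":
        return "local"
    # 1. Proxy mode: both LiteLLM proxy variables must be set.
    if os.getenv("LITELLM_PROXY_URL") and os.getenv("LITELLM_PROXY_KEY"):
        return "proxy"
    # 2. SDK mode: direct API keys or a custom embedding endpoint.
    if os.getenv("API_KEYS") or os.getenv("EMBEDDING_API_BASE"):
        return "sdk"
    # 3. Local mode: built-in Qwen3 model, always available as the fallback.
    return "local"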

API_KEYS supports multiple providers in a single string:

API_KEYS=GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...,COHERE_API_KEY:co-...

Cloud embedding providers (auto-detected from API_KEYS, priority order):

| Priority | Env Var (LiteLLM) | Model | Native Dims | Stored |
|---|---|---|---|---|
| 1 | GEMINI_API_KEY | gemini/gemini-embedding-001 | 3072 | 768 |
| 2 | OPENAI_API_KEY | text-embedding-3-large | 3072 | 768 |
| 3 | COHERE_API_KEY | embed-multilingual-v3.0 | 1024 | 768 |

All embeddings are truncated to 768 dims (default) for storage. This ensures switching models never breaks the vector table. Override with EMBEDDING_DIMS if needed.
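
As a rough sketch of this rule (assuming plain truncation; the helper name is hypothetical and renormalization behavior is not documented here):

def to_stored_dims(vector: list[float], dims: int = 768) -> list[float]:
    # e.g. a 3072-dim Gemini/OpenAI vector, or a 1024-dim Cohere vector -> 768 values
    return vector[:dims]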

API_KEYS format maps your env var to LiteLLM's expected var (e.g., GOOGLE_API_KEY:key auto-sets GEMINI_API_KEY). Set EMBEDDING_MODEL explicitly for other providers.
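
A minimal sketch of that mapping (hypothetical helper, not the server's actual code):

import os

def apply_api_keys(spec: str) -> None:
    # Split "ENV:key" pairs, e.g. "GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-..."
    for pair in spec.split(","):
        env, _, key = pair.partition(":")
        os.environ[env.strip()] = key.strip()
    # LiteLLM reads GEMINI_API_KEY, so mirror GOOGLE_API_KEY to it (per the note above).
    if "GOOGLE_API_KEY" in os.environ:
        os.environ.setdefault("GEMINI_API_KEY", os.environ["GOOGLE_API_KEY"])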

MCP Tools

memory — Core memory operations

| Action | Required | Optional |
|---|---|---|
| add | content | category, tags |
| search | query | category, tags, limit |
| list | - | category, limit |
| update | memory_id | content, category, tags |
| delete | memory_id | - |
| export | - | - |
| import | data (JSONL) | mode (merge/replace) |
| stats | - | - |
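
In the same call style as the help example below, typical invocations look like this (values are illustrative, and the exact type accepted for tags is an assumption):

memory(action="add", content="User prefers concise answers", category="preference", tags=["style"])
memory(action="search", query="user preferences", limit=5)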

config — Server configuration

| Action | Required | Optional |
|---|---|---|
| status | - | - |
| sync | - | - |
| set | key, value | - |
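
For example (illustrative values; which keys are settable at runtime is up to the server):

config(action="status")
config(action="set", key="SYNC_INTERVAL", value="600")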

help — Full documentation

help(topic="memory")  # or "config"

MCP Resources

| URI | Description |
|---|---|
| mnemo://stats | Database statistics and server status |
| mnemo://recent | 10 most recently updated memories |

MCP Prompts

| Prompt | Parameters | Description |
|---|---|---|
| save_summary | summary | Generate prompt to save a conversation summary as memory |
| recall_context | topic | Generate prompt to recall relevant memories about a topic |
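
For example (illustrative topic):

recall_context(topic="project architecture decisions")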

Architecture

                  MCP Client (Claude, Cursor, etc.)
                         |
                    FastMCP Server
                   /      |       \
             memory    config    help
                |         |        |
            MemoryDB   Settings  docs/
            /     \
        FTS5    sqlite-vec
                    |
              EmbeddingBackend
              /            \
         LiteLLM        Qwen3 ONNX
            |           (local CPU)
  Gemini / OpenAI / Cohere

        Sync: rclone (embedded) -> Google Drive / S3 / ...

Development

# Install
uv sync

# Run
uv run mnemo-mcp

# Lint
uv run ruff check src/
uv run ty check src/

# Test
uv run pytest

Compatible With

Claude Desktop, Claude Code, Cursor, VS Code Copilot, Antigravity, Gemini CLI, OpenAI Codex, OpenCode

Also by n24q02m

| Server | Description | Install |
|---|---|---|
| better-notion-mcp | Notion API for AI agents | npx -y @n24q02m/better-notion-mcp@latest |
| wet-mcp | Web search, content extraction, library docs | uvx --python 3.13 wet-mcp@latest |
| better-email-mcp | Email (IMAP/SMTP) for AI agents | npx -y @n24q02m/better-email-mcp@latest |
| better-godot-mcp | Godot Engine for AI agents | npx -y @n24q02m/better-godot-mcp@latest |

  • modalcom-ai-workers — GPU-accelerated AI workers on Modal.com (embedding, reranking)

  • qwen3-embed — Local embedding/reranking library used by mnemo-mcp

Contributing

See CONTRIBUTING.md

License

MIT - See LICENSE
