M3 Memory
Optionally uses Ollama to load a small chat model for auto-classification, summarization, and consolidation of memories.
Enables syncing memory data across devices using PostgreSQL, allowing seamless continuation between different machines.
Uses local SQLite databases as the primary storage for memories, chat logs, files, and knowledge graph, ensuring sovereign data storage.
M3 treats agent memory as a distributed-systems problem, not a retrieval feature. Instead of every tool (Claude, Gemini, OpenCode, …) keeping its own throwaway memory, M3 is a shared, evolving, bitemporal knowledge base that multiple heterogeneous agents and machines read and write — built to answer "how do agents maintain a consistent, evolving, temporal knowledge base over months and years?" rather than just "how do we retrieve a chunk?"
That framing is what makes the rest different: memory as persistent infrastructure, bitemporal history ("what did we believe last Tuesday, and when was it corrected?"), automatic contradiction management (not just append-and-hope), a memory-first MCP operational API (not a bare store/fetch), and local-first without giving up cross-agent interoperability.
Local-first Memory Framework for AI Agents · 99.2% LongMemEval-S retrieval @ k=10 · Supports Claude · Gemini · Antigravity · OpenCode · OpenClaw · Hermes · MCP-native and plugins · Hybrid search (FTS5 + vector + MMR) · GDPR · FIPS 140-3 deployment-ready · 100% local (fully offline) or cloud capable
In one sentence
M3 is a persistent, local-first memory layer for AI coding agents — a shared, bitemporal knowledge base that multiple agents read and write over MCP.
Works with | Claude Code · Gemini CLI · Aider · Google Antigravity · OpenCode · Hermes · any MCP agent |
M3 is | a memory layer · an MCP server · a hybrid retrieval engine · a bitemporal knowledge base |
M3 is not | an LLM · a chatbot · a plain vector database · a RAG framework · an IDE |
For you if | you use a desktop coding agent and want memory that's private, offline-capable, and shared across tools |
Maturity | production-grade — lightweight by design (SQLite primary), scales out to PostgreSQL for demanding environments; see |
"Wait, you remember that?" — Stop re-explaining your project to your AI. Give it a long-term brain that stays 100% on your machine.
🚀 New to M3? Start here with our 5-minute "Human-First" guide.
Works with Claude Code, Gemini CLI, Aider, Google Antigravity, OpenCode, Hermes Agent, and any MCP-compatible agent. The chat log subsystem saves verbatim chat logs before compaction, with zero added latency and 100% retrieval recall. To install it, tell your AI agent "install m3-memory chat log sub-system". The agent installs it with the proper hooks after a few customization questions, and you can accept the default answers.
👉 I've read enough, I just want to install it on Windows, macOS, or Linux.
🧠 Memory model at a glance
Not a vector store with RAG sugar — a typed, bitemporal, confidence-scored, self-maintaining knowledge base. Every point below is a first-class column or named function, not a roadmap item (full model →):
Typed & structured: every memory has type, source, confidence, scope, provenance (change_agent), and salience (importance, decay_rate). It is a database of facts, not a transcript.
Bitemporal history: separate valid-time and transaction-time, so m3 answers "what did we believe last Tuesday, and when was it corrected?" Superseded facts are closed, not deleted.
Automatic contradiction handling: conflicts are detected on write and the stale fact is superseded (with corroboration_count / contradiction_count + a Bayesian confidence posterior), instead of piling up contradictory history.
Self-maintaining lifecycle: decay, dedup, consolidation into higher-order belief memories, TTL/expiry, retention, and GDPR erasure.
Write-gating: high-signal memories are promoted through an enrichment queue; a content-safety gate rejects injection at the write boundary. Remember fewer things, better.
Explainable, goal-aware retrieval: hybrid (vector + FTS5 BM25 + MMR + rerank), intent-routed by query type, and memory_suggest returns the per-result score breakdown (vector / bm25 / recency / title-overlap → final) so you can ask "why did you remember this?" and get numbers. See CONFIDENCE_AND_TRUST.md.
Measured, not asserted: LongMemEval-S 92% end-to-end QA, 99.2% recall@10 (report).
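The bitemporal behaviour described above can be illustrated with a small SQLite sketch. The table and column names here (facts, valid_from, recorded_at, superseded_at) are hypothetical and chosen only for illustration; M3's actual schema is not shown in this document and may differ.

```python
import sqlite3

# Hypothetical schema for illustration only; not M3's real tables.
SCHEMA = """
CREATE TABLE facts (
    id INTEGER PRIMARY KEY,
    subject TEXT NOT NULL,
    value TEXT NOT NULL,
    valid_from TEXT NOT NULL,   -- valid time: when the fact became true
    valid_to TEXT,              -- NULL while the fact is still considered true
    recorded_at TEXT NOT NULL,  -- transaction time: when the store learned it
    superseded_at TEXT          -- NULL while this row is the current belief
);
"""

def write_fact(db, subject, value, valid_from, now):
    """Close any open row for the subject, then insert the new belief."""
    db.execute(
        "UPDATE facts SET valid_to = ?, superseded_at = ? "
        "WHERE subject = ? AND superseded_at IS NULL",
        (valid_from, now, subject),
    )
    db.execute(
        "INSERT INTO facts (subject, value, valid_from, recorded_at) "
        "VALUES (?, ?, ?, ?)",
        (subject, value, valid_from, now),
    )

def believed_at(db, subject, as_of):
    """Return what the store believed about `subject` at transaction time `as_of`."""
    row = db.execute(
        "SELECT value FROM facts WHERE subject = ? AND recorded_at <= ? "
        "AND (superseded_at IS NULL OR superseded_at > ?) "
        "ORDER BY recorded_at DESC LIMIT 1",
        (subject, as_of, as_of),
    ).fetchone()
    return row[0] if row else None

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
write_fact(db, "deploy.region", "us-east-1", valid_from="2026-01-05", now="2026-01-05T10:00")
write_fact(db, "deploy.region", "eu-west-1", valid_from="2026-01-12", now="2026-01-14T09:00")

before = believed_at(db, "deploy.region", "2026-01-13T00:00")  # before the correction was recorded
after = believed_at(db, "deploy.region", "2026-01-15T00:00")   # after the correction was recorded
row_count = db.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
print(before, after, row_count)
```

The superseded row is closed rather than deleted, so both the old belief and the time of its correction remain queryable.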
📦 Install
curl -fsSL https://raw.githubusercontent.com/skynetcmd/m3-memory/main/install.sh | bash
The single command above installs on macOS or Linux. Separate linked guides cover installing on Windows, installing manually, and examining the script and what it does.
Claude Code users can also install as a plugin instead — gets you 15 /m3:* slash commands, two curator subagents (m3:curate-memory, m3:curate-chatlog), and auto-wired hooks:
/plugin marketplace add skynetcmd/m3-memory
/plugin install m3@skynetcmd
Plugin reference · Claude.ai (web/desktop) connector
Google Antigravity users can install the plugin directly:
agy plugin install https://github.com/skynetcmd/m3-memory
Hermes Agent users can install the memory-provider plugin directly (supports optimal replacement of default memory or parallel coexistence for rich SOTA retrieval):
# Handled automatically via our setup wizard:
m3 setup
Add to your MCP config:
{
"mcpServers": {
"memory": { "command": "m3" }
}
}
🚀 One-command setup
pip install m3-memory
m3 setup
m3 setup is an interactive wizard. It detects every agent on PATH (Claude
Code, Gemini CLI, OpenCode, OpenClaw), asks a handful of questions, then
drives the full install end-to-end: system payload, sovereign CPU embedder
(BGE-M3 on port 8082), per-agent MCP wiring, chatlog hooks, and a brief
doctor health check. Restart your agent — that's it.
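For readers wiring an agent by hand, the MCP config step amounts to merging one server entry into the agent's settings file without disturbing its other keys. The sketch below shows that merge under stated assumptions: the helper name add_mcp_server is hypothetical, the demo writes to a throwaway file, and nothing here reflects how the m3 setup wizard is actually implemented.

```python
import json
import tempfile
from pathlib import Path

def add_mcp_server(settings_path: Path, name: str, command: str) -> dict:
    """Merge one server entry under "mcpServers", preserving all other settings."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    settings.setdefault("mcpServers", {})[name] = {"command": command}
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings

# Demo against a temporary file rather than a real agent config.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"
    path.write_text(json.dumps({"theme": "dark"}))  # pre-existing, unrelated setting
    result = add_mcp_server(path, "memory", "m3")
    print(json.dumps(result, indent=2))
```

Using setdefault keeps any MCP servers already registered in the file, so running the helper twice leaves the config unchanged.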
🛡️ Sovereign by default
The embedder ships in the repo. Our own BGE-M3 CPU embedder runs as a
small always-on service on 127.0.0.1:8082 after m3 setup. No LM
Studio, no Ollama, no GPU, no internet required for embedding to work.
Embedder path | When it's used | What you do |
Sovereign CPU (port 8082) | Always installed by | Nothing — it's the default. |
GPU in-process | Optional opt-in for ~10-50× faster embedding. CUDA / Vulkan / Metal auto-detected. |
|
External (Ollama, LM Studio, vLLM, …) | Power users who want a different model or shared host service. | Set |
Want auto-classification, summarization, and consolidation? Load a small
chat model for generation (e.g. qwen2.5:0.5b via Ollama, or any 0.5–1B
instruct GGUF). M3 auto-selects it; embedding-only features work without
it. See docs/QUICKSTART.md → Optional: load a small chat model.
⚡ Auto-Oxidation is ON by default. Performance-critical hot paths (MMR rerank, batch cosine, FTS compile, redaction) run on an optional in-process Rust core (m3_core_rs, a local wheel, no service), with silent pure-Python fallback when it's absent. Micro-benchmarks show large wins where they matter (up to ~846× on packed MMR rerank, ~97–178× on batch-cosine). Full table + methodology: docs/OXIDATION_BENCHMARKS.md. Opt out with M3_CORE_RS_DISABLE=1.
Restart your agent. Done!
🎚️ 100+ tools, but they don't all crowd your context — domain gating keeps the catalog small
M3 exposes 100+ MCP tools so power users can customize at fine granularity — single-id deletes, bulk variants, per-store searches, KG traversals, GDPR primitives, agent handoffs, watch-mode admin, the lot. Most agents never touch most of them in a typical session.
To avoid burning context space on tool schemas you won't use, m3 groups
its catalog into 8 domains (memory, chatlog, files, entity,
agent, tasks, conversations, admin) and loads them lazily.
At MCP startup only the essentials register (6 data tools — memory +
chatlog + files search/write — plus the 4 always-on dispatcher/meta tools);
the rest expose on demand when the agent calls
tools_load_domain(domain="…").
Measured on m3 main with the gpt-4o tokenizer over the serialized tool
schemas ({name, description, parameters} per tool, as registered on the
MCP wire):
Mode | Tools at startup | Tokens at startup | % of 200 K window | % of 256 K window |
Lazy (default) | 10 | ~3,540 | 1.8 % | 1.4 % |
Typical session (lazy + agent loads files + memory) | 64 | ~17,975 | 9.0 % | 7.0 % |
Eager ( | 107 | ~24,918 | 12.5 % | 9.7 % |
For comparison, common alternatives: a 40-tool GitHub MCP server ≈ 12,000 tokens; the full 93-tool GitHub MCP server ≈ 55,000 tokens (MCP Token Counter). m3's lazy default keeps the always-on surface ~7× smaller than the full eager catalog while giving the agent the full tool set whenever it actually needs them.
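The percentages and the ~7× ratio can be reproduced from the token counts in the table. The snippet only re-derives the arithmetic from the published figures; it does not measure anything itself.

```python
# Token counts copied from the table above; nothing here is measured independently.
startup_tokens = {"lazy": 3_540, "typical": 17_975, "eager": 24_918}
windows = {"200K": 200_000, "256K": 256_000}

shares = {}
for mode, tokens in startup_tokens.items():
    # Share of each context window consumed by tool schemas at startup, in percent.
    shares[mode] = {name: round(100 * tokens / size, 1) for name, size in windows.items()}
    print(mode, tokens, shares[mode])

ratio = startup_tokens["eager"] / startup_tokens["lazy"]
print(f"eager / lazy = {ratio:.2f}x")
```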
Disable with M3_TOOLS_LAZY=0 if your client doesn't support
dynamic tool registration
or you want every tool at startup. Direct Python imports
(from memory_bridge import memory_write) always expose every tool —
this only gates the MCP wire surface.
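The domain-gating idea can be sketched as a toy registry that exposes only a few domains at startup and adds the rest on request. This is an illustrative model, not M3's implementation: the class LazyToolRegistry, its methods, and the admin_stats tool name are hypothetical.

```python
from typing import Callable, Dict, List

class LazyToolRegistry:
    """Toy registry: tools are grouped by domain and exposed only once loaded."""

    def __init__(self, catalog: Dict[str, Dict[str, Callable]], always_on: List[str]):
        self._catalog = catalog                   # domain -> {tool name -> callable}
        self._exposed: Dict[str, Callable] = {}   # tools currently visible to the agent
        for domain in always_on:
            self.load_domain(domain)

    def load_domain(self, domain: str) -> List[str]:
        """Expose every tool in `domain`; return the names that were newly added."""
        tools = self._catalog[domain]
        added = [name for name in tools if name not in self._exposed]
        self._exposed.update(tools)
        return added

    def list_tools(self) -> List[str]:
        return sorted(self._exposed)

# Hypothetical catalog, used only to make the sketch runnable.
catalog = {
    "memory": {"memory_write": lambda text: f"stored: {text}"},
    "admin": {"admin_stats": lambda: {"rows": 0}},
}
registry = LazyToolRegistry(catalog, always_on=["memory"])
before = registry.list_tools()
newly_added = registry.load_domain("admin")
after = registry.list_tools()
print(before, newly_added, after)
```

Only the schemas of exposed tools would need to be serialized into the agent's context, which is where the startup token savings come from.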
🛡️ Air-gapped deployment
M3 is sovereign by default — the baseline install needs no external services. For fully air-gapped environments, the only extra step is to pre-stage the repo (with the LFS-tracked GGUF materialized) and wheels on a connected machine, then sneakernet the folder and pip install --no-index. No curl, no LM Studio, no third-party model server.
M3 is also FIPS 140-3 deployment-ready: it implements no custom crypto, uses only FIPS-approved algorithms, and routes every operation through a single provider boundary so a validated wolfCrypt module can serve it (M3_FIPS_MODE=1 fails closed if absent). M3 itself is not a CMVP-validated module — no application is.
Full guides: Sovereign & Air-Gapped Deployment · FIPS module boundary & tiers. Config/payload/backups live under ~/.m3-memory (override with M3_MEMORY_ROOT).
🔮 What happens next (benefits of use)
You're at a coffee shop on your MacBook, asking Claude to debug a deployment issue. It remembers the architecture decisions you made last week, the server configs you stored yesterday, and the troubleshooting steps that worked last time — all from local SQLite, no internet required.
Later, you're at your Windows desktop at home with Gemini CLI, and it picks up exactly where you left off. Same memories, same context, same knowledge graph. You didn't copy files, didn't export anything, didn't push to someone else's cloud. Your PostgreSQL sync handled everything in the background the moment your laptop hit the local network.
💡 Why this exists
Most AI agents don't persist state between sessions. You re-paste context, re-explain architecture, re-correct mistakes. When facts change, the agent has no mechanism to update what it "knows."
M3 Memory gives agents a structured, persistent memory layer that handles this.
⚡ What it does
Autonomous cognitive loop — optional background worker (m3_cognitive_loop.py) that extracts facts, resolves contradictions, and links entities while you sleep. Turns raw chat logs into a refined knowledge graph without human intervention.
Persistent memory — facts, decisions, preferences survive across sessions. Stored in local SQLite.
Hybrid retrieval — FTS5 keyword matching + semantic vector similarity + MMR diversity re-ranking. Automatic, no tuning required.
Contradiction handling — conflicting facts are automatically superseded. Bitemporal versioning preserves the full history.
Knowledge graph — related memories linked automatically on write. Nine relationship types, 3-hop traversal. Entity extraction (entity_search, entity_get) supplements the graph with first-class people / places / things resolution. The entity-graph layer ships a stock entity-type and predicate vocabulary, and it's user-configurable: point M3_ENTITY_VOCAB_YAML at your own profile to swap or extend the vocab for your domain — no code changes.
Zero-config local install — pip install m3-memory plus one line in your MCP config, or m3 setup for a one-command wizard that detects agents, wires settings.json + hooks, installs the sovereign CPU embedder, and verifies with a brief doctor check in one shot. SQLite stores everything locally — no external databases, no cloud calls, no API costs. Works offline.
Context-frugal tool catalog — 100+ MCP tools grouped into 8 domains, loaded lazily. Startup surface is ~3,540 tokens (~1.8% of a 200K window) vs ~24,918 if every tool registered eagerly. Agent expands a domain when it needs the rest. See § 100+ tools, domain-gated.
Cross-device sync — optional, easy-to-add bi-directional delta sync via PostgreSQL or ChromaDB, with manifest-driven multi-DB support for fleet deployments. Set one environment variable and your memories follow you across machines.
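To make the MMR diversity re-ranking step in the hybrid retrieval item concrete, here is a minimal pure-Python sketch of textbook Maximal Marginal Relevance over toy 2-D vectors. It is a generic illustration, not M3's code: the function names, the lam parameter, and the example vectors are all assumptions.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query, candidates, k, lam=0.7) -> List[int]:
    """Return indices of up to k candidates, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = [1.0, 0.3]
candidates = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]  # the first two are near-duplicates

diverse = mmr(query, candidates, k=2, lam=0.5)      # penalizes the near-duplicate
relevance_only = mmr(query, candidates, k=2, lam=1.0)
print(diverse, relevance_only)
```

With lam=1.0 the ranking is by relevance alone and both near-duplicates are returned; lowering lam swaps the second one for the more distinct candidate.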
📚 Learn more
🛡️ Compliance & assurance (FISMA, CMMC, GDPR) | |
🔍 Myths & facts (verify claims about M3) | |
🗺️ Roadmap | |
🛠️ Operations playbook (run the memory brain) | 🧩 Capability matrix (every tool, grouped) |
🤖 AI agent context profile (inject into other LLMs) | 🔢 Machine-readable features ( |
🎯 Who this is for
M3 is a good fit if…
🤖 You use coding agents | Claude Code, Gemini CLI, Aider, OpenCode, or any MCP-compatible agent. Non-MCP clients work too via the built-in HTTP proxy. |
👥 You run multiple agents | Coordinating Claude + Gemini + a background worker on a shared local store, with handoffs and per-agent scoping. |
🛡️ You need compliance primitives |
|
💾 You want pure local-first | Single-file SQLite. Works offline. No external database, no cloud calls, no API costs by default. |
🌐 You want memory across devices | Optional bi-directional delta sync via PostgreSQL or ChromaDB — your data, your hardware. |
M3 is not the right tool if…
Try instead | |
You're building LangChain / LangGraph / CrewAI pipelines and want framework-native memory | |
You want a hosted agent runtime with managed scaling, dashboards, and SLAs | |
You want a fully managed, hosted retrieval service and don't need local-first / sovereignty | |
You only need in-session chat context that's discarded after the conversation | Your agent's built-in conversation buffer; M3 is overkill |
🛡️ Why trust this
100+ MCP tools | Memory, search, GDPR, refresh lifecycle — plus agent registry, handoffs, notifications, tasks, entity graph, fact enrichment, chat-log capture, and a 26-tool files-memory layer (directory ingestion, hierarchical chunking, ascension to core memory, watch-mode staleness review) |
563 end-to-end tests | Covering write, search, contradiction, sync, GDPR, maintenance, orchestration, and the files-memory pipeline |
Explainable retrieval |
|
SQLite core | No external database required. Single-file, portable, inspectable |
GDPR compliance |
|
Self-maintaining | Automatic decay, dedup, orphan pruning, retention enforcement |
Audited security posture | Periodic Bandit + pip-audit + secrets-scan reports published under |
Apache 2.0 licensed | Free. No SaaS tier, no usage limits, no lock-in |
🧭 Maturity. The core — storage, retrieval, GDPR, MCP tools, sync — is stable and fully covered by the test suite. The enrichment + reflector pipeline shipped through 2026-Q2 with live-fire experience behind it and gets sharper with every release. M3 runs in production today — a durable memory substrate for personal, homelab, and multi-agent developer workflows, from a single laptop to a fleet of heterogeneous agents sharing one evolving knowledge base.
Built privacy-focused from the ground up. For regulated environments, M3 ships with first-class compliance primitives rather than bolting them on:
GDPR: gdpr_forget (Article 17, right to erasure) and gdpr_export (Article 20, data portability) as built-in MCP tools.
FIPS 140-3 deployment-ready crypto boundary: AES-256-GCM secrets vault, PBKDF2-HMAC-SHA256 key derivation, and TLS 1.3 with FIPS-approved ciphersuites, all routed through a single provider boundary. The crypto provider is obtained separately, not bundled: point it at the CMVP-validated wolfSSL FIPS module (under M3_FIPS_STRICT) for a validated deployment, or use the open-source wolfCrypt build for everything else. The validation belongs to that module, not to M3.
Bitemporal audit log: valid-time and transaction-time captured on every write, backed by a tamper-evident hash chain.
Air-gap operability: no network listeners, no telemetry, no implicit egress.
Framework alignment: mapped to NIST SP 800-53 (FISMA) and CMMC 2.0 / NIST SP 800-171.
M3 is an application, not a validated cryptographic module or a certified system — the certificate and the ATO belong to your deployment. Evaluate it against your specific requirements first, as you should any memory tool. See docs/COMPLIANCE.md and docs/FIPS_MODULE_BOUNDARY.md for the precise boundary, and docs/MYTHS_AND_FACTS.md for where we draw the line.
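The general idea behind a tamper-evident hash chain, as mentioned for the audit log, can be sketched as follows. This is a generic illustration using SHA-256 from the standard library; the record layout and the function names are hypothetical and do not describe M3's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def append_entry(log: list, payload: dict) -> dict:
    """Append a record whose hash covers the payload and the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
    entry = {"payload": payload, "prev_hash": prev_hash, "hash": digest}
    log.append(entry)
    return entry

def verify(log: list) -> bool:
    """Recompute every hash in order; an edited or reordered record breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        body = json.dumps(entry["payload"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"op": "write", "id": 1, "value": "us-east-1"})
append_entry(log, {"op": "supersede", "id": 1, "value": "eu-west-1"})

intact = verify(log)
log[0]["payload"]["value"] = "tampered"  # simulate an after-the-fact edit
tampered_ok = verify(log)
print(intact, tampered_ok)
```

Because each hash depends on the one before it, changing an early record invalidates every later record unless the whole chain is rewritten.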
📊 Benchmarks
Session Hit-Rate @ k (retrieval-only)
k | SHR | Hits / 500 | vs. prior report |
5 | 98.2% | 491 / 500 | +2.0pp (was 96.2%) |
10 | 99.2% | 496 / 500 | +2.4pp (was 96.8%) |
20 | 100.0% | 500 / 500 | first time reported |
k=10 is M3's default search depth, and every row above uses the same engine settings the production memory_search tool ships with.
Binary per-question SHR (recall_any@k) — same convention the adjacent LongMemEval-S submissions report as "R@k". Measured on longmemeval_s_cleaned.json (500 questions), no oracle metadata, BGE-M3 hybrid retrieval (FTS5 + vector + MMR). Deterministic at T=0; reproducibility variance <0.1pp.
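The binary per-question metric can be written down in a few lines. The sketch below assumes the usual reading of recall_any@k (a question counts as a hit if any gold session appears in the top k results); the function name, the session ids, and the toy data are hypothetical and are not drawn from LongMemEval-S.

```python
from typing import Dict, List, Set

def recall_any_at_k(ranked: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int) -> float:
    """Fraction of questions with at least one gold session among the top-k results."""
    hits = sum(1 for qid, results in ranked.items() if gold[qid] & set(results[:k]))
    return hits / len(ranked)

# Toy data with hypothetical ids.
ranked = {
    "q1": ["s3", "s7", "s1"],
    "q2": ["s9", "s2", "s4"],
    "q3": ["s5", "s6", "s8"],
}
gold = {"q1": {"s1"}, "q2": {"s9"}, "q3": {"s0"}}

at_1 = recall_any_at_k(ranked, gold, k=1)
at_3 = recall_any_at_k(ranked, gold, k=3)
print(at_1, at_3)
```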
End-to-End QA Accuracy
92.0% on LongMemEval-S (460/500 correct) — a 500-question evaluation of long-horizon conversational memory — with no oracle metadata (routing inferred from the question text at runtime). Answer model: Claude Opus 4.6; judge: gpt-4o (unmodified upstream).
Question type | n | Accuracy |
single-session-user | 70 | 94.3% |
single-session-assistant | 56 | 96.4% |
single-session-preference | 30 | 80.0% |
multi-session | 133 | 87.2% |
temporal-reasoning | 133 | 95.5% |
knowledge-update | 78 | 93.6% |
Overall | 500 | 92.0% |
The retrieval-vs-QA gap (100% SHR @ k=20 → 92.0% QA) is dominated by answer-model errors on already-retrieved gold evidence, not retrieval misses.
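The overall figure can be cross-checked against the per-type rows. The snippet recovers the number of correct answers per question type from the published n and accuracy values, then sums them; it uses only numbers from the table.

```python
# (question type, n, accuracy %) copied from the table above.
rows = [
    ("single-session-user", 70, 94.3),
    ("single-session-assistant", 56, 96.4),
    ("single-session-preference", 30, 80.0),
    ("multi-session", 133, 87.2),
    ("temporal-reasoning", 133, 95.5),
    ("knowledge-update", 78, 93.6),
]

total = sum(n for _, n, _ in rows)
# Accuracies are rounded to one decimal, so round back to whole questions.
correct = sum(round(n * acc / 100) for _, n, acc in rows)
overall = 100 * correct / total
print(total, correct, f"{overall:.1f}%")
```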
Full methodology, ablations, and honest caveats: benchmarks/longmemeval/LME-S_Benchmarking_Report.md. (An earlier oracle-routed configuration scored 89.0%.)
Discussion threads: xiaowu0162/LongMemEval#49 (v3, no-oracle) and #43 (v1).
LoCoMo audit pending — see benchmarks/locomo/README.md.
🔍 Verifying claims about M3. If a third-party AI assistant has described M3 with features or scores that don't match what's documented here, it's almost certainly hallucinating. See docs/MYTHS_AND_FACTS.md for the source-of-truth list of what M3 actually implements (and what it doesn't).
🧰 Core tools
Most sessions use three tools. The rest is there when you need it.
Tool | Purpose |
| Store a fact, decision, preference, config, or observation |
| Retrieve relevant memories (hybrid search) |
| Refine existing knowledge |
| Search with full score breakdown |
| Fetch a specific memory by ID |
All tools are documented in docs/AGENT_INSTRUCTIONS.md and the full inventory lives in docs/MCP_TOOLS.md.
🤖 For AI agents
M3 Memory exposes 100+ MCP tools for storing, searching, updating, and linking knowledge — including conversation grouping, a refresh lifecycle for aging memories, agent registry, handoffs, notifications, tasks, entity-graph extraction, fact enrichment, chat-log capture for multi-agent orchestration, and a files-memory layer that ingests entire directories (markdown, PDF, plain text) into a hierarchical store with hybrid search, fact extraction, ascension to core memory, and watch-mode staleness review. Any MCP-compatible agent can use them automatically.
To teach your agent best practices (search before answering, write aggressively, update instead of duplicating), drop the compact rules file into your project:
examples/AGENT_RULES.md
Full tool reference with all parameters and behaviors: docs/AGENT_INSTRUCTIONS.md
🪄 Let your agent install it
Already inside Claude Code or Gemini CLI? Paste one of these prompts:
Claude Code:
Install m3-memory for persistent memory. Run: pip install m3-memory
Then add {"mcpServers":{"memory":{"command":"m3"}}} to my
~/.claude/settings.json under "mcpServers". For best retrieval, ensure
Ollama is running with qwen3-embedding:0.6b (optional, falls back
to keyword search without it). Then use /mcp to verify the memory server loaded.
Gemini CLI:
Install m3-memory for persistent memory. Run: pip install m3-memory
Then add {"mcpServers":{"memory":{"command":"m3"}}} to my
~/.gemini/settings.json under "mcpServers". For best retrieval, ensure
Ollama is running with qwen3-embedding:0.6b (optional, falls back
to keyword search without it).
After install, test it:
Write a memory: "M3 Memory installed successfully on [today's date]"
Then search for: "M3 install"
Add the chat log subsystem
Want auto-capture of every Claude Code / Gemini CLI / OpenCode / Aider conversation into a searchable, promotable chat log store? Once m3-memory is wired up, just say:
Install the m3-memory chat log subsystem.
The agent runs bin/chatlog_init.py, wires the host-agent hook, and installs the embed sweeper schedule. See docs/CHATLOG.md for the architecture and ops guide.
🎬 See it in action
Contradiction detection
Hybrid search with scores
Cross-device, cross-platform sync
💬 Community
Contributing · Good first issues