Mnemozine
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@Mnemozinerecall what we said about FalkorDB setup"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Mnemozine
A self-hosted unified conversational memory layer. Mnemozine ingests conversations from every AI tool the operator uses (Claude Code, OpenAI-format agents, Hermes), distills them into a temporal knowledge graph (Graphiti on FalkorDB), and serves that memory to every agent through a single MCP server — proactively at session start and on demand mid-session.
The defining constraint: it consolidates rather than accumulates — retrieval precision stays flat as the store grows, because retrieval is always scoped (current project + global preferences + entity neighborhood) instead of searching the whole graph.
See PRD.md for the full specification and
INTERFACES.md for the shared Protocol contracts every module
builds against.
What it is
Layer | What it does | Where |
Ingestion | Normalize Claude Code JSONL transcripts, OpenAI-format gateway turns, and Hermes turns into one common event schema; strip |
|
Typed extraction | Classify each memory unit as |
|
Storage | Graphiti temporal knowledge graph on FalkorDB (graph and vector embeddings in one store); validity windows; scopes ( |
|
Retrieval & delivery | One MCP server exposing |
|
Cross-reference | Surface related |
|
Maintenance | Scheduled consolidate / entity-resolve / decay / audit; 4-way dedup-reinforce-supersede-noop write decision. |
|
Evals | §9 eval harness + gold-set bootstrap + synthetic distractor generator. |
|
Related MCP server: mesh-memory
Architecture
[ Conversation sources ]
Claude Code (JSONL transcripts) OpenAI-format agents Hermes
| | |
| (LiteLLM gateway + capture callback)
v v v
[ 1. Ingestion ] -- normalize to the common event schema; strip tool_calls --
|
v
[ 2. Typed Extraction ] -- classify preference / project_fact / idea_seed --
|
v
[ 3. Storage ] -- Graphiti temporal KG on FalkorDB (graph + bge-m3 vectors) --
|
v
[ 4. Retrieval & Delivery ] -- single MCP server + Claude Code hooks --
|
v
[ 5. Maintenance ] -- dedup, consolidation, decay, entity resolution (scheduled) --Stack (PRD §5.5, pinned in pyproject.toml):
Concern | Choice |
Graph + vector backend | FalkorDB (single store; no Postgres) |
Temporal KG engine | Graphiti — |
Extraction LLM | Pluggable OpenAI-format |
Embedding model | bge-m3 via Ollama, self-hosted (1024-d) |
OpenAI-format gateway | LiteLLM proxy + a custom logging callback |
MCP server | official |
Maintenance scheduler | APScheduler (or a k8s |
Language / packaging | Python ≥3.11, hatchling, |
The whole system runs end-to-end on local models with no cloud dependency.
The extraction/embedding endpoints are pluggable, so the extraction LLM MAY point
at a cloud model later on cost grounds — a one-line base_url/model swap.
Console scripts
Installed by the package (pyproject.toml [project.scripts]):
Script | Purpose |
| the single MCP server (FR-RET-1) |
| source → chunk → extract → store loop (FR-ING-*) |
| scheduled consolidate/resolve/decay/audit (FR-MNT-*) |
| §9 eval harness + gold-set bootstrap |
| Claude Code |
| Claude Code |
| Claude Code |
| Claude Code |
The three service workloads (mnemozine-mcp / -ingest / -maintenance) share
one container image and differ only in the command they run.
Setup
There are two supported deployment paths, sharing one image definition
(deploy/Dockerfile):
docker-compose — local dev / running the eval harness without a cluster.
Helm chart — homelab Kubernetes.
Both are documented in detail in deploy/README.md; the
essentials are below.
Path A — bare-metal dev (Python only)
python -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env # then edit endpoints/keys
python -c "import mnemozine; print(mnemozine.__version__)"
pytestThis installs the console scripts but assumes you supply FalkorDB, Ollama
(bge-m3), and a Qwen/OpenAI-format endpoint yourself (the .env defaults point
at localhost). For a turnkey stack, use docker-compose.
Path B — docker-compose (local full stack + eval)
# from the repo root
cp .env.example .env # edit endpoints/keys if needed
docker compose -f deploy/docker-compose.yml up -d --buildBrings up every service with no cluster:
Service | Purpose |
| single graph + vector store, persisted to named volume |
| bge-m3 embeddings; |
| local OpenAI-format extraction LLM (llama.cpp server by default), weights in |
| OpenAI-format gateway + logging callback, on |
| the MCP server, published on |
| Claude Code watcher + hooks; mounts |
| scheduled consolidate/resolve/decay/audit |
Inter-service URLs are set under each service's environment: (which overrides
env_file in Compose), so containers reach each other by service name
(redis://falkordb:6379, http://ollama:11434, http://litellm:4000/v1) even
though .env ships localhost defaults for bare-metal dev. Override any of them
with the MZ_COMPOSE_* interpolation vars, e.g.:
MZ_COMPOSE_EXTRACTION_URL=https://api.openai.com/v1 \
MZ_COMPOSE_EXTRACTION_MODEL=openai/gpt-4o-mini \
MZ_COMPOSE_EXTRACTION_API_KEY=sk-... \
docker compose -f deploy/docker-compose.yml up -dLocal Qwen model. The qwen service runs a llama.cpp OpenAI-compatible
server; drop a GGUF into the qwen-models volume (or bind-mount one) and set
QWEN_MODEL to its filename (default qwen2.5-7b-instruct-q4_k_m.gguf). To use a
cloud extraction endpoint instead, point the extraction URL at it (above) and the
qwen service becomes optional.
Claude Code transcripts. mnemozine-ingest mounts the host Claude Code
config dir read-only. Override the host path with HOST_CLAUDE_CONFIG_DIR
(defaults to $HOME/.claude).
Path C — Helm (homelab k8s)
helm lint deploy/helm/mnemozine
helm install mz deploy/helm/mnemozine -n mnemozine --create-namespace
# render without installing:
helm template mz deploy/helm/mnemozineRendered objects:
FalkorDB —
StatefulSet+ headlessService+volumeClaimTemplate(graph + vector persistence at/data).Ollama / Qwen / LiteLLM —
Deployment+Service(+ PVCs for model storage). Ollama pulls bge-m3 via an init container on first start.mcp / ingest / maintenance —
Deployments from the shared image. Maintenance can render as a k8sCronJobinstead (maintenance.asCronJob=true).ConfigMap — all non-secret
MNEMOZINE_*env, including every §6.6 tuning param from.Values.tuning; mounted into every workload viaenvFrom.Secret — FalkorDB password + extraction API key (+
extraSecrets).
When a bundled dependency is enabled, its in-cluster Service DNS is wired
automatically. To use something you run elsewhere, set <dep>.enabled=false and
the matching endpoints.external.*:
helm install mz deploy/helm/mnemozine \
--set falkordb.enabled=false --set endpoints.external.falkordbUrl=redis://my-falkor:6379 \
--set ollama.enabled=false --set endpoints.external.ollamaBaseUrl=http://my-ollama:11434 \
--set litellm.enabled=false --set qwen.enabled=false \
--set endpoints.external.extractionBaseUrl=https://api.openai.com/v1 \
--set extraSecrets.MNEMOZINE_EXTRACTION__API_KEY=sk-...Reach the MCP server in-cluster at
http://<release>-mcp.<namespace>.svc:8765, or port-forward it:
kubectl -n mnemozine port-forward svc/mz-mnemozine-mcp 8765:8765Configuration (environment variables)
All runtime configuration lives in mnemozine/config.py (a
pydantic-settings Settings) and is overridable via environment variables —
prefix MNEMOZINE_, nested delimiter __. The full, authoritative list is
.env.example. Nothing is a hard-coded constant; in particular
the §6.6 tuning parameters are config so they can be calibrated against the eval
set. Setting get_settings() is cached process-wide.
FalkorDB connection (FR-STO-2)
Variable | Default | Meaning |
|
| FalkorDB (Redis protocol) connection URL |
|
| Graphiti graph/keyspace name |
| (unset) | optional FalkorDB/Redis password |
Extraction LLM — pluggable OpenAI-format base_url, default local Qwen (§5.5)
Variable | Default | Meaning |
|
| OpenAI-format base URL (local Qwen by default; swap to a cloud |
|
| LiteLLM |
|
| API key (local servers ignore it) |
|
| extraction wants determinism |
|
| per-request timeout (s) |
Embedding endpoint — bge-m3 via Ollama (OQ3)
Variable | Default | Meaning |
|
| Ollama base URL |
|
| Ollama embedding model |
|
| vector dimensionality (bge-m3 is 1024-d) |
|
| per-request timeout (s) |
Claude Code ingestion — CLAUDE_CONFIG_DIR / cleanupPeriodDays (FR-ING-2/R4)
Variable | Default | Meaning |
|
| root of Claude Code config/transcripts (the |
|
| Claude Code's local-transcript retention ( |
|
| strip |
|
| §6.6 |
|
| §6.6 |
Note on
cleanupPeriodDays: Claude Code deletes local transcripts aftercleanupPeriodDays(default 30). The ingester runs as a near-real-time watcher plusStop/PreCompacthooks so nothing is lost before deletion; you may also raise Claude Code's owncleanupPeriodDaysas a safety net. The mnemozine setting here records that retention window for the ingest layer.
MCP server (FR-RET-1)
Variable | Default | Meaning |
|
| MCP bind host (compose/Helm set |
|
| MCP bind port |
|
| logging level |
§6.6 tuning parameters (config, not constants)
These are deliberately calibrated against the eval set, not guessed. Initial values match the PRD's initial guesses.
Injection budget (FR-RET-3 / FR-RET-5)
Variable | Default | §6.6 |
|
|
|
|
| max top-preference snippets in the index |
Cross-reference engine (FR-RET-6)
Variable | Default | §6.6 |
|
|
|
|
|
|
|
| min cosine sim for the FR-RET-6 vector fallback (distinct from the surfacing threshold) |
Maintenance / dedup / decay (FR-MNT-*)
Variable | Default | §6.6 |
|
|
|
|
|
|
|
|
|
|
| FR-MNT-1 supersede-LLM candidate cap |
|
|
|
|
|
|
|
| scheduled maintenance cadence (FR-MNT-5) |
Retrieval (FR-RET-2)
Variable | Default | §6.6 |
|
|
|
|
| default results per scoped query |
|
| FR-RET-2 entity-neighborhood traversal depth |
In Helm these same knobs live under .Values.tuning (camelCase) and render into
the ConfigMap, e.g.:
helm upgrade mz deploy/helm/mnemozine \
--set tuning.crossref.relevanceThreshold=0.85 \
--set tuning.inject.tokenBudget=400 \
--set tuning.maintenance.cron='0 4 * * *'Registering the Claude Code hooks
Claude Code invokes a hook as a subprocess, passing a JSON payload on stdin
and reading the hook's response (JSON hookSpecificOutput) from stdout. The
four hook entrypoints are installed as console scripts by the package:
Hook event | Script | Does |
|
| inject the compact, ~500-token memory index (FR-RET-3) |
|
| inject finer-grained prompt-scoped memory mid-session (FR-RET-5) |
|
| flush the session's chunk into ingestion at session end (FR-ING-6) |
|
| flush the chunk before compaction (FR-ING-6) |
Register all four in Claude Code's settings.json hooks block. Each entry is
a command-type hook; the four entrypoints read the hook JSON from stdin and
take no command-line arguments, so the command is just the path to the
installed console script (no flags). SessionStart / UserPromptSubmit /
Stop / PreCompact are not tool-matched events, so no matcher is needed.
Drop this into ~/.claude/settings.json (user-global) or a project's
.claude/settings.json. Use the absolute path to the installed scripts —
i.e. the path that which mnemozine-hook-session-start prints inside the
environment where you ran pip install -e . (typically …/.venv/bin/…):
{
"hooks": {
"SessionStart": [
{
"hooks": [
{ "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-session-start" }
]
}
],
"UserPromptSubmit": [
{
"hooks": [
{ "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-user-prompt-submit" }
]
}
],
"Stop": [
{
"hooks": [
{ "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-stop" }
]
}
],
"PreCompact": [
{
"hooks": [
{ "type": "command", "command": "/abs/path/to/.venv/bin/mnemozine-hook-pre-compact" }
]
}
]
}
}If the scripts are on PATH for the shell Claude Code spawns hooks in, you may
use the bare names ("command": "mnemozine-hook-session-start"), but an absolute
path is the robust default since the hook subprocess does not inherit your
interactive shell's activated venv. Resolve the four absolute paths at once with:
for h in session-start user-prompt-submit stop pre-compact; do
command -v "mnemozine-hook-$h"
doneNotes:
The hooks are fail-safe: an empty/invalid payload, an unwired backend, or any internal error yields an empty injection (or no-op flush) rather than raising — a hook must never break the session.
Injected memory is wrapped in
<mnemozine-memory>…</mnemozine-memory>delimiters so the model treats it as advisory background, and is truncated toinject.token_budget(~500 tokens).The hooks call into the same wired retriever + ingest service the
mnemozine-ingestprocess owns; running that daemon installs the loader the hooks use. TheStop/PreCompactflush is idempotent — flushing a session the watcher already tailed is a no-op (de-dup on the FR-ING-5 content hash).
Pointing OpenAI-format agents and Hermes at the gateway
Capture happens through the LiteLLM OpenAI-format gateway with a registered
logging callback. The reference proxy config is
mnemozine/ingestion/gateway/config.yaml
(docker-compose uses deploy/litellm.config.yaml).
OpenAI-format agents (FR-ING-3)
Run the gateway:
litellm --config mnemozine/ingestion/gateway/config.yaml --port 4000(docker-compose / Helm run the
litellmservice for you.) The callback is registered inlitellm_settings.callbacksas the dotted pathmnemozine.ingestion.gateway.litellm_register.gateway_callback.Point any operator-controlled, repointable OpenAI-format agent at the gateway by setting its OpenAI
base_urltohttp://<gateway-host>:4000/v1(and anyapi_keythe proxy expects). Every completion that agent makes is then captured and emitted as common-schema events (source=openai), withtool_callsstripped (FR-ING-7).The gateway's own upstream (the model it proxies to) is the local Qwen by default; swap to a cloud backend by editing the
model_listapi_base/api_key(a single line) — capture still works.
Explicit non-capability (FR-ING-3): third-party apps that cannot be repointed at the gateway
base_url(ChatGPT desktop, Cursor, …) are not captured by this path.
Hermes (FR-ING-4)
Hermes is the self-hosted Nous Research Hermes agent on a homelab VM. Two paths:
Preferred — direct instrumentation. Instrument the VM to push each completed turn into
mnemozine.ingestion.hermes.HermesAdapter(anIngestSource), which normalizes Hermes-native payloads into the common schema (source=hermes), strippingtool_calls. Recorded turns replay viabackfillfor the Phase-1 historical import.Fallback — front it with a gateway. If direct instrumentation is impractical, run a second LiteLLM proxy whose upstream
api_baseis Hermes' OpenAI-compatible endpoint and whose callback referencesmnemozine.ingestion.gateway.litellm_register.hermes_gateway_callback(source=hermes). The Hermes variant is sketched (commented) at the bottom ofgateway/config.yaml.
Reading memory back
All agents — Claude Code and OpenAI/Hermes alike — read from the single MCP
server (mnemozine-mcp). It exposes:
recall(query, scope=None, top_k=10)— on-demand consolidated recall (FR-RET-4).scopeis optional: omit for current project + global, or passglobal/project:<id>/ a bare project id.session_start_index(...)— the FR-RET-3 compact index as a tool (so non-hook agents can request it too).mid_session_index(prompt, project=None)— the FR-RET-5 finer-grained index.
Transports: stdio (Claude Code local default) and streamable-http / sse
(networked OpenAI/Hermes agents), selected with mnemozine-mcp --transport ....
Eval harness and bootstrapping the eval set
The §9 eval harness is the mnemozine-eval console script. It runs offline
against a committed gold-set fixture and a packaged in-memory fake store, so it
needs no FalkorDB/Ollama/Qwen.
mnemozine-eval run # every §9 metric once; exits non-zero on failure
mnemozine-eval run -x 10 # same, with a 10x distractor inflation
mnemozine-eval scaling # headline: injection precision at 1x/10x/100x
mnemozine-eval show-gold # summarize the gold setscaling is the headline §9 assertion — that precision does not decline as
the store is inflated with synthetic plausible-but-irrelevant distractors
(--levels 1,10,100, --tolerance for allowed drop). It exits non-zero if
precision declines.
Bootstrapping the eval set (operator task)
The eval set encodes the operator's own preferences across their own projects, so only the operator can label it (PRD §9 — this is an operator deliverable, ≈40 cases, ~2–3 hrs). Two-step flow:
# 1. Auto-propose extracted candidates and write a Markdown review sheet.
mnemozine-eval bootstrap-propose --out eval_review.md
# 2. Edit eval_review.md by hand: tick "- [x] keep" on candidates to keep,
# optionally correcting the proposed type/scope (human-in-the-loop, R1).
# 3. Fold the labeled sheet into a committed gold set.
mnemozine-eval bootstrap-finish --in eval_review.md --out mnemozine/evals/fixtures/gold_set.jsonbootstrap-finish reads the ticked candidates back, builds a GoldSet (seed
memories + classifier cases), and writes it to the gold-set JSON (default the
committed fixture at mnemozine/evals/fixtures/gold_set.json). Commit that file
and run mnemozine-eval run on every change and on a schedule.
The offline bootstrap-propose uses a tiny demo backlog so the command is
exercisable out of the box; the integration pass can point it at the real
IngestSource.backfill + Extractor to propose from your actual historical
import.
Operations
Maintenance schedule (FR-MNT-5)
Maintenance is a separate, idempotent, repeatable pass (consolidate → resolve entities → decay/archive → audit, in that order):
mnemozine-maintenance run # run the full pass once and exit
mnemozine-maintenance serve # run on the configured cron until interruptedThe cron cadence is
MNEMOZINE_MAINTENANCE__CRON(default0 3 * * *); theservemode uses APScheduler.In docker-compose the
mnemozine-maintenanceservice runsservecontinuously.In Helm it is a long-lived
Deploymentby default; setmaintenance.asCronJob=trueto render a KubernetesCronJob(schedule frommaintenance.cronSchedule, defaulting totuning.maintenance.cron).Each job is isolated — a failure in one is recorded as a note but does not abort the rest of the pass.
Demotion to the archive tier is governed by
decay.archive_after(DECAY_ARCHIVE_AFTER_DAYS, default 90 days unused); the system archives, never hard-deletes by default.
Backing up the FalkorDB volume
FalkorDB is the single source of truth (graph and vectors). Its data lives at
/data:
docker-compose — the named volume
falkordb-data(mounted at/data).Helm — the StatefulSet's
dataPVC (thevolumeClaimTemplate, mounted at/data).
FalkorDB speaks the Redis protocol, so back up the on-disk RDB. Trigger a save then copy the dump out:
# docker-compose — trigger a save, then copy /data out of the falkordb container.
# (The named volume is <project>_falkordb-data; the project name defaults to the
# compose file's directory, so `docker compose ... config --volumes` /
# `docker inspect` resolve the exact volume name if you back it up by volume.)
docker compose -f deploy/docker-compose.yml exec falkordb redis-cli SAVE
docker compose -f deploy/docker-compose.yml cp falkordb:/data ./falkordb-backup-$(date +%F)
# kubernetes (StatefulSet pod <release>-mnemozine-falkordb-0, e.g. mz-mnemozine-falkordb-0)
kubectl -n mnemozine exec mz-mnemozine-falkordb-0 -- redis-cli SAVE
kubectl -n mnemozine cp mz-mnemozine-falkordb-0:/data ./falkordb-backup-$(date +%F)If the FalkorDB password is set, pass -a "$MNEMOZINE_FALKORDB__PASSWORD" to
redis-cli. Restore by stopping FalkorDB, replacing the contents of the volume /
PVC with a backed-up /data, and restarting. Snapshotting the underlying volume
(or PVC VolumeSnapshot) while FalkorDB is quiesced is an equivalent approach.
Superseded/decayed memories are kept (archive tier) rather than deleted, so the store grows slowly over time; size the FalkorDB volume (compose volume / Helm
falkordb.persistence.size, default 10Gi) and Ollama/Qwen model volumes accordingly.
Health checks
mnemozine-mcpexposes an HTTP surface on its bind port; compose/Helm probe it via TCP/HTTP.mnemozine-ingestandmnemozine-maintenancehave no HTTP surface — liveness is "the watcher/scheduler process is still running" (pgrep).
Configuration reference
The single source of truth for config is mnemozine/config.py; the full env-var
list (with the MNEMOZINE_ prefix and __ nesting) is
.env.example. Deployment specifics — image overrides, Helm
values.yaml knobs, the MZ_COMPOSE_* compose overrides — are in
deploy/README.md.
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/DiverOfDark/Mnemozine'
If you have feedback or need assistance with the MCP directory API, please join our Discord server