Memora
Persistent, semantic memory for AI coding agents — local-first, MCP-native.
A local, persistent, semantically-aware knowledge graph for Claude Code (and any MCP-compatible AI coding agent). Auto-loads in every session in every project. Zero per-project setup, zero network calls at runtime, ~85–95% lower per-session context cost than naïve "load it all" memory.
git clone https://github.com/VnemAIDev/memora.git
cd memora
./install.sh --bootstrap
That's the full install. Open Claude Code in any directory; memory auto-loads. See QUICKSTART.md for prerequisites and troubleshooting.
At a glance
Install root: ~/.claude-memory/
Database: ~/.claude-memory/graph.db (SQLite, WAL mode)
Total disk: ~320 MB (venv 227 MB + ONNX model 90 MB + DB ~2 MB)
Registered scope: user-level MCP server (claude mcp list → memory)
Global protocol: ~/.claude/CLAUDE.md
Runtime network calls: zero (model is downloaded once at install time)
Quick start (for a fresh reader)
# Verify the server is registered and reachable
claude mcp list
# Check semantic coverage
~/.claude-memory/.venv/bin/python -c "
import sys; sys.path.insert(0,'$HOME/.claude-memory')
import embeddings; print(embeddings.status())"
# Inspect the DB directly
sqlite3 ~/.claude-memory/graph.db \
"SELECT project, COUNT(*) FROM entities GROUP BY project;"
# Periodic maintenance (dedupe observations)
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --aggressive --semantic 0.92
File inventory
File | Purpose |
| FastMCP server, exposes 12 tools over stdio |
| Lazy-loaded MiniLM-L6-v2 ONNX embedder (fastembed) |
bootstrap_embeddings.py | One-shot: downloads model + embeds all observations |
compact.py | Dedupe script ( |
run.sh | Venv-activating launcher (registered with Claude Code) |
graph.db | SQLite knowledge graph |
server.log | Server logs |
| ONNX model cache (populated by bootstrap) |
Build timeline — 4 phases
Phase 1 — Base infrastructure
Minimal MCP server matching the original spec.
Schema: 4 tables (entities, observations, relations, tags) + 7 indexes, WAL mode, foreign keys
Tools (8): recall_context, create_entity, add_observation, create_relation, search, get_entity, list_projects, forget
Project auto-detection: CWD basename → project name ($HOME and / → "global")
Registration: claude mcp add --scope user memory -- ~/.claude-memory/run.sh
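The auto-detection rule above can be sketched in a few lines. This is an illustration only: the function name `detect_project` and its signature are hypothetical, and the real server may resolve the working directory differently.

```python
from pathlib import PurePosixPath

def detect_project(cwd: str, home: str) -> str:
    """Map a working directory to a project name (hypothetical helper)."""
    path = PurePosixPath(cwd)
    # $HOME and the filesystem root have no meaningful basename,
    # so both fall into the shared "global" bucket.
    if path == PurePosixPath(home) or path == PurePosixPath("/"):
        return "global"
    return path.name

print(detect_project("/home/dev/work/shop-api", "/home/dev"))  # shop-api
print(detect_project("/home/dev", "/home/dev"))                # global
```

Keying on the basename is what removes per-project setup, at the cost that two directories with the same name in different locations would share one project.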
Phase 2 — First token-optimization pass (7 wins)
Targeted the biggest pain point: recall_context() returning ~8 KB of mostly-redundant JSON.
Win | What changed |
Lean default JSON shape | Drop IDs/timestamps/indent; relations become |
| Hard cap with |
| One-line gist replaces raw observations on long entities |
| Stale entities excluded by default |
| Only entities updated within last N days |
FTS5 virtual table + triggers | Real ranked text search instead of |
| Manual + |
New summarize_project tool | One-line digest per entity, the cheapest possible "what's in here?" |
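The "FTS5 virtual table + triggers" row can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, not the project's schema: the single `content` column, the `observations_fts` table, and the trigger names are assumptions, and it requires an SQLite build with FTS5 enabled (most CPython builds have it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE observations (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
-- External-content FTS5 index: holds only the index, reads text from observations.
CREATE VIRTUAL TABLE observations_fts USING fts5(
    content, content='observations', content_rowid='id'
);
-- Triggers keep the index in step with inserts and deletes.
CREATE TRIGGER observations_ai AFTER INSERT ON observations BEGIN
    INSERT INTO observations_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER observations_ad AFTER DELETE ON observations BEGIN
    INSERT INTO observations_fts(observations_fts, rowid, content)
    VALUES ('delete', old.id, old.content);
END;
""")

conn.executemany(
    "INSERT INTO observations(content) VALUES (?)",
    [
        ("auth tokens rotate every 24 hours",),
        ("hero animation uses a CSS transform",),
        ("auth middleware rejects expired auth tokens",),
    ],
)

def search(query: str, limit: int = 5) -> list[str]:
    # "rank" is FTS5's built-in ordering column; by default it is the BM25 score.
    rows = conn.execute(
        "SELECT content FROM observations_fts "
        "WHERE observations_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]

print(search("auth"))
```

FTS5 matches whole tokens, so "auth" here would not match "authentication". That limitation is one reason a semantic layer on top is useful.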
Phase 3 — External research pass (Caveman / RTK / Supermemory)
Researched 3 token-optimization projects in parallel; ported the high-ROI ideas.
Win | Source | What it does |
Type-tier ordering | RTK | Decisions/conventions/services kept first when budget trims |
| RTK | Truncated response says exactly what to |
Cross-project search via | Supermemory | One call hits all your projects |
Type-tier grouping in | Supermemory | Stable concepts above ephemeral work |
Cross-entity dedup ( | RTK | Repeated obs returned once, referenced thereafter |
New flag_for_summary tool | Supermemory | Lists entities >N obs without a summary, an actionable backlog |
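A rough sketch of how type-tier ordering and a byte budget could interact. The tier numbers, the entity dictionary shape, and the function name `trim_to_budget` are all hypothetical; only "decisions/conventions/services kept first" and the `omitted_names` field come from this document.

```python
import json

# Assumed tiers: lower number = kept first when the budget trims.
TIER = {"decision": 0, "convention": 0, "service": 0,
        "file": 1, "person": 1, "todo": 2}

def trim_to_budget(entities: list[dict], max_bytes: int) -> dict:
    # sorted() is stable, so entities within a tier keep their original order.
    ordered = sorted(entities, key=lambda e: TIER.get(e["type"], 3))
    kept, omitted, used = [], [], 0
    for entity in ordered:
        size = len(json.dumps(entity, separators=(",", ":")).encode("utf-8"))
        if used + size <= max_bytes:
            kept.append(entity)
            used += size
        else:
            omitted.append(entity["name"])
    result = {"entities": kept}
    if omitted:
        # Tell the caller exactly which entities to fetch individually.
        result["omitted_names"] = omitted
    return result

entities = [
    {"name": "fix-flaky-test", "type": "todo", "obs": ["retry logic is racy"]},
    {"name": "use-postgres", "type": "decision", "obs": ["chosen over MySQL"]},
]
print(trim_to_budget(entities, max_bytes=70))
```

With a 70-byte budget the decision fits and the todo is listed under `omitted_names`, so the caller can follow up with `get_entity()` only if the todo turns out to matter.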
Phase 4 — Semantic layer (Supermemory's biggest idea)
Hybrid lexical + semantic search, all local.
Win | Implementation |
| Lazy-loaded MiniLM-L6-v2 via fastembed + ONNX, L2-normalized |
| 384-dim float32 vector per observation (~1.5 KB each) |
Hybrid | FTS5 BM25 + cosine top-K, fused via Reciprocal Rank Fusion (k=60) |
Semantic dedup |
|
| Batch semantic dedup across whole DB |
New embedding_status tool | Diagnostic: model availability + coverage % |
| One-shot: download 90 MB model + embed all observations |
Final coverage | 620 / 620 observations embedded |
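Reciprocal Rank Fusion combines ranked lists using only each item's position, so BM25 scores and cosine similarities never need to be put on a common scale. The sketch below shows the standard formula with k=60. The function name and the example IDs are illustrative, not taken from the server's code.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))

lexical = ["a", "b", "c"]   # e.g. FTS5 BM25 order
semantic = ["b", "d", "a"]  # e.g. cosine top-K order
print(rrf_fuse([lexical, semantic]))  # ['b', 'a', 'd', 'c']
```

Items that appear in both lists ("a" and "b") outrank items found by only one ranker. On storage, 384 float32 values take 384 × 4 = 1,536 bytes, which matches the ~1.5 KB per observation quoted in the table.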
The complete tool surface — 12 tools
recall_context create_entity add_observation
create_relation search get_entity
list_projects forget set_summary
summarize_project embedding_status flag_for_summary
Measured token savings (real project data)
Numbers from the smoke tests run during the build, on the demo project
with 5 entities and 20 observations:
Call type | Bytes returned | vs. legacy verbose |
| 8,393 | — |
| 3,378 | −60% |
| 2,715 | −68% |
| 395 | −95% |
| 477 | −94% |
| 464 | −94% (hard cap honored) |
Rule of thumb: characters ÷ 4 ≈ tokens. The old startup recall cost ~2,100 tokens;
the new ritual (summarize_project → selective get_entity) costs ~100-400 tokens
depending on what's relevant. That is a 5-20× reduction per session start.
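The figures above can be reproduced from the byte counts in the table. The sketch assumes the payloads are mostly ASCII JSON, so bytes and characters are interchangeable, and the helper names are illustrative.

```python
def estimate_tokens(n_bytes: int) -> int:
    # Rule of thumb from the text: roughly 4 characters per token.
    return round(n_bytes / 4)

def reduction_pct(new_bytes: int, legacy_bytes: int) -> int:
    return round((1 - new_bytes / legacy_bytes) * 100)

legacy = estimate_tokens(8393)
print(legacy)                    # 2098
print(reduction_pct(395, 8393))  # 95
for new_cost in (100, 400):
    print(f"{legacy / new_cost:.1f}x")  # 21.0x, then 5.2x
```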
Operational benefits — behavioral wins that compound
Before | After |
Memory file rewritten end-to-end every session via /memory | Persistent SQLite — only deltas written, never the whole file |
Per-project memory configured manually | CWD basename auto-detects project; works in every dir without setup |
Memory loaded only when Claude noticed | Auto-loaded via user-scope MCP + CLAUDE.md protocol nudge |
Naïve recall returned full payload regardless of project size | Type-tier ordering keeps decisions/conventions; budget caps the rest |
Searches missed concept-level queries ("auth" missing | Hybrid lexical + semantic — finds entities by meaning |
Repeated observations bloated context | Cross-entity dedup + semantic dedup at write-time |
No way to know what's in memory without paying full cost |
|
Cross-project knowledge invisible from another project |
|
Continuous wins
No per-project setup cost. Every new project already has full memory. Zero friction.
Cross-session continuity. State that previously lived in fragile MEMORY.md files now lives in a queryable DB.
Type-aware retrieval. Asking "what conventions apply here?" returns conventions first.
Semantic recall. Don't have to remember exact words. "The thing about animation" finds entities tagged #hero even if "animation" isn't in any observation.
Cheap upkeep. compact.py --aggressive --semantic 0.92 weekly keeps the DB lean.
Privacy. All-local. Model cached. No telemetry. No data leaves the machine.
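To make the `--semantic 0.92` setting concrete, here is a minimal sketch of threshold-based deduplication. It assumes the value is a cosine-similarity cutoff and uses toy 3-dimensional vectors in place of real 384-dimensional embeddings. The function names are hypothetical and compact.py may work differently.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Assumes both vectors are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_dedupe(items: list[tuple[str, list[float]]],
                    threshold: float = 0.92) -> list[str]:
    """Keep an item only if it is not too similar to anything already kept."""
    kept: list[tuple[str, list[float]]] = []
    for text, vec in items:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return [text for text, _ in kept]

observations = [
    ("hero section fades in on load", [1.0, 0.0, 0.0]),
    ("hero fades in when the page loads", [0.98, 0.2, 0.0]),  # near-duplicate
    ("API keys live in 1Password", [0.0, 1.0, 0.0]),
]
print(semantic_dedupe(observations))
```

The second observation has cosine similarity of about 0.98 with the first and is dropped; the third is orthogonal and survives. A lower threshold merges more aggressively, at the risk of discarding observations that differ in a detail that matters.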
Cost accounting
Cost | Amount |
Disk usage | ~320 MB ( |
Per-call CPU | <50 ms for lean recall; <100 ms for semantic search on 620 obs |
Network at runtime | Zero |
Lock-in | Zero — schema is plain SQLite, inspect anytime with |
Bootstrap dependency | One-time ~90 MB download from HuggingFace (qdrant/all-MiniLM-L6-v2-onnx) |
Memory Protocol (what Claude is told to do, from ~/.claude/CLAUDE.md)
Session start ritual
summarize_project(): one line per entity, ~95% cheaper than full recall
flag_for_summary(): for any entity with >5 observations and no summary, call set_summary(name, "<gist>") to short-circuit raw observations on future recalls
recall_context() for the active subset, or get_entity(name) for specific entities
What to write
decision + rationale (entity_type="decision")
file purpose / important state (entity_type="file")
pending task / blocker (entity_type="todo")
user preference / convention (entity_type="convention")
external service / API / dependency (entity_type="service")
person / team member (entity_type="person")
Active-voice relations only: uses, depends_on, blocks, replaces, owns, reports_to, calls, extends.
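The vocabulary above is small enough to check before writing. The following is a hypothetical client-side check, not part of the server's API. It assumes the entity types and relation names listed in this section are the complete set, which the protocol does not explicitly state.

```python
from typing import Optional

ENTITY_TYPES = frozenset(
    {"decision", "file", "todo", "convention", "service", "person"}
)
RELATIONS = frozenset(
    {"uses", "depends_on", "blocks", "replaces",
     "owns", "reports_to", "calls", "extends"}
)

def check_write(entity_type: str, relation: Optional[str] = None) -> list[str]:
    """Return a list of problems; an empty list means the write looks fine."""
    problems = []
    if entity_type not in ENTITY_TYPES:
        problems.append(f"unknown entity_type: {entity_type!r}")
    if relation is not None and relation not in RELATIONS:
        problems.append(f"relation not in the active-voice list: {relation!r}")
    return problems

print(check_write("decision", "depends_on"))  # []
print(check_write("note", "used_by"))
```

A passive relation such as "used_by" is rejected; the active-voice form is to swap source and target and write "uses".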
Keep tokens low
Default lean shape; pass verbose=True only when needed
Read omitted_names from truncated responses; use get_entity() for specifics
@dup:<name> in observations means deduplicated, not missing
Tag stale entities archived for auto-exclusion
Use since_days=N for recent-only context
Use search(query, project="*") for cross-project lookups
Run compact.py --aggressive --semantic 0.92 periodically
Safety
Never store secrets, API keys, passwords, or PII. Reference them by name only (e.g., "uses Stripe API key stored in 1Password as STRIPE_PROD").
Maintenance commands
# Check semantic search status
~/.claude-memory/.venv/bin/python -c "
import sys; sys.path.insert(0,'$HOME/.claude-memory')
import embeddings; print(embeddings.status())"
# Re-embed every observation (after model swap)
~/.claude-memory/.venv/bin/python ~/.claude-memory/bootstrap_embeddings.py --rebuild
# Dry-run dedupe (no writes)
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --dry-run --aggressive --semantic 0.92
# Real dedupe + VACUUM
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --aggressive --semantic 0.92
# Health check
claude mcp list
# Activity log
grep "memory server starting" ~/.claude-memory/server.log
# Errors
grep -E "ERROR|failed|Traceback" ~/.claude-memory/server.log
# Direct DB stats
sqlite3 ~/.claude-memory/graph.db "
SELECT project, COUNT(*) AS entities FROM entities GROUP BY project;
SELECT 'total observations: ' || COUNT(*) FROM observations;
SELECT 'embedded observations: ' || COUNT(*) FROM observations WHERE embedding IS NOT NULL;
SELECT 'total relations: ' || COUNT(*) FROM relations;
"
What this means in practice
A typical session before this work would either (a) ignore memory and re-ask questions Claude should already know the answer to, or (b) load 2-8 KB of memory at session start whether it was useful or not, and still miss concept-level matches when searching.
Now: Claude opens the session, calls summarize_project() for ~100 tokens,
scans for what's relevant, calls flag_for_summary() to spot bloated entities,
then fetches detail only where it matters. Median session-start memory cost:
300-800 tokens, down from 2,000+ tokens. Over 100 sessions a month
that's 120K+ tokens saved on memory loading alone — and that ignores the
bigger win of finding context lexical search would have missed entirely.
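The monthly figure follows directly from the per-session numbers. A small sketch, with an illustrative helper name:

```python
def monthly_savings(before: int, after: int, sessions: int) -> int:
    """Tokens saved per month on session-start memory loading."""
    return (before - after) * sessions

# Conservative end of the range: 2,000 tokens before, 800 after.
print(monthly_savings(2000, 800, 100))  # 120000
# Optimistic end: 300 tokens after.
print(monthly_savings(2000, 300, 100))  # 170000
```

The 120K+ claim therefore corresponds to the upper end of the new cost range, so it is the lower bound on savings under these assumptions.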