
Memora

by VnemAIDev


Persistent, semantic memory for AI coding agents — local-first, MCP-native.

License: MIT

A local, persistent, semantically-aware knowledge graph for Claude Code (and any MCP-compatible AI coding agent). Auto-loads in every session in every project. Zero per-project setup, zero network calls at runtime, ~85–95% lower per-session context cost than naïve "load it all" memory.

git clone https://github.com/VnemAIDev/memora.git
cd memora
./install.sh --bootstrap

That's the full install. Open Claude Code in any directory; memory auto-loads. See QUICKSTART.md for prerequisites and troubleshooting.

At a glance

Install root: ~/.claude-memory/
Database: ~/.claude-memory/graph.db (SQLite, WAL mode)
Total disk: ~320 MB (venv 227 MB + ONNX model 90 MB + DB ~2 MB)
Registered scope: user-level MCP server (claude mcp list → memory)
Global protocol: ~/.claude/CLAUDE.md
Runtime network calls: zero (model is downloaded once at install time)


Quick start (for a fresh reader)

# Verify the server is registered and reachable
claude mcp list

# Check semantic coverage
~/.claude-memory/.venv/bin/python -c "
import sys; sys.path.insert(0,'$HOME/.claude-memory')
import embeddings; print(embeddings.status())"

# Inspect the DB directly
sqlite3 ~/.claude-memory/graph.db \
  "SELECT project, COUNT(*) FROM entities GROUP BY project;"

# Periodic maintenance (dedupe observations)
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --aggressive --semantic 0.92
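The direct `sqlite3` inspection above can also be done from Python with the standard library. The following is a minimal sketch, not part of Memora itself: `db_stats` is a hypothetical helper, and it assumes only the table and column names that appear in this document (`entities.project`, `observations.embedding`, `relations`).

```python
import sqlite3


def db_stats(conn: sqlite3.Connection) -> dict:
    """Collect basic counts from a Memora-style graph database.

    Assumes tables named entities, observations and relations, and an
    observations.embedding column that is NULL until a vector is stored.
    """
    per_project = dict(
        conn.execute("SELECT project, COUNT(*) FROM entities GROUP BY project")
    )
    total_obs = conn.execute("SELECT COUNT(*) FROM observations").fetchone()[0]
    embedded = conn.execute(
        "SELECT COUNT(*) FROM observations WHERE embedding IS NOT NULL"
    ).fetchone()[0]
    relations = conn.execute("SELECT COUNT(*) FROM relations").fetchone()[0]
    # Coverage is the share of observations that already carry an embedding.
    coverage = round(100.0 * embedded / total_obs, 1) if total_obs else 0.0
    return {
        "entities_per_project": per_project,
        "observations": total_obs,
        "embedded": embedded,
        "relations": relations,
        "coverage_pct": coverage,
    }
```

To point it at the real database, pass `sqlite3.connect(os.path.expanduser("~/.claude-memory/graph.db"))` as the connection.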

File inventory

| File | Purpose |
| --- | --- |
| `server.py` | FastMCP server, exposes 12 tools over stdio |
| `embeddings.py` | Lazy-loaded MiniLM-L6-v2 ONNX embedder (fastembed) |
| `bootstrap_embeddings.py` | One-shot: downloads model + embeds all observations |
| `compact.py` | Dedupe script (`--aggressive`, `--semantic THRESHOLD`) |
| `run.sh` | Venv-activating launcher (registered with Claude Code) |
| `graph.db` | SQLite knowledge graph |
| `server.log` | Server logs |
| `models/` | ONNX model cache (populated by bootstrap) |

Build timeline — 4 phases

Phase 1 — Base infrastructure

Minimal MCP server matching the original spec.

Schema: 4 tables (entities, observations, relations, tags) + 7 indexes, WAL mode, foreign keys
Tools (8): recall_context, create_entity, add_observation, create_relation, search, get_entity, list_projects, forget
Project auto-detection: CWD basename → project name ($HOME and / → "global")
Registration: claude mcp add --scope user memory -- ~/.claude-memory/run.sh
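
The project auto-detection rule above (CWD basename, with `$HOME` and `/` mapping to "global") can be sketched as follows. This is an illustration under stated assumptions rather than the code in `server.py`: `detect_project` is a hypothetical name, and paths are treated as POSIX strings so the sketch does not touch the filesystem.

```python
from pathlib import PurePosixPath


def detect_project(cwd: str, home: str) -> str:
    """Map a working directory to a project name.

    The home directory and the filesystem root both map to "global";
    every other directory maps to its own basename.
    """
    path = PurePosixPath(cwd)
    if path == PurePosixPath(home) or path == PurePosixPath("/"):
        return "global"
    return path.name
```

In practice this would be called with something like `detect_project(os.getcwd(), str(Path.home()))`. Note that two different directories with the same basename would share one project name under this rule.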

Phase 2 — First token-optimization pass (7 wins)

Targeted the biggest pain point: recall_context() returning ~8 KB of mostly-redundant JSON.

| Win | What changed |
| --- | --- |
| Lean default JSON shape | Drop IDs/timestamps/indent; relations become `[from, type, to]` triples |
| `max_chars` budget | Hard cap with `truncated: true` flag and `omitted` count |
| `summary` column + `set_summary` tool | One-line gist replaces raw observations on long entities |
| `archived` tag auto-exclusion | Stale entities excluded by default |
| `since_days` filter | Only entities updated within last N days |
| FTS5 virtual table + triggers | Real ranked text search instead of `LIKE %x%` |
| `compact.py` script | Manual + `--aggressive` dedupe of redundant observations |
| New `summarize_project` tool | One-line digest per entity, the cheapest possible "what's in here?" |
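
As an illustration of the `max_chars` budget row, here is a minimal sketch of one way a hard cap with a `truncated` flag and an `omitted` count could work. The function name `recall_lean` and the payload layout are assumptions made for this example; the actual response shape of `recall_context()` may differ.

```python
import json


def recall_lean(entities: list[dict], max_chars: int) -> str:
    """Serialize entities compactly, dropping trailing ones until the text fits.

    Entities are assumed to arrive already sorted by priority, so trimming
    from the end removes the least important ones first.
    """
    kept = list(entities)
    while True:
        omitted = len(entities) - len(kept)
        payload: dict = {"entities": kept}
        if omitted:
            payload["truncated"] = True
            payload["omitted"] = omitted
        # Compact separators drop the whitespace that indent-style JSON adds.
        text = json.dumps(payload, separators=(",", ":"))
        if len(text) <= max_chars or not kept:
            # If nothing is left to drop, the bare envelope is returned even
            # when max_chars is smaller than the envelope itself.
            return text
        kept.pop()
```

Dropping whole entities rather than cutting the string mid-way keeps the output valid JSON, which matters when the caller parses it to decide what to fetch next.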

Phase 3 — External research pass (Caveman / RTK / Supermemory)

Researched 3 token-optimization projects in parallel; ported the high-ROI ideas.

| Win | Source | What it does |
| --- | --- | --- |
| Type-tier ordering | RTK | Decisions/conventions/services kept first when budget trims |
| `omitted_names` + `expand_with` | RTK | Truncated response says exactly what to `get_entity()` for |
| Cross-project search via `project="*"` | Supermemory | One call hits all your projects |
| Type-tier grouping in `summarize_project` | Supermemory | Stable concepts above ephemeral work |
| Cross-entity dedup (`@dup:<name>` sentinel) | RTK | Repeated obs returned once, referenced thereafter |
| New `flag_for_summary` tool | Supermemory | Lists entities >N obs without a summary; an actionable backlog |
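
The cross-entity dedup row can be sketched as below. This is a hedged illustration, not Memora's implementation: `dedupe_across_entities` is a hypothetical helper, matching is exact-string only, and it assumes `<name>` in the sentinel refers to the entity that first carried the observation.

```python
def dedupe_across_entities(entities: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each observation once; later repeats become @dup:<name> sentinels."""
    first_seen: dict[str, str] = {}  # observation text -> first entity carrying it
    result: dict[str, list[str]] = {}
    for name, observations in entities.items():
        lean: list[str] = []
        for obs in observations:
            owner = first_seen.setdefault(obs, name)
            if owner == name:
                # First sighting, or a repeat inside the same entity:
                # keep a single copy.
                if obs not in lean:
                    lean.append(obs)
            else:
                # Already returned under another entity: reference it instead.
                lean.append(f"@dup:{owner}")
        result[name] = lean
    return result
```

A reader of the response can resolve a sentinel by looking at the named entity, which is why a sentinel means "deduplicated", not "missing".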

Phase 4 — Semantic layer (Supermemory's biggest idea)

Hybrid lexical + semantic search, all local.

| Win | Implementation |
| --- | --- |
| `embeddings.py` module | Lazy-loaded MiniLM-L6-v2 via fastembed + ONNX, L2-normalized |
| `observations.embedding BLOB` column | 384-dim float32 vector per observation (~1.5 KB each) |
| Hybrid `search()` | FTS5 BM25 + cosine top-K, fused via Reciprocal Rank Fusion (k=60) |
| Semantic dedup | `add_observation(dedup_threshold=0.92)` skips paraphrases |
| `compact.py --semantic 0.92` | Batch semantic dedup across whole DB |
| New `embedding_status` tool | Diagnostic: model availability + coverage % |
| `bootstrap_embeddings.py` | One-shot: download 90 MB model + embed all observations |
| Final coverage | 620 / 620 observations embedded |
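
Reciprocal Rank Fusion, named in the hybrid `search()` row, combines ranked lists by summing 1 / (k + rank) for every list an item appears in. The sketch below shows the standard formula with k=60; the function name and the use of plain string IDs are assumptions for illustration, and the real `search()` may weight or tie-break differently.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings into one list using RRF scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # Items missing from a ranking simply contribute nothing for it.
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by ID to keep the order deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

For example, fusing a lexical ranking `["a", "b", "c"]` with a semantic ranking `["c", "a", "d"]` puts `a` and `c` ahead of `b` and `d`, because they appear in both lists. RRF only uses ranks, so BM25 scores and cosine similarities never need to be put on a common scale.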

The complete tool surface — 12 tools

recall_context       create_entity         add_observation
create_relation      search                get_entity
list_projects        forget                set_summary
summarize_project    embedding_status      flag_for_summary

Measured token savings (real project data)

Numbers from the actual smoke tests during the build, on the demo project with 5 entities and 20 observations:

| Call type | Bytes returned | vs. legacy verbose |
| --- | --- | --- |
| `recall_context(verbose=True)` (legacy) | 8,393 | (baseline) |
| `recall_context()` lean default | 3,378 | −60% |
| `recall_context()` after `set_summary` on largest entity | 2,715 | −68% |
| `summarize_project()` triage | 395 | −95% |
| `search("rebuild")` FTS5 | 477 | −94% |
| `recall_context(max_chars=800)` | 464 | −94% (hard cap honored) |

At a glance: characters ÷ 4 ≈ tokens. Old startup recall cost ~2,100 tokens; the new ritual (summarize_project → selective get_entity) costs ~100-400 tokens depending on what's relevant. 5-20× reduction per session start.
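
The arithmetic behind these figures can be checked with a few lines. The helper names below are made up for this example, and the sketch treats bytes and characters as equal, which holds for ASCII JSON; the ÷ 4 rule is only a rough heuristic, not a tokenizer.

```python
def estimate_tokens(num_chars: int) -> int:
    """Rough token estimate using the characters / 4 rule of thumb."""
    return round(num_chars / 4)


def percent_change(new_size: int, baseline: int) -> int:
    """Size change relative to the baseline, as a rounded percentage."""
    return round((new_size - baseline) / baseline * 100)


LEGACY_BYTES = 8393  # recall_context(verbose=True) from the table above

legacy_tokens = estimate_tokens(LEGACY_BYTES)  # about 2,100 tokens
triage_tokens = estimate_tokens(395)           # about 100 tokens
lean_change = percent_change(3378, LEGACY_BYTES)
triage_change = percent_change(395, LEGACY_BYTES)
```

With the table's numbers, the legacy recall comes to roughly 2,100 tokens and the `summarize_project()` triage to roughly 100, matching the −60% and −95% rows.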

Operational benefits — behavioral wins that compound

| Before | After |
| --- | --- |
| Memory file rewritten end-to-end every session via /memory | Persistent SQLite: only deltas written, never the whole file |
| Per-project memory configured manually | CWD basename auto-detects project; works in every dir without setup |
| Memory loaded only when Claude noticed `MEMORY.md` | Auto-loaded via user-scope MCP + CLAUDE.md protocol nudge |
| Naïve recall returned full payload regardless of project size | Type-tier ordering keeps decisions/conventions; budget caps the rest |
| Searches missed concept-level queries ("auth" missing `login_handler`) | Hybrid lexical + semantic search finds entities by meaning |
| Repeated observations bloated context | Cross-entity dedup + semantic dedup at write-time |
| No way to know what's in memory without paying full cost | `summarize_project()` (~400 chars) + `flag_for_summary()` triage cheaply |
| Cross-project knowledge invisible from another project | `search(query, project="*")` finds it in one call |
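
The write-time semantic dedup mentioned in the table relies on the embeddings being L2-normalized, so cosine similarity reduces to a dot product. A minimal sketch of the threshold check follows, using toy 2-dimensional vectors in place of the 384-dimensional MiniLM embeddings. The names `l2_normalize` and `is_paraphrase` are hypothetical, and whether the real comparison uses `>=` or `>` at 0.92 is an assumption.

```python
import math
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def is_paraphrase(
    candidate: Sequence[float],
    stored: Sequence[Sequence[float]],
    threshold: float = 0.92,
) -> bool:
    """True if the candidate is at least `threshold` similar to any stored vector.

    All vectors are assumed to be L2-normalized already, so the dot product
    equals the cosine similarity.
    """
    return any(
        sum(a * b for a, b in zip(candidate, old)) >= threshold for old in stored
    )
```

An observation whose embedding passes this check would be skipped instead of stored. A linear scan like this is fine at the scale quoted in this document (hundreds of observations); a much larger store would likely want an index.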

Continuous wins

No per-project setup cost. Every new project already has full memory. Zero friction.
Cross-session continuity. State that previously lived in fragile MEMORY.md files now lives in a queryable DB.
Type-aware retrieval. Asking "what conventions apply here?" returns conventions first.
Semantic recall. Don't have to remember exact words. "The thing about animation" finds entities tagged #hero even if "animation" isn't in any observation.
Cheap upkeep. compact.py --aggressive --semantic 0.92 weekly keeps the DB lean.
Privacy. All-local. Model cached. No telemetry. No data leaves the machine.

Cost accounting

| Cost | Amount |
| --- | --- |
| Disk usage | ~320 MB (`.venv` 227 MB + model 90 MB + `graph.db` ~2 MB + log <1 MB) |
| Per-call CPU | <50 ms for lean recall; <100 ms for semantic search on 620 obs |
| Network at runtime | Zero |
| Lock-in | Zero: schema is plain SQLite, inspect anytime with `sqlite3 graph.db` |
| Bootstrap dependency | One-time ~90 MB download from HuggingFace (qdrant/all-MiniLM-L6-v2-onnx) |

Memory Protocol (what Claude is told to do, from `~/.claude/CLAUDE.md`)

Session start ritual

1. summarize_project(): one line per entity, ~95% cheaper than full recall
2. flag_for_summary(): for any entity with >5 observations and no summary, call set_summary(name, "<gist>") to short-circuit raw observations on future recalls
3. recall_context() for the active subset, or get_entity(name) for specific entities

What to write

decision + rationale (entity_type="decision")
file purpose / important state (entity_type="file")
pending task / blocker (entity_type="todo")
user preference / convention (entity_type="convention")
external service / API / dependency (entity_type="service")
person / team member (entity_type="person")

Active-voice relations only: uses, depends_on, blocks, replaces, owns, reports_to, calls, extends.

Keep tokens low

Default lean shape; pass verbose=True only when needed
Read omitted_names from truncated responses; use get_entity() for specifics
@dup:<name> in observations means deduplicated, not missing
Tag stale entities archived for auto-exclusion
Use since_days=N for recent-only context
Use search(query, project="*") for cross-project lookups
Run compact.py --aggressive --semantic 0.92 periodically

Safety

Never store secrets, API keys, passwords, or PII. Reference them by name only (e.g., "uses Stripe API key stored in 1Password as STRIPE_PROD").

Maintenance commands

# Check semantic search status
~/.claude-memory/.venv/bin/python -c "
import sys; sys.path.insert(0,'$HOME/.claude-memory')
import embeddings; print(embeddings.status())"

# Re-embed every observation (after model swap)
~/.claude-memory/.venv/bin/python ~/.claude-memory/bootstrap_embeddings.py --rebuild

# Dry-run dedupe (no writes)
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --dry-run --aggressive --semantic 0.92

# Real dedupe + VACUUM
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --aggressive --semantic 0.92

# Health check
claude mcp list

# Activity log
grep "memory server starting" ~/.claude-memory/server.log

# Errors
grep -E "ERROR|failed|Traceback" ~/.claude-memory/server.log

# Direct DB stats
sqlite3 ~/.claude-memory/graph.db "
  SELECT project, COUNT(*) AS entities FROM entities GROUP BY project;
  SELECT 'total observations: ' || COUNT(*) FROM observations;
  SELECT 'embedded observations: ' || COUNT(*) FROM observations WHERE embedding IS NOT NULL;
  SELECT 'total relations: ' || COUNT(*) FROM relations;
"

What this means in practice

A typical session before this work would either (a) ignore memory and re-ask questions Claude should already know the answer to, or (b) load 2-8 KB of memory at session start whether useful or not, possibly missing concept-level matches when searching.

Now: Claude opens the session, calls summarize_project() for ~100 tokens, scans for what's relevant, calls flag_for_summary() to spot bloated entities, then fetches detail only where it matters. Median session-start memory cost: 300-800 tokens, down from 2,000+ tokens. Over 100 sessions a month that's 120K+ tokens saved on memory loading alone — and that ignores the bigger win of finding context lexical search would have missed entirely.
