mneme
Save 80-90% of memory-related token costs. Persistent long-term memory for AI agents via MCP: on-demand recall instead of always-inject.
Works with any MCP-compatible agent: Claude Code, Cursor, Windsurf, Cline, Continue, and more.
The Problem: Memory Costs Tokens
AI agents are stateless. The common fix is injecting a context file on every prompt — but that means you pay token costs on every single message, even when the agent already knows the answer.
How much does this waste?
Approach | Token cost per message | 100 messages/day |
Pre-injection (always inject) | ~2,000-5,000 tokens | 200K-500K tokens/day |
mneme (on-demand) | 0 tokens (most messages) | ~20K-50K tokens/day |
Most prompts don't need historical memory. mneme lets the agent decide when to look things up — saving 80-90% of memory-related token costs.
What's New in v2.0
Memory Transfer Learning
Inspired by research on cross-context memory reuse (arxiv 2604.14004), memories now have 3 abstraction tiers:
Level | Recall Weight | Description | Example |
meta_knowledge | 1.3x | Patterns, heuristics, reusable principles | "When X happens, do Y" |
semi_abstract | 1.0x | Semi-abstract with some context (default) | "Project X uses approach Y because Z" |
concrete_trace | 0.7x | Specific operation logs | "On 04-16, ran migration script" |
Key insight: Concrete traces have low cross-context reuse value and can cause negative transfer. The system automatically weights meta-knowledge higher during recall, so distilled patterns surface above raw event logs.
sqlite-vec Hybrid Search (FTS5 + KNN + RRF)
When configured with an embedding API, mneme now runs dual-path retrieval:
FTS5 path: Keyword/lexical matching (fast, exact)
Vector path: Semantic matching via sqlite-vec KNN (synonyms, paraphrases)
RRF fusion: Reciprocal Rank Fusion merges both result sets fairly using only rank positions (no scale normalization needed)
Falls back gracefully to FTS5-only when sqlite-vec or embedding API is not configured.
Performance: ~150ms total (FTS5 <10ms + one embedding API call ~120ms). sqlite-vec KNN is sub-millisecond locally.
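The fusion step described above can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion in general, not mneme's actual code: the function name rrfFuse, the sample rowids, and the constant k = 60 (the value commonly used in the RRF literature) are assumptions, since the README does not state which constant mneme uses.

```javascript
// Reciprocal Rank Fusion: score(id) = sum over result lists of 1 / (k + rank).
// Only rank positions are used, so BM25 scores and vector distances never
// need to be put on a common scale.
// ASSUMPTION: k = 60 is a conventional default, not a documented mneme value.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Hypothetical rowids, best match first in each list.
const ftsHits = [11, 42, 7];  // FTS5 path
const vecHits = [42, 99, 11]; // sqlite-vec KNN path
const fused = rrfFuse([ftsHits, vecHits]);
console.log(fused.map((r) => r.id)); // [ 42, 11, 99, 7 ]
```

Rowid 42 wins because it ranks near the top of both lists, while 7 and 99 each appear in only one.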
Compression Pipeline
Old conversation segments can be automatically compressed into summary memories:
Uses a fast LLM (e.g., Claude Haiku) for summarization
Tracks compressed_from source rowids for traceability
Anti-cascade protection: compressed memories cannot be re-compressed (prevents hallucination amplification)
Triggers: CLI command, hooks, or manual invocation
Note: In practice, we find that ingesting compact summaries from Claude Code's built-in /compact feature (via the SessionStart hook) is simpler and more effective than running a separate compression pipeline. Both approaches are supported.
Compact Summary Ingestion
mneme can ingest summaries from Claude Code's /compact feature:
# Triggered by SessionStart hook when source=compact
TOKENMEM_COMPACT_SUMMARY="..." TOKENMEM_COMPACT_SESSION="session-id" \
  node index.mjs --store-compact-summary
This captures session knowledge automatically when Claude Code compacts context, creating a durable long-term memory from what would otherwise be lost.
Breaking Changes
buildMemoryContext() is now async (returns Promise<string>)
storeMemoryAsync() now writes to the sqlite-vec virtual table when available
New memory_level parameter in MCP store_memory tool
DB path configurable via TOKENMEM_DB_PATH environment variable
What's New in v2.1 (Memory Hygiene)
Three mechanisms borrowed from memory-decay literature, adapted to the memory health layer only — no prompt injection, no mood state machines, no persona modeling. The goal is "make memory ranking realistic over time", not "give the AI feelings".
Power-Law Decay
Every memory now has a decay_score that updates periodically based on age, importance, and reuse:
w(t)  = (1 + t / τ)^(-b_eff)        where τ = 24h, b_base = 0.7
b_eff = b_base / (1 + importance / 10)
decay = min(1.0, w × (1 + min(10, access_count) × 0.3))
High-importance + frequently-recalled records stay near 1.0 (reuse boost saves them)
Low-importance + untouched records decay to ~0.2 over a few weeks — but never disappear. They still get queried, just rank lower.
Run via runDecayCycle() from a maintenance daemon's interval, alongside expireMemories() / promoteMemories(). CLI: there is no separate script — call from your own daemon or setInterval.
Recall scoring (both FTS and hybrid paths) now multiplies by decay_score, so naturally-fresh records bubble up without manual TTL tuning. Records that haven't been through a cycle default to 1.0 (backward-compatible).
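The decay formula above can be written out directly. The following is a minimal sketch, not the body of runDecayCycle(): the function name decayScore and the argument names are assumptions, and only the constants (τ = 24h, b_base = 0.7, the 0.3 reuse factor, the cap of 10 accesses) come from the formula.

```javascript
const TAU_HOURS = 24; // τ from the formula above
const B_BASE = 0.7;   // b_base from the formula above

// Hypothetical helper: computes decay_score for one record.
function decayScore({ ageHours, importance, accessCount }) {
  // Higher importance flattens the curve (smaller effective exponent).
  const bEff = B_BASE / (1 + importance / 10);
  // Power-law forgetting weight w(t).
  const w = Math.pow(1 + ageHours / TAU_HOURS, -bEff);
  // Reuse boost: each access (capped at 10) adds 0.3.
  const reuseBoost = 1 + Math.min(10, accessCount) * 0.3;
  return Math.min(1.0, w * reuseBoost);
}

// A three-week-old, never-recalled, low-importance record.
const cold = decayScore({ ageHours: 21 * 24, importance: 3, accessCount: 0 });
// A month-old record that is important and recalled often.
const hot = decayScore({ ageHours: 30 * 24, importance: 9, accessCount: 10 });
console.log(cold.toFixed(2), hot.toFixed(2));
```

Under these inputs the cold record lands near the ~0.2 figure quoted above, while the hot record is clamped at 1.0 by the reuse boost.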
Surfaced Random Recall ("I Just Remembered")
When recall_memory returns fewer records than requested, there's a 25% chance of pulling 1-3 records from the cold pool:
importance >= 8 (genuinely valuable, not noise)
Last accessed > 30 days ago (truly cold)
decay_score >= 0.3 (not utterly buried)
Surfaced records carry recall_source: 'surfaced_random' so callers can distinguish them from query matches. buildMemoryContext() marks them with [surfaced] in the output.
This counters the "long tail of high-value memories that decay below the top of normal ranking" problem — useful patterns from months ago can resurface unprompted, modeling the "I just remembered" feeling.
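The cold-pool rule can be sketched as below. This is an illustration under stated assumptions, not mneme's implementation: the function names, the last_accessed_at field (a millisecond timestamp), the injectable rng, and the choice to pick candidates uniformly at random are all hypothetical. Only the thresholds, the 25% chance, the 1-3 count, and the recall_source tag come from the description above.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Cold-pool membership test using the three thresholds listed above.
// ASSUMPTION: last_accessed_at is a hypothetical field holding epoch milliseconds.
function isColdPoolCandidate(record, now) {
  return (
    record.importance >= 8 &&
    now - record.last_accessed_at > 30 * DAY_MS &&
    record.decay_score >= 0.3
  );
}

// Returns 0-3 surfaced records to append to an under-filled result set.
function surfaceRandom(allRecords, results, limit, { rng = Math.random, now = Date.now() } = {}) {
  if (results.length >= limit) return []; // query already filled the request
  if (rng() >= 0.25) return [];           // 25% chance of surfacing
  const pool = allRecords.filter(
    (r) => isColdPoolCandidate(r, now) && !results.includes(r)
  );
  const count = Math.min(pool.length, 1 + Math.floor(rng() * 3)); // 1-3 records
  const picked = [];
  while (picked.length < count) {
    const i = Math.floor(rng() * pool.length);
    const [record] = pool.splice(i, 1);
    picked.push({ ...record, recall_source: "surfaced_random" });
  }
  return picked;
}
```

Passing a fixed rng makes the behavior reproducible in tests; callers can then filter on recall_source to treat surfaced records differently from query matches.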
Supersede Paper Trail
When store_memory is called with a supersedes array (rowid strings of old records), the new record now:
Inherits the old records' prior_versions[] (chained absorption: full history preserved across multiple supersede generations, so v1 → v2 → v3 keeps the v1 content too)
Pushes the old records' content/summary/created_at into its own prior_versions[]
Updates old records' superseded_by pointer (existing soft-link mechanism preserved)
expireMemories() soft-deletes the old chain on its next pass
Recall returns only the latest content. prior_versions[] (stored as JSON) is queryable for audit / root-cause / "what did I previously think?" analysis. No history loss when retracting.
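The chained absorption can be illustrated with plain objects. This is a sketch of the idea only: the function name supersede, the in-memory records, and the sample contents are hypothetical, and prior_versions is shown as an already-parsed array rather than the JSON column mneme stores.

```javascript
// Hypothetical in-memory model of the supersede paper trail.
function supersede(newRecord, oldRecords) {
  const prior_versions = [];
  for (const old of oldRecords) {
    // 1. Inherit whatever history the old record already carried.
    prior_versions.push(...(old.prior_versions ?? []));
    // 2. Push the old record's own content onto the trail.
    prior_versions.push({
      content: old.content,
      summary: old.summary,
      created_at: old.created_at,
    });
    // 3. Point the old record at its replacement (soft link).
    old.superseded_by = newRecord.rowid;
  }
  return { ...newRecord, prior_versions };
}

const v1 = { rowid: "1", content: "Use npm", summary: "package manager", created_at: "2025-01-01", prior_versions: [] };
const v2 = supersede({ rowid: "2", content: "Use pnpm", summary: "package manager", created_at: "2025-03-01" }, [v1]);
const v3 = supersede({ rowid: "3", content: "Use bun", summary: "package manager", created_at: "2025-06-01" }, [v2]);

console.log(v3.prior_versions.map((p) => p.content)); // [ 'Use npm', 'Use pnpm' ]
```

Because each generation copies its predecessor's trail before appending, v3 still holds the v1 content even after v1 and v2 are soft-deleted.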
Migrations Directory
Schema changes are now versioned in migrations/:
migrations/
├── 001-add-superseded-by.sql # supersede pointer column (paper trail prerequisite)
├── 003-add-decay-and-priors.sql # decay_score + prior_versions + cold-pool index
└── 004-add-dedup-and-event-time.sql # content_hash dedup + event_time (v2.2)
Apply in order against an existing tokenmem.db for auditing. Fresh installs don't need to run these by hand: initMemory() applies the column additions inline (idempotent ALTER TABLE in try/catch). The schema is forward-compatible: pre-migration records get default values (decay_score = 1.0, prior_versions = '[]') so existing recall calls keep working.
Stronger Database Backup Protection
.gitignore now covers *.db.bak / *.db.bak-* / *.db.bak.* patterns — previous versions only blocked *.db.backup-* which let date-suffixed backups slip through accidentally.
What's New in v2.2
HTTP Streamable Transport (single shared daemon)
In addition to the default stdio transport (one server process per client), mneme can now run as a single long-lived HTTP server shared by all clients:
node mcp-server.mjs --transport=http --port=18792
Why: when N agent sessions each spawn their own stdio mcp-server process, they contend on the same SQLite WAL and can pile up into zombie processes. One daemon-managed HTTP instance with a single SQLite connection removes that contention. The server also exposes GET /health, which returns embeddingConfigured and vectorCoverage so a supervisor can detect a silently degraded vector path.
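A supervisor might interpret the /health payload along these lines. This is a hedged sketch: the README names the two fields but not their types, so treating embeddingConfigured as a boolean and vectorCoverage as a 0-to-1 fraction is an assumption, as are the function name classifyHealth and the 0.9 threshold.

```javascript
// Hypothetical supervisor-side check on a parsed GET /health response.
// ASSUMPTIONS: embeddingConfigured is a boolean, vectorCoverage is a
// fraction between 0 and 1, and 0.9 is an arbitrary alert threshold.
function classifyHealth(health, minCoverage = 0.9) {
  if (!health.embeddingConfigured) return "fts-only"; // no embedding API set up
  if (health.vectorCoverage < minCoverage) return "degraded"; // vectors missing for many rows
  return "hybrid";
}

console.log(classifyHealth({ embeddingConfigured: true, vectorCoverage: 0.42 })); // degraded
```

The point of the check is that an FTS5-only fallback still returns results, so a drop in vector coverage would otherwise go unnoticed.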
Store-Time Dedup + event_time
content_hash dedup: a 5-minute window stops agents that retry-store the same content from bloating the table; the existing row's access_count is bumped instead, preserving the "told you already" signal.
event_time: when the event actually happened, distinct from created_at (when it was recorded). This lets recall do temporal reasoning ("what did I do last June?") even for memories recorded later.
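The store-time dedup rule can be sketched with an in-memory table. Everything here beyond the 5-minute window and the access_count bump is an assumption: the function names are hypothetical, the window is measured from created_at, and a small FNV-1a hash stands in for whatever hash mneme actually uses for content_hash.

```javascript
const DEDUP_WINDOW_MS = 5 * 60 * 1000; // 5-minute window from the description above

// Stand-in hash (32-bit FNV-1a over UTF-16 code units).
// ASSUMPTION: mneme's real content_hash algorithm is not documented here.
function fnv1a(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

// Hypothetical store path: bump access_count on a recent duplicate,
// otherwise insert a new row.
function storeWithDedup(rows, content, now = Date.now()) {
  const hash = fnv1a(content);
  const existing = rows.find(
    (r) => r.content_hash === hash && now - r.created_at <= DEDUP_WINDOW_MS
  );
  if (existing) {
    existing.access_count += 1;
    return existing;
  }
  const row = { content, content_hash: hash, created_at: now, access_count: 0 };
  rows.push(row);
  return row;
}
```

A retry within the window returns the original row with a higher access_count, so the repeat still registers as a signal without adding a second copy.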
recall_by_id
Fetch exact memories by rowid (CLI + MCP tool), without bumping access_count — for citation / audit / "show me memory #N" without polluting the recall-frequency signal.
How It Works
┌────────────────────────────────────────────────┐
│ Any MCP-Compatible Agent │
│ (Claude Code / Cursor / Windsurf / ...) │
│ │
│ User prompt → "Do I already know this?" │
│ │ │
│ ┌──────┴──────┐ │
│ ↓ Yes ↓ No │
│ Answer directly recall_memory() │
│ (0 extra tokens) ↓ │
│ MCP Server │
│ ↓ │
│ FTS5 + sqlite-vec KNN │
│ + RRF fusion scoring │
│ (tokenmem.db) │
│ ↓ │
│ ← ranked results │
│ │
│ store_memory("important fact", │
│ level: "meta_knowledge") → MCP Server │
│ ↓ │
│ INSERT + embedding → vec │
└────────────────────────────────────────────────┘
MCP tools exposed:
Tool | Purpose |
recall_memory | Hybrid search: FTS5 + vector KNN + RRF fusion scoring |
store_memory | Store with abstraction level (meta_knowledge / semi_abstract / concrete_trace) |
recall_by_id | Fetch exact memories by rowid (no access_count bump), for citation / audit |
memory_stats | Stats including compression pressure, dead knowledge, search miss rate, vector coverage |
Why MCP Makes This Universal
mneme is a standard MCP server, supporting both stdio (default, one process per client) and HTTP Streamable transport (--transport=http, a single shared daemon). Any AI agent or IDE that supports the Model Context Protocol can connect to it — no code changes needed.
Tested with:
Agent | Setup |
Claude Code | claude mcp add (see Connect to Your Agent) |
Cursor | Add to the MCP server config (see Connect to Your Agent) |
Windsurf | Add to MCP server config |
Cline / Continue | Add to MCP settings |
Features
Memory Layers with Auto-Promotion
Layer | TTL | Auto-promotes when |
| 6 hours | Accessed 3+ times or importance >= 7 |
| 7 days | Accessed 8+ times or importance >= 8 |
| No expiry | — |
| No expiry, no deletion | — |
Composite Scoring (AIRI-inspired)
base_score = FTS_relevance (40%) + importance (30%) + recency (20%) + access_frequency (10%)
With Memory Transfer Learning overlay:
final_score = base_score × level_weight × decay_score
where level_weight = { meta_knowledge: 1.3, semi_abstract: 1.0, concrete_trace: 0.7 }
decay_score = power-law decay × reuse boost (v2.1, defaults to 1.0)
In hybrid mode (FTS5 + vector):
score = (RRF_score × 0.7 + importance × 0.2 + recency × 0.1) × level_weight × decay_score
The × decay_score multiplier (v2.1) lets long-untouched records rank lower naturally, without manual TTL tuning. See What's New in v2.1 above.
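The two scoring formulas can be sketched side by side. This is an illustration, not mneme's code: the function names are hypothetical, and it assumes every component (relevance, importance, recency, access frequency, RRF score) has already been normalized to the 0-1 range, which the README does not specify.

```javascript
const LEVEL_WEIGHT = { meta_knowledge: 1.3, semi_abstract: 1.0, concrete_trace: 0.7 };

// FTS-only path: weighted sum, then the level and decay multipliers.
// ASSUMPTION: all inputs are pre-normalized to [0, 1].
function ftsScore({ relevance, importance, recency, accessFrequency, level = "semi_abstract", decayScore = 1.0 }) {
  const base = 0.4 * relevance + 0.3 * importance + 0.2 * recency + 0.1 * accessFrequency;
  return base * LEVEL_WEIGHT[level] * decayScore;
}

// Hybrid path: the RRF score replaces FTS relevance and access frequency.
function hybridScore({ rrf, importance, recency, level = "semi_abstract", decayScore = 1.0 }) {
  const base = 0.7 * rrf + 0.2 * importance + 0.1 * recency;
  return base * LEVEL_WEIGHT[level] * decayScore;
}

// Same raw signals, different abstraction levels.
const signals = { relevance: 0.8, importance: 0.6, recency: 0.5, accessFrequency: 0.2 };
const pattern = ftsScore({ ...signals, level: "meta_knowledge" });
const trace = ftsScore({ ...signals, level: "concrete_trace" });
console.log(pattern > trace); // true
```

With identical signals, the meta_knowledge record scores 1.3 / 0.7 times higher than the concrete_trace record, which is how distilled patterns surface above raw logs.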
9 Memory Categories
general · people · project · decision · feedback · bug · relationship · skill · preference
Chinese Tokenization (Optional)
Built-in support for Chinese via wangfenjin/simple — a native SQLite extension using cppjieba for word-level segmentation. Falls back gracefully to character-level FTS5 if the extension isn't installed.
Non-Chinese users: skip this entirely. The default FTS5 tokenizer works well for English and other languages.
Health Metrics
memory_stats() now reports:
Compression pressure: ratio of temporary to permanent memories (>1.0 = piling up)
Dead knowledge: long-term memories not accessed in 30 days
Search miss rate: queries that returned zero results (knowledge blind spots)
Quick Start
Prerequisites
Node.js 18+
Any MCP-compatible AI agent
Optional Native Extensions
For enhanced functionality, you can add these SQLite extensions (place in lib/ directory):
sqlite-vec: KNN vector search for hybrid retrieval
wangfenjin/simple: Chinese word-level tokenization
Both are optional — mneme works fully with just FTS5 out of the box.
Install
git clone https://github.com/MXAntian/mneme.git
cd mneme
npm install
Configure Embeddings (Optional)
For hybrid search (FTS5 + vector), set these environment variables:
export EMBEDDING_API_BASE_URL="https://api.openai.com/v1" # or any OpenAI-compatible API
export EMBEDDING_API_KEY="your-key"
export EMBEDDING_MODEL="text-embedding-3-small" # default
export EMBEDDING_DIMENSION="1536" # default
You can also put these in a .env.local file in the project root.
Initialize
node index.mjs --stats
# Creates tokenmem.db on first run
Connect to Your Agent
Claude Code:
claude mcp add --scope user mneme -- node /absolute/path/to/mcp-server.mjs
Cursor / Windsurf / Other MCP clients:
{
  "mcpServers": {
    "mneme": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-server.mjs"]
    }
  }
}
Add Agent Instructions
Add to your agent's system instructions (e.g., CLAUDE.md, .cursorrules, etc.):
## Memory System (mneme MCP)
You have access to a persistent memory database via the `mneme` MCP server:
- `recall_memory(query, limit?, category?)` — retrieve relevant memories
- `store_memory(content, summary?, importance?, memory_type?, memory_level?, category?, tags?)` — store important info
- `memory_stats()` — view statistics
### When to call recall_memory
**Check context first. Only query when context doesn't contain a confident answer.**
Must call:
- User asks about personal preferences, habits, past work
- User references people, relationships, project history
- Context doesn't have a confident answer
Skip:
- Current context already has the answer
- Pure technical question unrelated to stored knowledge
- Already queried the same topic in this session
### Memory Level Guidelines
When storing memories, prefer higher abstraction levels:
- `meta_knowledge` (preferred): Patterns, principles, heuristics — "When X happens, do Y"
- `semi_abstract` (default): Description with some context — "Project uses X because Y"
- `concrete_trace` (last resort): Specific operation logs — "Ran script X on date Y"
Distill experiences into reusable patterns whenever possible.
CLI Usage
mneme also works as a standalone CLI tool — useful for hooks, scripts, and debugging:
# Check stats
node index.mjs --stats
# Recall memories
node index.mjs --recall "food preferences" --limit 5
# Store a memory with abstraction level
node index.mjs --store "When encountering X, always check Y first" \
--importance 8 --type long_term --category skill \
--level meta_knowledge
# Build context for injection (useful in hooks)
node index.mjs --context "current project status"
# Compress old conversations (requires claude CLI)
node index.mjs --compress <chat_id> --days 30
node index.mjs --compress-all
# Ingest compact summary (called by SessionStart hook)
TOKENMEM_COMPACT_SUMMARY="..." node index.mjs --store-compact-summary
# Backfill embeddings for existing memories
node backfill-embeddings.mjs --concurrency 3
node backfill-embeddings.mjs --dry-run # count only
Utilities
backfill-embeddings.mjs
Batch-generates embedding vectors for existing memories that don't have them yet. Useful when first enabling vector search on an existing database.
migrate-claude-memories.mjs
Imports Claude Code's auto-memory .md files (~/.claude/projects/*/memory/*.md) into the SQLite database. Idempotent — safe to re-run. Does not delete original files.
File Structure
mneme/
├── mcp-server.mjs # MCP server entry point (stdio and HTTP transports)
├── index.mjs # Core engine: store, recall, hybrid search, compression, decay
├── schema.sql # SQLite schema (memories, conversations, FTS5, goals)
├── migrations/ # Versioned schema migrations (apply in order)
│   ├── 001-add-superseded-by.sql
│   ├── 003-add-decay-and-priors.sql
│   └── 004-add-dedup-and-event-time.sql
├── package.json # 3 dependencies only
├── backfill-embeddings.mjs # Batch embedding backfill script
├── migrate-claude-memories.mjs # Claude auto-memory migration tool
├── tokenmem.db # SQLite database (auto-created, gitignored)
└── lib/ # Optional: native extension binaries (gitignored)
    ├── libsimple-windows-x64/ # Chinese tokenizer (wangfenjin/simple)
    └── sqlite-vec-windows-x64/ # Vector search (asg017/sqlite-vec)
~1,800 lines of code. 3 dependencies. No build step.
Design Decisions
Why SQLite, not a vector database?
For personal agent memory, FTS5 + sqlite-vec provides sufficient semantic recall without operational overhead. The hybrid approach (FTS5 for exact matching + sqlite-vec for semantic) covers both query styles.
Why on-demand, not pre-injection?
Pre-injection wastes tokens on every message. On-demand lets the agent skip the lookup when it already has the answer — which is most of the time.
Why MCP, not a custom API?
MCP is the emerging standard for agent-tool communication. One implementation works across Claude Code, Cursor, Windsurf, and any future MCP-compatible agent.
Why Memory Transfer Learning?
Research shows that concrete execution traces transfer poorly across contexts and can even cause negative transfer. By automatically weighting meta-knowledge higher during recall, the system surfaces reusable patterns over raw event logs.
Why RRF for hybrid search?
Reciprocal Rank Fusion uses only rank positions, not raw scores. This means FTS5 BM25 scores and vector distances — which have completely different scales — can be merged fairly without normalization.
Environment Variables
Variable | Default | Description |
TOKENMEM_DB_PATH | | Path to SQLite database |
EMBEDDING_API_BASE_URL | — | OpenAI-compatible embedding API base URL |
EMBEDDING_API_KEY | — | API key for embedding service |
EMBEDDING_MODEL | text-embedding-3-small | Embedding model name |
EMBEDDING_DIMENSION | 1536 | Vector dimension |
 | | Path to Claude CLI (for compression pipeline) |
TOKENMEM_COMPACT_SUMMARY | — | Compact summary text (for SessionStart hook) |
TOKENMEM_COMPACT_SESSION | — | Session ID for compact summary |
References
moeru-ai/airi — Memory architecture inspiration (composite scoring model)
wangfenjin/simple — Chinese tokenizer for SQLite FTS5 (cppjieba-based)
asg017/sqlite-vec — SQLite vector search extension
SQLite FTS5 — Full-text search extension with BM25 ranking
Model Context Protocol — The standard for agent-tool communication
Memory Transfer Learning (arxiv 2604.14004) — Cross-context memory reuse research
License
MIT