mem-context
mem-context is an MCP server that provides persistent temporal memory for AI assistants, enabling storing, searching, recalling, and consolidating development history with vector search and multi-factor relevance scoring.
Remember memories (remember): Store content with automatic embedding, scope detection, and metadata (tags, code diffs, file changes, error traces). Supports three memory types: episodic, semantic, and permanent.
Recall relevant memories (recall): Query memories using natural language via vector search with multi-factor scoring (vector similarity, weight decay, recency, scope match, access count, type boost). Supports filtering by scope, type, minimum score, and token budget.
Get a memory by ID (get): Retrieve a single memory with all fields.
Update memory metadata (update): Modify tags, weight, review flags, or other fields on an existing memory.
Archive a memory (forget): Soft-delete by setting weight to 0; the memory is excluded from search but retained in the database.
Delete a memory (delete_memory): Soft-delete (weight=0, kept for audit) or hard-delete (permanently removes from database).
Bulk delete memories (purge_memories): Delete multiple memories filtered by scope, type, or age, with dry-run support.
Find consolidation candidates (consolidation_candidates): Identify memories ready for conclusion extraction, merging, archiving, or decay, for the host model to process.
Review flagged memories (review): List memories marked for review (e.g., contradictions, needs_review=True).
Check server status (status): Get store-wide statistics (counts by type, scope, etc.), optionally scoped to a specific project.
Export/import memories: Backup, migrate, or sync memories across devices.
mcp-name: io.github.turbyho/mem-context
mem-context — Temporal Memory MCP Server
Multi-modal RAG engine for AI assistants. Stores conversation history, conclusions, diffs, error traces, and other development artifacts in LanceDB with vector search, multi-factor scoring, and an LLM-driven consolidation pipeline.
Why
AI assistants lose context between sessions. mem-context persists what matters — decisions, patterns, bugs, architecture choices — and surfaces them when relevant via vector search. Memories decay over time unless reinforced by repeated access, mimicking human memory.
Features
Vector search with dual backend: LanceDB ANN index for fast approximate nearest-neighbor queries. Primary embedding via Ollama mxbai-embed-large (1024d, ~670 MB). Local all-MiniLM-L6-v2 (384d) fallback when Ollama is unavailable; no GPU or network required. Embeddings are auto-padded to match the schema dimension, so switching backends is transparent.
Multi-factor relevance scoring: six independent factors combine into a single 0–1 relevance score. Each factor models a different aspect: vector_score (semantic similarity), weight_score (stored importance × time decay), recency_score (age in days), scope_score (project match), access_boost (usage reinforcement), type_boost (permanent > semantic > episodic). The model balances "what's relevant" with "what's still valid."
Weight decay with a natural memory model: each memory type has a configurable decay_rate: 0.15/day for episodic (session captures fade fast), 0.03/day for semantic (extracted knowledge persists), 0 for permanent (never decays). Decay is exponential: weight × e^(−rate × days). Frequently accessed memories get a counteracting boost, so the system reinforces what you use and archives what you don't.
Deduplication by cosine similarity: new memories are compared against existing ones before insertion. At similarity > 0.82, the new memory is merged into the existing one (weight boost + content update) instead of creating a duplicate. This prevents memory fragmentation from repeated captures of the same conclusion across sessions.
LLM-driven consolidation pipeline: three phases, extract (3 days), merge (7 days), archive (30 days). The server prepares candidates and prompts; the host model (Claude, DeepSeek, GPT, or local Ollama) does the reasoning. Episodic session captures → extracted conclusions (semantic) → merged permanent knowledge → archived if unused. Runs in the background when remember() or recall() is called; no cron needed.
Multi-modal storage: LanceDB columns for text content, code diffs, file lists, error traces, tags, and metadata. Each modality is indexed separately; vector search operates on the combined embedding. Stores not just "what happened" but the diff and stack trace that caused it.
Automatic conversation capture: hooks for Claude Code (Stop event) and manual capture for OpenCode. The wrapper binary finds the current session's transcript, parses it into structured messages, and imports them as episodic memories. With the Claude Code hook, no manual action is needed: every session is archived automatically.
Portable export/import: JSON export strips embeddings (regenerated on import) and keeps all metadata. Use it for backup, cross-device sync, or migrating between machines. Import deduplicates by ID, so it is safe to run multiple times.
One-command provisioning: mem-context init detects installed AI tools (Claude Code, OpenCode, Codex, Cursor), registers the MCP server, injects CLAUDE.md instructions, and installs slash-command skills (6 skills: recall, remember, forget, delete, purge, status). mem-context install adds capture hooks. Two commands, ready to use.
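The decay and deduplication rules described under Features are simple enough to sketch directly. The following is a minimal Python illustration, not the project's code: the function names are hypothetical, and it assumes a plain cosine similarity over equal-length vectors (the padding of 384d fallback embeddings to the schema dimension is not modeled).

```python
import math

# Per-type decay rates (per day), as listed under Features.
DECAY_RATES = {"episodic": 0.15, "semantic": 0.03, "permanent": 0.0}
# Cosine similarity above which a new memory is merged instead of inserted.
MERGE_THRESHOLD = 0.82


def decayed_weight(weight: float, memory_type: str, days: float) -> float:
    """Exponential decay: weight * e^(-rate * days)."""
    return weight * math.exp(-DECAY_RATES[memory_type] * days)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_merge(new_vec: list[float], existing_vec: list[float]) -> bool:
    """True if the new memory is a near-duplicate of an existing one."""
    return cosine_similarity(new_vec, existing_vec) > MERGE_THRESHOLD
```

With these rates an episodic memory loses half its weight in ln(2)/0.15 ≈ 4.6 days and a semantic one in ln(2)/0.03 ≈ 23.1 days, while a permanent memory keeps its weight indefinitely.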
Installation
Linux
# 1. System dependencies
sudo pacman -S python3 python-pip # Arch / Manjaro
# or
sudo apt install python3 python3-pip python3-venv # Debian / Ubuntu
# or
sudo dnf install python3 python3-pip # Fedora
# 2. Install Ollama (for embedding)
curl -fsSL https://ollama.com/install.sh | sh
ollama serve & # Start Ollama in background
# 3. Install mem-context
python3 -m venv ~/.mem-context/.venv
~/.mem-context/.venv/bin/pip install mem-context
# 4. Add to PATH (zsh shown below; for bash, use ~/.bashrc instead of ~/.zshrc)
echo 'export PATH="$HOME/.mem-context/.venv/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
# 5. Pull embedding model (~670 MB)
ollama pull mxbai-embed-large
# 6. Provision — registers MCP server + injects instructions
mem-context init # all detected AI tools
# or target a single tool:
mem-context init --tool claude-code # Claude Code only
mem-context init --tool opencode # OpenCode only
# 7. Install capture hooks (Claude Code, OpenCode)
mem-context install claude-code
mem-context install opencode # optional
mem-context install status # verify
# 8. Restart your AI assistant
macOS
# 1. System dependencies
brew install python@3.11
# 2. Install Ollama
brew install ollama
# Start Ollama: open Ollama.app or run `ollama serve &`
# 3-8. Same as Linux (steps 3-8 above)
python3 -m venv ~/.mem-context/.venv
~/.mem-context/.venv/bin/pip install mem-context
echo 'export PATH="$HOME/.mem-context/.venv/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc
ollama pull mxbai-embed-large
mem-context init
mem-context install claude-code
Verify installation
# Check CLI works
mem-context status
# Check Ollama + embedding model
mem-context init --check-ollama
# List detected AI tools
mem-context init --list-tools
# Check capture hooks
mem-context install status
Manual MCP registration
If mem-context init can't register the MCP server automatically:
Claude Code:
claude mcp add --scope user mem-context ~/.mem-context/.venv/bin/mem-context-mcp
OpenCode: Add to ~/.config/opencode/opencode.json:
{
"mcp": {
"mem-context": {
"command": ["$HOME/.mem-context/.venv/bin/mem-context-mcp"],
"enabled": true,
"type": "local"
}
}
}
Updating
When a new version is released, update all components:
# 1. Upgrade the package
~/.mem-context/.venv/bin/pip install --upgrade mem-context
# 2. Update instructions, skills, and agents for all detected tools
mem-context init --force
# Target a single tool:
mem-context init --tool claude-code --instructions-only --force
mem-context init --tool opencode --instructions-only --force
# 3. Reinstall capture hooks (picks up new hook types + absolute paths)
mem-context install claude-code
mem-context install opencode
# 4. Verify everything is current
mem-context install status
mem-context init --list-tools
# 5. Restart your AI assistant
What gets updated:
Component | Command | What |
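CLI + MCP server | ~/.mem-context/.venv/bin/pip install --upgrade mem-context | Binary, libraries, entry points |
Instructions | mem-context init --force | CLAUDE.md, rules files, marked sections |
Skills | mem-context init --force | Slash commands (recall, remember, forget, …) |
Agents | mem-context init --force | Background agents (memory-manager) |
Plugins | | Client plugins (OpenCode) |
Capture hooks | mem-context install claude-code, mem-context install opencode | Hook entries in settings.json / opencode.json |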
Usage
MCP tools (from AI assistant)
Tool | Description |
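remember | Store a memory with auto-embedding |
recall | Vector search with scoring |
forget | Archive (weight=0) |
delete_memory | Soft-delete (weight=0) or hard-delete a memory |
purge_memories | Bulk delete by scope, type, or age (dry-run supported) |
get | Retrieve one memory |
update | Modify metadata |
status | Memory store statistics |
review | Flagged memories |
consolidation_candidates | Consolidation tasks for host model |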
CLI
mem-context status # Store statistics
mem-context recall "query" --limit 5 # Search memories
mem-context get <id> # One memory
mem-context forget <id> # Archive
mem-context review # Flagged memories
mem-context consolidate --dry-run # Consolidation candidates
mem-context capture transcript <path> # Import conversation
mem-context export -o memories.json # Export all memories
mem-context import memories.json --re-embed # Import from export
mem-context init --list-tools # Show AI tools
mem-context install status # Hook status
How It Works
Write path: capture → store → embed
Session ends
→ capture hook fires (Claude Code: Stop)
→ transcript parsed into structured messages
→ each message stored as episodic memory
→ content embedded via Ollama (1024d) or local model (384d)
→ cosine similarity check: > 0.82 → merge, else insert
Read path: query → embed → search → score → return
recall("how do we handle auth?")
→ query embedded to 1024d vector
→ LanceDB ANN search (scope-filtered: same project + global)
→ raw candidates scored by 6-factor formula
→ sorted by final_score, filtered by min_score
→ token-budgeted: results accumulated until budget exhausted
→ returned to host model for use
Consolidation path: age → candidate → LLM → write-back
remember() or recall() called
→ check last_consolidation > interval_hours (24h)?
→ build_task: scan for episodic > 3d, semantic clusters > 7d
→ send prompts + candidates to host model
→ host model extracts conclusions → new semantic memories
→ host model merges similar semantics → permanent
→ low-weight (< 0.1) memories archived (weight = 0)
The host model does all the reasoning; the server only prepares structured prompts and candidate lists. This means consolidation quality scales with the host model's capability (Fable 5 > Opus > Sonnet > local Ollama).
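The min_score filter and token-budget step of the read path can be illustrated with a short sketch. It assumes each scored candidate carries its final_score and text, estimates tokens with a hypothetical 4-characters-per-token heuristic (the README does not say which tokenizer is used), treats min_score as inclusive, and stops at the first result that would overflow the budget. Candidate and budget_results are illustrative names, not the server's API.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    content: str
    score: float  # final_score from the 6-factor formula


def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: roughly 4 characters per token, at least 1.
    return max(1, len(text) // 4)


def budget_results(
    candidates: list[Candidate], min_score: float, token_budget: int
) -> list[Candidate]:
    """Sort by score, drop weak matches, then accumulate until the budget runs out."""
    ranked = sorted(
        (c for c in candidates if c.score >= min_score),
        key=lambda c: c.score,
        reverse=True,
    )
    selected: list[Candidate] = []
    used = 0
    for c in ranked:
        cost = estimate_tokens(c.content)
        if used + cost > token_budget:
            break  # budget exhausted: stop accumulating
        selected.append(c)
        used += cost
    return selected
```

Stopping at the first overflow keeps results strictly in score order; an alternative design would skip an oversized result and keep trying smaller ones, at the cost of returning lower-ranked memories ahead of a higher-ranked one.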
Architecture
mem-context/src/mem_context/
├── storage/lance.py LanceDB CRUD, ANN search, FTS, export/import
│ schemas.py PyArrow schemas: memories, relations, conversations
├── retrieval/embedder.py Dual-backend embedding (Ollama + local fallback)
│ scoring.py 6-factor scoring: vector × weight × decay × …
├── capture/formats.py Transcript parsers: Claude Code, OpenCode, JSON, generic
│ wrapper.py Hook entry-point: finds transcript, runs capture
├── consolidation/
│ pipeline.py Build tasks, run extract/merge/archive phases
│ templates.py Prompt templates for each consolidation phase
│ ollama.py Local model fallback for LLM tasks
├── mcp/server.py FastMCP server: 10 tools (remember, recall, forget, …)
├── provision.py AI tool detection, CLAUDE.md injection, skill install
├── config.py YAML + env config with auto-detection
└── scope.py Project scope resolution (config → path hash → global)
Scoring
final = vector_score × weight_score × recency_score × scope_score × access_boost × type_boost
vector_score = exp(-cosine_distance)
weight_score = sqrt(weight × e^(-decay_rate × days))
recency_score = e^(-recency_decay_rate × days)
recency_decay_rate = permanent: 0.005, semantic: 0.02, episodic: 0.05
scope_score = same_project: 1.0, global: 0.8, other: 0.4
access_boost = min(2.0, 1.0 + 0.1 × access_count)
type_boost = permanent: 2.0, semantic: 1.2, episodic: 1.0
Memory types
Type | Default weight | Decay rate | Use |
episodic | 0.5 | 0.15/day | Session captures, debugging |
semantic | 0.7 | 0.03/day | Extracted conclusions, patterns |
permanent | 1.0 | 0.0 | Architecture decisions, conventions |
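The scoring formula and the per-type defaults translate directly into code. This sketch multiplies the six factors exactly as written in the Scoring section; the function and argument names are hypothetical, and it assumes the same age in days feeds both the weight decay and the recency term (the README does not say whether either one uses a different clock, such as time since last access).

```python
import math

DECAY_RATE = {"permanent": 0.0, "semantic": 0.03, "episodic": 0.15}
RECENCY_DECAY = {"permanent": 0.005, "semantic": 0.02, "episodic": 0.05}
TYPE_BOOST = {"permanent": 2.0, "semantic": 1.2, "episodic": 1.0}
SCOPE_SCORE = {"same_project": 1.0, "global": 0.8, "other": 0.4}


def final_score(
    cosine_distance: float,
    weight: float,
    days: float,
    memory_type: str,
    scope_match: str,  # "same_project", "global", or "other"
    access_count: int,
) -> float:
    vector_score = math.exp(-cosine_distance)
    weight_score = math.sqrt(weight * math.exp(-DECAY_RATE[memory_type] * days))
    recency_score = math.exp(-RECENCY_DECAY[memory_type] * days)
    scope_score = SCOPE_SCORE[scope_match]
    access_boost = min(2.0, 1.0 + 0.1 * access_count)  # capped at 2x
    type_boost = TYPE_BOOST[memory_type]
    return (vector_score * weight_score * recency_score
            * scope_score * access_boost * type_boost)
```

As written, the product is not bounded by 1: a fresh permanent memory with a perfect vector match already scores 2.0, and access_boost can double that again. How the server maps this onto the 0–1 range mentioned under Features is not specified here.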
Consolidation pipeline
Phase | Trigger | Action |
Extract | 3 days | Episodic → host model extracts conclusions → semantic |
Merge | 7 days | Semantic cluster by embedding → host model merges |
Archive | 30 days | weight < 0.1 → weight = 0 |
The server prepares prompts and candidates; the host model (Claude, DeepSeek, GPT) does the reasoning and writes results back via MCP tools.
Automatic background consolidation
No cron needed — consolidation runs automatically in the background
when remember() or recall() is called, at most once per interval_hours
(default 24h).
Configuration
All parameters are configurable via ~/.mem-context/config.yaml,
.mem-context/config.yaml, or environment variables. See
Configuration docs for all options.
# Quick overrides
export MEM_CONTEXT_CONSOLIDATION_MODEL=qwen2.5-coder:14b # model
export MEM_CONTEXT_CONSOLIDATION_TEMPERATURE=0.1 # 0.0-1.0
export MEM_CONTEXT_CONSOLIDATION_TIMEOUT=300 # seconds
Parameter | Default | Env var | Description |
| auto-detect | MEM_CONTEXT_CONSOLIDATION_MODEL | Consolidation model: 14b→7b→3b, or override |
| 8192 | | Context window tokens |
| 0.2 | MEM_CONTEXT_CONSOLIDATION_TEMPERATURE | Determinism (0.0–1.0) |
| 120s | MEM_CONTEXT_CONSOLIDATION_TIMEOUT | Ollama API timeout |
| 3 | | Days before episodic → extraction |
| 7 | | Days before semantic → merge |
| 30 | | Days before low weight → archive |
| 20 | | Candidates per run |
| 10 | | Merge groups per run |
interval_hours | 24 | — | Hours between runs |
Model auto-detection
If no model is configured, the system:
1. Detects GPU VRAM (NVIDIA, AMD, macOS Metal/Apple Silicon)
2. Picks the best model that fits: 14b (9+ GB) → 7b (5+ GB) → 3b (4+ GB)
3. Auto-pulls it via Ollama if not installed
4. Falls back to a smaller model on OOM errors
No GPU: minimum qwen2.5-coder:3b (~4 GB system RAM, slow on CPU).
The MCP path doesn't need a local model; the host LLM does the work.
Scope detection
1. .mem-context/config.yaml → project_id → scope = "proj:" + hash
2. Fallback → scope = "path:" + hash(cwd)
3. `scope="global"` is explicit-only, never auto-detected
Requirements
Python 3.11+
Ollama (for embedding): mxbai-embed-large (~670 MB, recommended)
Or: sentence-transformers local fallback (all-MiniLM-L6-v2, 384d)
Consolidation model: auto-detected and auto-installed (see above)
Installation options
mem-context init: instructions + skills (all supported tools)
mem-context init # All detected AI tools
mem-context init --tool claude-code # Claude Code only
mem-context init --tool opencode # OpenCode only
mem-context init --tool codex # Codex only
mem-context init --tool cursor # Cursor only (project-scoped)
mem-context init --dry-run # Preview without changes
mem-context init --list-tools # Show what's detected
mem-context install: capture hooks (2 tools)
mem-context install claude-code # Stop hook → settings.local.json
mem-context install opencode # MCP server registration → opencode.json
mem-context install status # Check all
mem-context install uninstall -c claude-code # Remove
Manual MCP registration
claude mcp add --scope user mem-context ~/.mem-context/.venv/bin/mem-context-mcp
Documentation
Document | Content |
| Detailed setup, Ollama, config |
| All parameters with explanations |
| Tool reference with schemas and examples |
| Storage, scoring, retrieval pipeline |
| Pipeline phases, host model workflow |
| Automatic transcript capture setup |
| 28 sections, 100+ test cases |
Development
git clone ssh://git@git.montyho.com/turbyho/mem-context.git
cd mem-context
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
python3 -m pytest tests/ -q # 113 tests
License
MIT