history-rag
Allows searching over Zsh command history, with deduplication and filtering of trivial commands.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@history-ragsearch for the docker compose command I used last week"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Claude Code history RAG
Local semantic search over your history — Claude Code sessions and shell command
history — exposed to Claude Code as an MCP search_history tool. Everything is
indexed into one vector space, so a single query ranks chat turns and terminal
commands together. Runs entirely on your machine; nothing leaves it.
Setting this up by handing it to your coding agent? Point it at
AGENT_SETUP.mdinstead — that's the agent runbook. This README is the human walkthrough.
Quickstart
For the impatient (full detail in the numbered sections below):
git clone https://github.com/standingwave/history-rag.git && cd history-rag
brew install ollama && brew services start ollama
ollama pull nomic-embed-text
uv venv ~/.claude/rag-venv
uv pip install --python ~/.claude/rag-venv/bin/python -r requirements.txt
~/.claude/rag-venv/bin/python index.py # build the index
claude mcp add history -- ~/.claude/rag-venv/bin/python "$(pwd)/server.py"Related MCP server: lore
Layout
config.py— shared settings (model, dimensions, DB path, Ollama URL), each overridable by env var. Imported everywhere so build and query always agree.index.py— driver: pulls chunks from every source inSOURCES, embeds via Ollama, writes~/.claude/history-rag.db.sources/— one module per content source, each yielding(id, text, record):claude.py— Claude Code session prompts + assistant text.shell.py— bash + zsh command history, deduped.
server.py— the MCP server exposingsearch_history.inspect_sessions.py— one-off: dumps the JSONL shape so you can confirm the Claude parser matches your session files.
Sources
Every source feeds one shared index; pass source="claude" or source="shell"
to search_history to restrict a query.
Shell history reads ~/.zsh_history, ~/.bash_history, and the per-session
snapshots macOS keeps in ~/.zsh_sessions/ and ~/.bash_sessions/. Live history
files are capped by your shell's SAVEHIST/HISTSIZE, but the session snapshots
reach further back. For history archived elsewhere (old machines, backups), point
CLAUDE_RAG_HISTFILES at the extra files (colon-separated):
CLAUDE_RAG_HISTFILES="$HOME/backups/zsh_history.2019:$HOME/backups/bash_history.old" \
~/.claude/rag-venv/bin/python index.pyIdentical commands collapse to one entry (with a run count); trivial commands
(ls, cd, …) are dropped, and anything that looks like it contains a secret
(passwords, tokens, API keys, user:pass@host URLs) is skipped so it's never
embedded. Command timestamps only appear if zsh recorded them (setopt EXTENDED_HISTORY); bash needs HISTTIMEFORMAT set.
Adding a source: drop a module in sources/ with an iter_chunks()
generator that yields (id, text, {"source", "timestamp", "location", "meta"}),
then add it to SOURCES in index.py. The id must be stable across runs so
indexing stays incremental.
1. Prereqs
Install Ollama
macOS (Homebrew, gives easy updates):
brew install ollama
brew services start ollama # runs the daemon in the backgroundOr download the .dmg from https://ollama.com/download/mac and drag to Applications (launch it once so the menu-bar daemon starts).
Linux:
curl -fsSL https://ollama.com/install.sh | sh # sets up a systemd serviceVerify the daemon is up (the indexer/server talk to it on port 11434):
ollama --version
curl http://localhost:11434/api/tags # should return JSON, not connection refusedPull the embedding model + Python deps
ollama pull nomic-embed-text # 768-dim, fastUsing uv (recommended):
uv venv ~/.claude/rag-venv
uv pip install --python ~/.claude/rag-venv/bin/python -r requirements.txt(requirements.txt is just sqlite-vec, requests, mcp[cli].)
uv resolves to prebuilt wheels, avoiding the Rust/maturin source builds that
break on Apple Silicon. Run index.py and register server.py with this venv's
interpreter: ~/.claude/rag-venv/bin/python.
Don't have pip and not using uv? First get Python (it bundles pip). On macOS:
brew install python # installs python3 + pip3
python3 -m pip --version # verifyThen install the deps (use pip3, or python3 -m pip if pip isn't on PATH):
python3 -m pip install -r requirements.txtIf brew install python warns about an "externally-managed environment" when
installing the deps, use a venv instead:
python3 -m venv ~/.claude/rag-venv
source ~/.claude/rag-venv/bin/activate
pip install -r requirements.txtIf you use a venv, run index.py and register server.py with that venv's
python: ~/.claude/rag-venv/bin/python.
2. Inspect (do this first)
Confirms the JSONL field names match the parser. If your output shows
different keys (e.g. content not under message.content), tweak
_text_from_content / iter_chunks in sources/claude.py.
~/.claude/rag-venv/bin/python inspect_sessions.py3. Build the index
Use the venv interpreter you installed deps into (bare python won't see them).
First preview what survives the filter across all sources (Claude keeps real prompts + assistant text, dropping tool calls/results/thinking/meta; shell keeps deduped non-trivial commands):
~/.claude/rag-venv/bin/python index.py --dry-runIf that looks right, build:
~/.claude/rag-venv/bin/python index.py # incremental (safe to re-run)
~/.claude/rag-venv/bin/python index.py --rebuild # wipe + reindex from scratchWrites ~/.claude/history-rag.db. Use --rebuild after changing the embedding
model or the chunk schema (e.g. adding a source) — the table layout changes, so
an incremental run against an old DB won't work.
4. Register the MCP server with Claude Code
Run this from the repo directory, using the venv interpreter (bare python
won't find the deps). $(pwd) fills in the absolute path to server.py (the
registration needs an absolute path, not a relative one):
claude mcp add history -- ~/.claude/rag-venv/bin/python "$(pwd)/server.py"Confirm it registered:
claude mcp list # 'history' should appearThen in a session, Claude can call search_history("that proxy bug we hit", k=5).
5. Keep it fresh
The index only reflects sessions present at last run. Pick one:
cron — edit your crontab with crontab -e, then add a line. cron has a
minimal PATH and no ~ expansion, so use absolute paths. Get yours with
echo "$HOME/.claude/rag-venv/bin/python $(pwd)/index.py" and paste the result:
# refresh the history index every 30 min; log output for debugging
*/30 * * * * /ABS/PATH/rag-venv/bin/python /ABS/PATH/index.py >> $HOME/.claude/rag-index.log 2>&1Note: cron needs the Ollama server running to embed new chunks. After saving, check it fired by tailing the log:
tail -f ~/.claude/rag-index.logOn macOS, cron may need Full Disk Access (System Settings → Privacy & Security →
Full Disk Access → add /usr/sbin/cron) to read ~/.claude.
manual — run when you want it current:
~/.claude/rag-venv/bin/python index.pyfile-watcher — watch ~/.claude/projects/**/*.jsonl and trigger index.py
on change (e.g. with fswatch or a launchd WatchPaths agent).
6. Verify it works inside a Claude Code session
After registering (step 4) and indexing (step 3):
Confirm the server is connected. In a session, run the MCP status command:
/mcpYou should see
historylisted as connected with asearch_historytool.Ask Claude something that needs your history. Natural-language prompts that force a lookup work best — Claude will call the tool on its own:
Search my past sessions: what did we decide about the sqlite-vec schema? What have I worked on involving Ollama and embeddings? What's that ffmpeg command I used to convert a webm? (search my shell history)Claude should invoke
search_historyand cite matched snippets with their source / timestamp / location.Call the tool explicitly if you want to test it directly:
Use the search_history tool with query "Attio CRM setup" and k=5Sanity-check the raw DB (outside Claude Code) to confirm rows exist:
~/.claude/rag-venv/bin/python - <<'PY' import sqlite3, sqlite_vec, os db = sqlite3.connect(os.path.expanduser("~/.claude/history-rag.db")) db.enable_load_extension(True); sqlite_vec.load(db) print("chunks:", db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]) for row in db.execute("SELECT source, COUNT(*) FROM chunks GROUP BY source"): print(row) for row in db.execute("SELECT source, timestamp, substr(text,1,70) FROM chunks LIMIT 5"): print(row) PY
Troubleshooting:
/mcpdoesn't listhistory→ re-checkclaude mcp list; the path to server.py must be absolute and the interpreter must be the venv's.Tool errors with a connection error → Ollama isn't running (the server embeds your query at call time). Start it:
open -a Ollama.Tool returns nothing → the index is empty or stale; re-run index.py.
Notes
One chunk per Claude message and per unique shell command. For long assistant turns you may later want sliding-window chunking; per-message is fine to start.
nomic-embed-textis the speed pick. For higher quality, set the env vars (in both your indexing shell and the MCP registration) and re-index:ollama pull mxbai-embed-large CLAUDE_RAG_MODEL=mxbai-embed-large CLAUDE_RAG_DIM=1024 ~/.claude/rag-venv/bin/python index.py --rebuildOther overrides:
CLAUDE_RAG_DB,CLAUDE_RAG_OLLAMA. Seeconfig.py.Distance is cosine-ish; lower = closer.
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
- Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)By Om-Shree-0709 on .Agentic AiPrompt InjectionWebAssembly
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/standingwave/history-rag'
If you have feedback or need assistance with the MCP directory API, please join our Discord server