localthink-mcp
Allows offloading large file queries, document processing, and context compression to local Ollama models, reducing Claude's context window usage.
Local LLM context compression for Claude Code. Offloads large file queries and document processing to Ollama so they never burn Claude's context window.
v0.1.0: benchmarked at ~30× token savings on 16 KB file queries.
v1.1: adds 13 new tools covering every major token-waste pattern.
v1.2: adds pre-injection. local_improve_prompt and local_preplan run locally before Claude sees the task, sharpening prompts and scaffolding plans so Claude executes rather than guesses.
v2.1: adds smart buffer, execution filters, session scratchpad, persistent notes, response refinement, and a disk-backed result cache (14 new tools, 45 total).
v2.2: adds the tiered CLAUDE.md system. Switch between Full/Half/Quarter instruction sets with one command, replacing the old 102-line monolith with 12–55 lines depending on tier.
v2.3: adds local_suggest (intelligent tool picker), local_explain_error (one-call debugging), local_git_diff (git-aware semantic diff), and local_session_recall (auto-surface notes at session start), for 4 new tools and 49 total. Also includes thread-safe caching, Ollama error handling, and a full doc consistency pass.
Quick start
# 1. Pull models for your hardware (example: 10-12 GB VRAM — Tier E)
ollama pull qwen2.5:14b-instruct-q4_K_M # MAIN — deep ops
ollama pull qwen2.5:7b-instruct-q4_K_M # FAST — lightweight ops
ollama pull qwen2.5:3b # TINY — instant/gate ops
# 2. Register with Claude Code — models set inline, no config editing
claude mcp add localthink \
--env OLLAMA_MODEL="qwen2.5:14b-instruct-q4_K_M" \
--env OLLAMA_FAST_MODEL="qwen2.5:7b-instruct-q4_K_M" \
--env OLLAMA_TINY_MODEL="qwen2.5:3b" \
-- uvx localthink-mcp
# Windows:
# claude mcp add --transport stdio localthink ^
# --env OLLAMA_MODEL="qwen2.5:14b-instruct-q4_K_M" ^
# --env OLLAMA_FAST_MODEL="qwen2.5:7b-instruct-q4_K_M" ^
# --env OLLAMA_TINY_MODEL="qwen2.5:3b" ^
# -- cmd /c uvx localthink-mcp
# 3. Set your CLAUDE.md instruction tier
cp -r claude-md/ ~/.claude/localthink/
python ~/.claude/localthink/set-tier.py full
# 4. Verify
claude mcp list # localthink → Connected
See SETUP.md for per-hardware pull commands across all tiers (CPU to 48 GB+ GPU). Fine-tune any setting live with local_config; no file editing needed.
Tiered CLAUDE.md Instructions
Instead of pasting a 102-line monolith into CLAUDE.md, pick a tier:
Tier | Lines in CLAUDE.md | Tools | Best for |
Full | ~60 | All 49 | Complex projects, new codebases, research-heavy sessions |
Half | ~35 | ~22 | Day-to-day dev: file nav + CI filters |
Quarter | ~15 | ~7 | Minimal: just stop Claude loading big files |
python ~/.claude/localthink/set-tier.py full # switch to full
python ~/.claude/localthink/set-tier.py half # switch to half
python ~/.claude/localthink/set-tier.py # show current tier
See claude-md/ and CLAUDE_MD_TEMPLATE.md for full documentation.
Requirements
Ollama installed and running (ollama serve)
Claude Code CLI
Python 3.10+
All 49 tools
v0.1.0 — Core compression
Tool | When to use |
| Query a large file without loading it into context |
| Compress a large text blob already in context |
| Pull only the cited passages you need from a document |
v1.1 — New routes
File operations
Tool | What it does |
| Read a file → return compressed content (not an answer). Hold the compressed version in context for repeated reference. |
| Answer one question across many files in a single call. No files enter Claude's context. |
| Walk a directory, summarize or query every matching file. Glob pattern support ( |
Composition (fewer round-trips)
Tool | What it does |
| Chain |
| Meta-tool: detects file path vs text, picks the right op, handles large docs with auto extract-then-answer. Zero decision overhead. |
Stateful document chat
Tool | What it does |
| Multi-turn Q&A. Document is compressed on first call and stays with Ollama. Claude holds only conversation history — the original doc never enters Claude's window. |
Semantic & structural
Tool | What it does |
| Find passages matching a concept, not a literal string. "Find where rate limiting is enforced" works even if the word "rate" isn't there. |
| Structural table of contents with line ranges — no content returned. Use before |
| Public API skeleton. Python: pure AST (no Ollama, instant). Other languages: fast LLM. Typically 5-10% of original size. |
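The "pure AST" route mentioned for Python files can be illustrated with the standard library's ast module. The following is a minimal sketch under assumptions, not the tool's actual implementation: the names public_skeleton and _signature and the output format are hypothetical, and "public" is taken to mean "name does not start with an underscore".

```python
import ast


def _signature(node):
    """Render one function node as a single signature line with the body elided."""
    keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    return f"{keyword} {node.name}({ast.unparse(node.args)}): ..."


def public_skeleton(source: str) -> str:
    """Return public top-level functions, classes, and methods without bodies."""
    functions = (ast.FunctionDef, ast.AsyncFunctionDef)
    lines = []
    for node in ast.parse(source).body:
        if isinstance(node, functions) and not node.name.startswith("_"):
            lines.append(_signature(node))
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            lines.append(f"class {node.name}:")
            methods = [
                item for item in node.body
                if isinstance(item, functions) and not item.name.startswith("_")
            ]
            # Indent public methods under the class; mark an empty class body.
            lines.extend("    " + _signature(m) for m in methods)
            if not methods:
                lines.append("    ...")
    return "\n".join(lines)
```

Because nothing but parsing is involved, a pass like this needs no model call, which is consistent with the "no Ollama, instant" note for Python sources.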
Analysis / meta
Tool | What it does |
| Classify content type + recommend the best tool. Returns JSON. Use for programmatic routing in hooks/scripts. |
| Checklist-based file audit: PASS / FAIL / PARTIAL / N/A per item. File never enters Claude's context. |
| List local Ollama models and show current DEFAULT / FAST model config. |
v1.2 — Pre-injection (run before Claude thinks)
These tools run a local model pass before Claude engages with a task. Claude never sees the raw input — only the pre-processed output. Eliminates waste at the source rather than compressing after the fact.
Tool | What it does |
local_improve_prompt | Rewrite a vague or rough prompt into a clear, specific, unambiguous version. Claude receives only the sharpened result. Uses the fast model, so overhead is minimal. |
local_preplan | Generate a structured implementation plan (goal / assumptions / ordered steps / risks / open questions) via local model. Claude executes the scaffold rather than planning from scratch. |
local_improve_prompt example:
"make the auth faster"
→ local_improve_prompt(prompt, context="Next.js, JWT, DB bottleneck suspected")
→ "Optimise JWT validation latency in src/auth/middleware.ts — profile the verify()
hot path, remove redundant DB round-trips, target p95 < 5 ms."
→ Feed that to Claude as the actual task
local_preplan example:
plan = local_preplan(
task="add rate limiting to the API",
context="Express.js, Redis available, routes in src/routes/",
depth="standard"
)
# Returns: Goal / Assumptions / Steps with file paths / Risks / Open questions
# Then: "Execute this plan: <plan>"
v1.1 expansion — high-context compression + smart reading
High-context compression
Tool | What it does |
| Compress a log file to its essential signal. Groups repeated errors with counts, extracts key events, surfaces anomalies. Optional level ( |
| Distil a stack trace (+ source context) to: root cause, failure point, 3-5 key frames, fix hint. Eliminates framework boilerplate that inflates traces to thousands of tokens. |
| Compress JSON objects, CSV exports, and API responses. Strips nulls, samples large arrays, keeps IDs/status codes. REST responses commonly shrink 20:1. |
| Recursive meta-tool. Compress a saved Claude conversation transcript to a re-entry briefing: context, decisions, current state, open items, constraints. The transcript never enters Claude's context. |
| Compress a long CLAUDE.md or system prompt to its minimal directive set. Preserves every unique rule; removes duplicates and verbose prose. |
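To make the structured-data row above concrete, here is a small deterministic sketch of what "strips nulls, samples large arrays" can look like. It is an illustration only: the actual tool delegates to a local model, and the function name compact, the max_items parameter, and the placeholder string are assumptions made for this example.

```python
def compact(value, max_items=3):
    """Recursively drop null-valued fields and truncate long arrays."""
    if isinstance(value, dict):
        # Remove keys whose value is null; recurse into everything else.
        return {k: compact(v, max_items) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        # Keep only the first few items and note how many were left out.
        kept = [compact(v, max_items) for v in value[:max_items]]
        omitted = len(value) - len(kept)
        if omitted > 0:
            kept.append(f"... {omitted} more items omitted")
        return kept
    # Scalars such as IDs and status codes pass through unchanged.
    return value
```

Calling compact on a parsed API response keeps identifiers and status fields intact while bounding the size of any array, which is where most of the bulk in REST payloads tends to sit.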
Smart reading (avoid loading files at all)
Tool | What it does |
| Full symbol table: every definition with type, line number, and one-line description. Replaces "read file to see what's in it." |
| Natural-language code search inside a file. Returns the complete matching logical unit with line numbers. E.g. |
| All function bodies → |
Format transformation
Tool | What it does |
| Convert formats without loading source into context: |
| Sample data → compact JSON Schema (draft-07). API samples are often 100:1 data-to-schema ratio. |
Temporal & multi-file diff
Tool | What it does |
| Chronological event sequence from logs, changelogs, git log, or incident reports. Deduplicates repeated events. |
| Diff two files by path — neither file loaded into context. Counterpart to |
v2.3 — Diagnostics, git integration, session intelligence
Tool | What it does |
local_suggest | Returns an ordered call plan for any task, so Claude does not have to reason over 49 tool descriptions. Fast model; cached by task+files hash. |
local_explain_error | Root-causes an exception, shows the relevant code snippet, and suggests a fix. Auto-detects the implicated file from the stack trace. File never enters Claude's context. |
local_git_diff | Semantic summary of git changes (default: HEAD vs working tree). Diff never enters Claude's context. Requires git in PATH. |
local_session_recall | Search permanent notes + last checkpoint by task description. Call once at session start instead of manual local_note_search + file read. |
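The phrase "cached by task+files hash" suggests a cache key derived from the task text and the files involved. One way such a key could be built is sketched below. This is an assumption-laden illustration, not the server's code: the function name cache_key is hypothetical, and hashing file contents (rather than paths or modification times) is a choice made for this example.

```python
import hashlib


def cache_key(task: str, file_paths: list[str]) -> str:
    """Build a stable key from the task text and the contents of the given files."""
    digest = hashlib.sha256()
    digest.update(task.strip().encode("utf-8"))
    # Sort paths so the key does not depend on argument order.
    for path in sorted(file_paths):
        # Separate entries with NUL bytes so adjacent fields cannot run together.
        digest.update(b"\0" + path.encode("utf-8") + b"\0")
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()
```

With a content-based key, editing any referenced file changes the key, so a stale plan is not served after the code it describes has changed.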
v2.1 — Smart buffer, execution filters, scratchpad, notes, cache
Smart Buffer (raw output triage)
Tool | What it does |
| Triage any raw output (test results, build logs, lint dumps) into Pattern + Anomalies + Signal. Always fits in budget. Use before injecting any raw tool output into context. |
| Read a window of lines from a file at an offset. On-demand raw access when |
| Meaning-level diff — noise (whitespace, formatting, minor rewording) suppressed. Only semantic changes surface. |
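Reading "a window of lines from a file at an offset" can be done without loading the whole file. The sketch below shows the idea with the standard library; the function name read_window and its zero-based offset and limit parameters are hypothetical and may not match the tool's real signature.

```python
from itertools import islice


def read_window(path, offset=0, limit=50):
    """Return up to `limit` lines starting at zero-based line number `offset`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        # islice skips the first `offset` lines lazily and stops after the window.
        window = islice(handle, offset, offset + limit)
        return [line.rstrip("\n") for line in window]
```

A request past the end of the file simply returns fewer lines (or an empty list), so callers can page forward until the result is shorter than the limit.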
Execution Filters (project tools → local LLM)
Tool | What it does |
| Run the project test suite. Returns only |
| Run the linter. Violations grouped by rule; passing rules suppressed. |
| Run the build. Returns root cause + affected symbols only. |
Session Scratchpad (stateful decisions)
Tool | What it does |
| Write to a named scratchpad section: |
| Read the full scratchpad as a distilled summary. Restore context mid-session without re-reading files. |
| Freeze scratchpad into a |
Persistent Notes (cross-session knowledge)
Tool | What it does |
| Write a permanent note to disk ( |
| Full-text search across all persisted notes. Run at session start to surface relevant prior knowledge. |
Response Quality & Cache
Tool | What it does |
| Post-process an LLM draft through a refinement pass. Optional instructions target tone, brevity, or accuracy. |
| Show cache hit/miss counts, entry count, and total disk usage. |
| Evict all cached results. |
| Open the settings GUI — configure all 21 settings across Ollama, Timeouts, Limits, Cache, and Memo. Saves to |
Decision guide
Situation | Tool |
Don't know which tool to use |
|
File > 5 KB, one specific question |
|
File > 5 KB, need to reference it multiple times |
|
Text already in context, want to compress it |
|
"Find me the part about X" |
|
Need to outline a doc before extracting |
|
Want to know what's in a code file |
|
Want to understand a code file's structure |
|
Want the full file but bodies stripped |
|
"Find the function that does X" |
|
Multi-step process on the same document |
|
Unsure which tool to use |
|
Multiple questions about the same large doc |
|
Same question across 5+ files |
|
Understand what's in a directory |
|
"Find where X is handled" (concept search) |
|
Security or quality checklist |
|
Unsure of content type before processing |
|
Large log file |
|
Stack trace + source context |
|
JSON / CSV / API response payload |
|
Session too long, need to restart |
|
CLAUDE.md grown too large |
|
Need JSON as YAML (or any format swap) |
|
Need a schema for sample data |
|
Need a timeline from a log or changelog |
|
Compare two files without loading them |
|
Compare two in-context text blobs |
|
Prompt is vague — sharpen before sending to Claude |
|
Task is large — plan locally before Claude touches it |
|
Raw test/build/lint output about to enter context |
|
|
|
Two text blobs — want only the meaningful diff |
|
Run tests without dumping output into context |
|
Run lint without dumping output into context |
|
Run build without dumping output into context |
|
Want to record a decision or assumption mid-session |
|
Resuming work, need to restore session context |
|
About to |
|
Want to save a pattern or gotcha for future sessions |
|
Starting a session — check for relevant prior notes |
|
Starting a session — notes + checkpoint in one call |
|
Exception or stack trace to debug |
|
Semantic diff of git changes |
|
LLM draft needs a quality pass |
|
Check or clear the result cache |
|
Change any setting via GUI |
|
local_pipeline examples
# Extract auth sections, then summarize for security review
local_pipeline(text=big_doc, steps=[
{"op": "extract", "query": "authentication and authorization"},
{"op": "summarize", "focus": "security risks and gotchas"},
])
# Answer a question after narrowing to the relevant section
local_pipeline(text=api_docs, steps=[
{"op": "extract", "query": "rate limiting"},
{"op": "answer", "question": "what headers control retry behaviour?"},
])
local_chat example
# Turn 1 — document is compressed automatically
r = local_chat(full_doc, "What does this library do?", "")
# r["doc"] = compressed version (hold this)
# r["history"] = conversation so far (hold this)
# r["answer"] = the answer
# Turn 2 — pass compressed doc + history back
r = local_chat(r["doc"], "How do I configure auth?", r["history"])
# Turn 3
r = local_chat(r["doc"], "Show me the relevant config keys", r["history"])
Configuration
Using the Settings Editor
The fastest way to configure LocalThink is the built-in settings GUI. Type this in Claude Code:
local_config
A desktop window opens immediately: no terminal, no JSON editing.
What you'll see:
Tab | Settings inside |
Ollama | Base URL · Default model · Fast model · Tiny model |
Timeouts | Main · Fast · Tiny · Health check · code_surface · git diff |
Limits | Max file bytes · Max pipeline steps · Max scan files · Classify sample · Batch concurrency · Chat history limit |
Cache | Cache directory · Cache TTL (days) |
Memo | Memo directory · Compact threshold · Max notes |
Status bar: the bottom of the window shows a live Ollama probe. A green dot with your model count means Ollama is reachable; a red dot means it is not running (run ollama serve to fix).
Model dropdowns: the Ollama tab auto-populates model fields with every model currently pulled on your machine. You can also type a model name directly.
Directory fields: Cache directory and Memo directory each have a Browse button that opens a folder picker.
Buttons:
Button | What it does |
Save | Writes |
Reset Tab | Restores all fields in the current tab to their built-in defaults (does not save) |
Cancel | Closes without saving any changes |
What applies instantly vs what needs a restart:
Instant (no restart needed): timeouts, limits, cache settings, memo settings
Requires restarting the MCP server: Ollama Base URL, Default model, Fast model, Tiny model
To restart after a model change: open the MCP panel in Claude Code (/mcp) and reconnect, or close and reopen Claude Code.
Settings are saved to ~/.localthink-mcp/config.json. You can also set any value manually as an env var (env vars take priority over the config file).
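The precedence described above (environment variable first, then the saved config file, then a built-in default) can be sketched as follows. This is a minimal illustration rather than the server's actual code: the function name resolve_setting and the JSON key used in the usage note are hypothetical; only the config path and the OLLAMA_MODEL variable name come from this README.

```python
import json
import os
from pathlib import Path

# Path taken from this README; the layout of the file itself is an assumption.
CONFIG_PATH = Path.home() / ".localthink-mcp" / "config.json"


def resolve_setting(env_name, config_key, default, config_path=CONFIG_PATH, environ=None):
    """Return a setting using env var > config file > default precedence."""
    environ = os.environ if environ is None else environ

    # 1. An environment variable wins whenever it is set and non-empty.
    value = environ.get(env_name)
    if value:
        return value

    # 2. Otherwise fall back to the saved config file, if it exists and parses.
    try:
        saved = json.loads(Path(config_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        saved = {}
    if isinstance(saved, dict) and config_key in saved:
        return saved[config_key]

    # 3. Finally use the built-in default.
    return default
```

For example, resolve_setting("OLLAMA_MODEL", "ollama_model", "qwen2.5:3b") would return the value passed with --env at registration if present, and only consult the file when that variable is unset. The key "ollama_model" is a placeholder for whatever name the config file really uses.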
Ollama
Env var | Default | Recommended |
|
| Change only if Ollama runs on a remote machine or non-default port |
|
| Match your VRAM tier — see SETUP.md for the full table |
| (same as MODEL) | One tier smaller than the default (e.g. |
| (same as FAST) |
|
Timeouts
Env var | Default | Recommended |
|
|
|
|
|
|
|
| Rarely needs changing |
|
| Leave at |
|
| Increase to |
|
| Subprocess timeout (s) for local_git_diff |
Limits
Env var | Default | Recommended |
|
|
|
|
| Leave at |
|
| Increase to |
|
|
|
|
|
|
|
| Max chars of history kept per local_chat turn |
Cache
Env var | Default | Recommended |
|
| Change if the default drive is low on space |
|
|
|
Memo / Notes
Env var | Default | Recommended |
|
| Point to a synced folder (Dropbox, OneDrive) to share notes across machines |
|
|
|
|
| Max entries in permanent notes index |
Example: 3-tier model setup
# Pass models inline at registration — no file editing needed
claude mcp add localthink \
--env OLLAMA_MODEL="qwen2.5:14b-instruct-q4_K_M" \
--env OLLAMA_FAST_MODEL="qwen2.5:7b-instruct-q4_K_M" \
--env OLLAMA_TINY_MODEL="qwen2.5:3b" \
-- uvx localthink-mcp
Change models any time with local_config (Ollama tab → Save → reconnect MCP).
Install options
uvx (recommended — zero setup)
claude mcp add localthink \
--env OLLAMA_MODEL="qwen2.5:14b-instruct-q4_K_M" \
--env OLLAMA_FAST_MODEL="qwen2.5:7b-instruct-q4_K_M" \
--env OLLAMA_TINY_MODEL="qwen2.5:3b" \
-- uvx localthink-mcp
pip
pip install localthink-mcp
claude mcp add localthink \
--env OLLAMA_MODEL="qwen2.5:14b-instruct-q4_K_M" \
--env OLLAMA_FAST_MODEL="qwen2.5:7b-instruct-q4_K_M" \
--env OLLAMA_TINY_MODEL="qwen2.5:3b" \
-- localthink-mcp
Windows — if uvx isn't on Claude's PATH
claude mcp add --transport stdio localthink ^
--env OLLAMA_MODEL="qwen2.5:14b-instruct-q4_K_M" ^
--env OLLAMA_FAST_MODEL="qwen2.5:7b-instruct-q4_K_M" ^
--env OLLAMA_TINY_MODEL="qwen2.5:3b" ^
-- cmd /c uvx localthink-mcp
Substitute models for your hardware; see SETUP.md for the full tier table.
Security
Local only — runs as a stdio child process, never exposed to the network.
local_answer / local_shrink_file / local_audit read any path your shell can access. Same trust level as Claude's built-in Read tool.
Ollama has no auth by default. Don't expose port 11434 to the internet.
No data leaves your machine. All inference is local.
Troubleshooting
[localthink] Ollama is not running
ollama serve
curl http://localhost:11434/api/tags
Slow responses
Switch to a smaller model or set a fast model:
OLLAMA_MODEL=qwen2.5:7b-instruct claude
Windows: uvx not found
Install uv, then retry. Or use the cmd /c uvx fallback shown under Install options.
License
MIT © 2026 H3xabah