CodeBrain
An MCP server that lets Claude Code offload bulk work to a local LLM running on your own hardware.
What this is (and isn't)
Is: A Model Context Protocol (MCP) server that Claude Code registers as a sub-agent backend. When a session includes the kind of task a 14B local coder model handles well — generating 50 event templates, polishing 20 React components, drafting boilerplate — Claude Code calls into CodeBrain instead of spending its own output tokens. The local model does the bulk draft, Claude reviews and applies.
Is not: A Claude replacement. The reasoning, architecture decisions, debugging, and anything where "close enough" isn't good enough stays with Claude. CodeBrain is a Claude-offloader, not a Claude-competitor.
Why: Large-volume content and polish work burns through Claude's context and rate limits fast. A local model you can run without limits costs nothing extra per call and keeps the high-value context free for the hard parts of the session.
Status
Phases 1–4 complete, Phase 5 deferred. Ten tools exposed, `.brain/context.md` passthrough live, per-file brain-summary scanner, verifier loop, consensus decoding. MCP integration verified in a real Claude Code session. Phase 5 (RAG) was explicitly scoped as "only if needed", and current use doesn't show cross-file search as a bottleneck, so it stays deferred.
How it works
```
Claude Code session           CodeBrain MCP server           Local machine
─────────────────────  stdio  ────────────────────          ─────────────
Claude delegates a   ────────► codebrain_generate()  ────►  Ollama HTTP
bulk / polish task             codebrain_explain()          (localhost:11434)
                               codebrain_status()                 │
                                                                  ▼
                                                           Qwen2.5-Coder 14B
                                                                (GPU)
Claude reviews,      ◄──────── tool result string    ◄────  streamed response
applies, or pushes back
```

Ten tools are exposed today:
| Tool | When Claude would reach for it |
|------|--------------------------------|
| `codebrain_generate` | Bulk content, boilerplate, repetitive transformations, first drafts |
| `codebrain_batch_generate` | N prompts with one shared system message, serial execution, index-stable errors so one failure doesn't abort the batch |
| `codebrain_polish` | Targeted transform over existing text — shorten, rephrase, translate, tighten. Auto-retries on no-op output. |
| `codebrain_explain` | Quick read-only explanations without burning Claude context |
| `codebrain_generate_verified` | Generation with deterministic verifier loop: word-count / regex-schema checks, tightened-instruction retry on violation |
| `codebrain_consensus_generate` | N candidates + judge call → best single output. Use on high-variance tasks. |
| `codebrain_init` | One-shot repo onboarding: detects stack, writes `.brain/context.md` |
| `codebrain_scan_file` | Generate or refresh one `<source>.brain` file |
| `codebrain_scan_repo` | Walk + scan a tree; hash-gated, per-file failures don't abort the batch |
| `codebrain_status` | Check which models are installed locally |
The `use_brain` flag on generation tools automatically prepends `.brain/context.md` from the current working directory to the system prompt, so project-specific context travels with every call without Claude having to pass it manually.
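In spirit, the passthrough is just a prompt-assembly step. A minimal sketch, assuming a hypothetical `build_system_prompt` helper (the real logic lives in `codebrain/server.py` and may differ in detail):

```python
from pathlib import Path

def build_system_prompt(base_system: str, use_brain: bool = True) -> str:
    """Prepend .brain/context.md from the cwd when use_brain is set."""
    if use_brain:
        brain = Path.cwd() / ".brain" / "context.md"
        if brain.is_file():
            return f"{brain.read_text(encoding='utf-8')}\n\n{base_system}"
    return base_system
```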
Requirements
- Python 3.11+
- Ollama — download for your OS. Tested with Ollama on Windows native, talking over `localhost:11434` (a quick reachability check follows this list).
- A coder model pulled locally:

  ```
  ollama pull qwen2.5-coder:14b
  ```

  ~9 GB download. Fits in 12 GB VRAM at Q5. Other models work too (DeepSeek-Coder, Qwen3 if available) — set via the `CODEBRAIN_MODEL` env var.
- Claude Code CLI on the machine that will call the server (obviously).
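To confirm Ollama is reachable and the model is present before wiring up Claude Code, a small standalone check against Ollama's `/api/tags` endpoint (plain Ollama API, not part of CodeBrain):

```python
import json
import urllib.request

# List locally installed models and confirm the coder model is among them.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

print(models)
print("ok" if "qwen2.5-coder:14b" in models else "model not pulled yet")
```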
Install
```
git clone <this repo> CodeBrain
cd CodeBrain
python -m venv .venv
.venv\Scripts\activate        # on Windows
# source .venv/bin/activate   # on macOS / Linux
pip install -e .
```

Configure Claude Code
Add CodeBrain to your Claude Code MCP config. On Windows, that's usually `~/.claude.json` (adjust the path to where you cloned):

```json
{
  "mcpServers": {
    "codebrain": {
      "command": "C:\\Users\\YOU\\Desktop\\CodeBrain\\.venv\\Scripts\\python.exe",
      "args": ["-m", "codebrain"]
    }
  }
}
```

Restart any Claude Code session — the `codebrain_*` tools should now appear in the available-tools list.
Keep brain files in sync automatically
Once you've run `codebrain_init` on a repo and scanned it with `codebrain_scan_repo`, you probably want brain files to refresh automatically whenever Claude edits source. Two pieces wire that up:
1. Project CLAUDE.md snippet — tell Claude to read brain files before opening source:
```markdown
## Brain files
This repo has per-file `.brain` summaries next to each source file.
Before reading a full source file, read its `<path>.brain` sibling first.
Only open the source when the brain file is insufficient for the task.
```

2. PostToolUse hook — regenerate the brain after every Edit/Write.
Add to .claude/settings.json in the repo root:
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "python -c \"import asyncio, json, sys; from codebrain.brain_scanner import scan_file; d = json.load(sys.stdin); p = d.get('tool_input', {}).get('file_path'); p and p.endswith(('.py', '.ts', '.tsx', '.js', '.jsx', '.java', '.go', '.rs')) and print(asyncio.run(scan_file(p)))\""
          }
        ]
      }
    ]
  }
}
```

The hook inspects the edited path, skips non-source files via the extension filter, and kicks off a scan. Hash-gated: unchanged files don't hit Qwen.
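If the one-liner is too opaque to maintain, the same logic reads better as a small script. A sketch (the `scripts/brain_hook.py` path is just a suggestion; point the hook's `"command"` at wherever you save it):

```python
# scripts/brain_hook.py — readable equivalent of the one-liner above.
import asyncio
import json
import sys

from codebrain.brain_scanner import scan_file

SOURCE_EXTS = (".py", ".ts", ".tsx", ".js", ".jsx", ".java", ".go", ".rs")

def main() -> None:
    event = json.load(sys.stdin)  # Claude Code hook payload arrives on stdin
    path = event.get("tool_input", {}).get("file_path")
    if path and path.endswith(SOURCE_EXTS):
        # scan_file is hash-gated, so unchanged files return without a model call
        print(asyncio.run(scan_file(path)))

if __name__ == "__main__":
    main()
```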
Sanity check
Inside a Claude Code session, ask Claude:
Call `codebrain_status` and tell me what's installed.

If Ollama is running and the model is pulled, you'll get back `qwen2.5-coder:14b` in the list.
Configuration
Environment variables read by the backend:
| Variable | Default | What it does |
|----------|---------|--------------|
| | `http://localhost:11434` | Point at a remote Ollama (e.g., an inference box on your LAN) |
| `CODEBRAIN_MODEL` | `qwen2.5-coder:14b` | Switch to any model you've pulled |
| | | Seconds to wait for a single generation |
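These can be set in the shell that launches the server, or inline in the MCP entry via the standard `env` field of the Claude Code config. A sketch using `CODEBRAIN_MODEL`, the only variable name this README spells out (the 7B tag is just an example of another pulled model):

```json
{
  "mcpServers": {
    "codebrain": {
      "command": "C:\\Users\\YOU\\Desktop\\CodeBrain\\.venv\\Scripts\\python.exe",
      "args": ["-m", "codebrain"],
      "env": { "CODEBRAIN_MODEL": "qwen2.5-coder:7b" }
    }
  }
}
```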
Project structure
```
CodeBrain/
├── codebrain/
│   ├── __init__.py
│   ├── __main__.py           # `python -m codebrain` entry
│   ├── backend.py            # Ollama HTTP client
│   ├── server.py             # FastMCP server + tool definitions
│   ├── brain_scanner.py      # scan_file / scan_repo + hash gate
│   ├── brain_init.py         # one-shot .brain/context.md seeding
│   ├── verifier.py           # deterministic output checks
│   └── prompts/
│       └── brain_few_shot.md # few-shot for brain-file generation
├── tests/                    # 96 unit + integration tests
├── .spec/
│   ├── CURRENT.md            # phase state
│   └── brain-file-format.md  # brain-file format v1
├── pyproject.toml
├── LICENSE
└── README.md
```

Roadmap
Phase 1 — scaffold ✓
- Ollama HTTP client with error handling
- FastMCP server with stdio transport
- Three core tools: `generate`, `explain`, `status`
- Documented setup + Claude Code config
- Verified in a real Claude Code session
Phase 2 — batch & context ✓
- `codebrain_batch_generate` for mass content with one shared system prompt, index-stable errors (sketched below)
- `codebrain_polish` for targeted transforms (shorten / rephrase / translate) instead of regeneration
- `.brain/context.md` passthrough — cwd project context auto-prepended to every generation call
- Dogfood: coding tasks solid, text-transform tasks revealed real limits (informs Phase 3)
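To make "index-stable errors" concrete, a minimal sketch of the pattern (hypothetical signature; the real tool is `codebrain_batch_generate` in `codebrain/server.py`):

```python
from typing import Callable

def batch_generate(llm: Callable[[str, str], str],
                   system: str, prompts: list[str]) -> list[str]:
    results: list[str] = []
    for prompt in prompts:               # serial execution
        try:
            results.append(llm(system, prompt))
        except Exception as exc:         # record the failure at its own index
            results.append(f"[codebrain error] {exc}")
    return results                       # always len(prompts) long, order preserved
```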
Phase 2.5 — brain system ✓
Per-file `<source>.brain` summaries sit next to each source file. Claude reads the brain first and only opens the source when the brain is insufficient.
- `codebrain_scan_file(path, force)` — generate or refresh one brain file
- `codebrain_scan_repo(root, force, extensions, exclude_dirs)` — bulk walk + scan
- `codebrain_init(root, force)` — seed `.brain/context.md` with stack detection
- Hash-gated regeneration (SHA256) — idempotent reruns (sketched after this list)
- Programmatic frontmatter — deterministic `source`, `source_hash`, `model`; Qwen only writes the five sections
- Defense-in-depth validation: fence-strip, skip-empty-sources (<10 chars), section-presence/order, retry-on-invalid
- CLAUDE.md convention + PostToolUse hook snippet in this README
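The hash gate amounts to comparing a fresh SHA256 of the source against the `source_hash` recorded in the brain file's frontmatter. A minimal sketch (the exact frontmatter layout is an assumption; `codebrain/brain_scanner.py` holds the real logic):

```python
import hashlib
from pathlib import Path

def needs_rescan(source: Path, force: bool = False) -> bool:
    """True when the source changed since its brain file was generated."""
    brain = Path(f"{source}.brain")      # sibling file, e.g. app.py.brain
    if force or not brain.exists():
        return True
    current = hashlib.sha256(source.read_bytes()).hexdigest()
    # frontmatter records source_hash; if it matches, skip the model call
    return f"source_hash: {current}" not in brain.read_text(encoding="utf-8")
```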
Phase 3 — VERIFIER loop ✓
Dogfooding showed the local model drifts on text transforms. The verifier catches no-ops, length violations, and schema misses deterministically before they reach Claude.
- `detect_noop` — whitespace-normalised equality check (auto-retries inside `codebrain_polish`)
- `check_word_count(min_words, max_words)` — bounded-window gate
- `check_regex_schema(pattern)` — structured-output check
- `codebrain_generate_verified(prompt, min_words, max_words, must_match, max_retries)` — loop with tightened retry instructions; returns `[codebrain warning] ...` if verification fails after retries
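The checks themselves are plain deterministic Python. A sketch matching the names above (argument order is a guess; see `codebrain/verifier.py` for the real versions):

```python
import re

def detect_noop(original: str, output: str) -> bool:
    """True when a transform changed nothing after whitespace normalisation."""
    def norm(s: str) -> str:
        return " ".join(s.split())
    return norm(original) == norm(output)

def check_word_count(text: str, min_words: int, max_words: int) -> bool:
    """Bounded-window gate on output length."""
    return min_words <= len(text.split()) <= max_words

def check_regex_schema(text: str, pattern: str) -> bool:
    """Structured-output check: the output must match the schema regex."""
    return re.search(pattern, text) is not None
```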
Phase 4 — consensus decoding ✓
- `codebrain_consensus_generate(prompt, n)` — generate N candidates (clamped to [2,5]), Qwen picks the best verbatim. N+1 inference calls; tightens quality on high-variance tasks.
- Multi-pass skeleton→logic→edges→polish: deferred (low measured value; individual tools already compose).
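The N+1-call shape is easy to picture. A minimal sketch with `llm` standing in for one Ollama generation (the judge-prompt wording is illustrative, not the shipped prompt):

```python
from typing import Callable

def consensus_generate(llm: Callable[[str], str], prompt: str, n: int) -> str:
    n = max(2, min(n, 5))                         # clamp to [2, 5]
    candidates = [llm(prompt) for _ in range(n)]  # N inference calls
    menu = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    judge = (f"Task: {prompt}\n\nCandidates:\n{menu}\n\n"
             "Reply with only the index of the best candidate.")
    try:
        best = int(llm(judge).strip())            # +1 judge call
    except ValueError:
        best = 0                                  # unparseable judge → first draft
    return candidates[best] if 0 <= best < n else candidates[0]
```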
Phase 5 — RAG (deferred — not a bottleneck)
Brain files already act as an index; cross-file RAG only makes sense if future use actually shows that indexing is the blocker. No current signal for it, so not built.
License
MIT — see LICENSE.