CodeBrain
Provides integration with Ollama's local LLM inference server, allowing the MCP server to generate code, explain code, and polish text using locally-run models like Qwen2.5-Coder.
An MCP server that lets Claude Code offload bulk work to a local LLM running on your own hardware.
What this is (and isn't)
Is: A Model Context Protocol (MCP) server that Claude Code registers as a sub-agent backend. When a session includes the kind of task a 14B local coder model handles well — generating 50 event templates, polishing 20 React components, drafting boilerplate — Claude Code calls into CodeBrain instead of spending its own output tokens. The local model does the bulk draft, Claude reviews and applies.
Is not: A Claude replacement. The reasoning, architecture decisions, debugging, and anything where "close enough" isn't good enough stays with Claude. CodeBrain is a Claude-offloader, not a Claude-competitor.
Why: Large-volume content and polish work burns through Claude's context and rate limits fast. A local model can run as often as you like at no extra cost per call, which keeps the high-value context free for the hard parts of the session.
Status
Phases 1–4 complete, Phase 5 deferred. Ten tools exposed, .brain/context.md passthrough live, per-file brain-summary scanner, verifier loop, consensus decoding. MCP integration verified in a real Claude Code session. Phase 5 (RAG) was explicitly scoped as "only if needed", and current use doesn't show cross-file search as a bottleneck, so it stays deferred.
How it works
Claude Code session              CodeBrain MCP server            Local machine
─────────────────────   stdio   ──────────────────────          ─────────────
Claude delegates a    ────────► codebrain_generate()   ────►    Ollama HTTP
bulk / polish task              codebrain_explain()             (localhost:11434)
                                codebrain_status()                    │
                                                                      ▼
                                                                Qwen2.5-Coder 14B
                                                                (GPU)
Claude reviews,       ◄──────── tool result string     ◄────    streamed response
applies, or pushes back

Ten tools are exposed today:
| Tool | When Claude would reach for it |
| --- | --- |
| codebrain_generate | Bulk content, boilerplate, repetitive transformations, first drafts |
| codebrain_batch_generate | N prompts with one shared system message, serial execution, index-stable errors so one failure doesn't abort the batch |
| codebrain_polish | Targeted transform over existing text: shorten, rephrase, translate, tighten. Auto-retries on no-op output. |
| codebrain_explain | Quick read-only explanations without burning Claude context |
| codebrain_generate_verified | Generation with deterministic verifier loop: word-count / regex-schema checks, tightened-instruction retry on violation |
| codebrain_consensus_generate | N candidates + judge call → best single output. Use on high-variance tasks. |
| codebrain_init | One-shot repo onboarding: detects stack, writes .brain/context.md |
| codebrain_scan_file | Generate or refresh one <source>.brain file |
| codebrain_scan_repo | Walk + scan a tree; hash-gated, per-file failures don't abort the batch |
| codebrain_status | Check which models are installed locally |
The use_brain flag on generation tools automatically prepends .brain/context.md from the current working directory to the system prompt, so project-specific context travels with every call without Claude having to pass it manually.
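A minimal sketch of how that passthrough could work, assuming the flag simply reads the file and prepends it to the system prompt. The helper name build_system_prompt and the blank-line separator are illustrative assumptions, not CodeBrain's actual implementation.

```python
from pathlib import Path
from typing import Optional


def build_system_prompt(system: str, use_brain: bool, cwd: Optional[Path] = None) -> str:
    """Prepend .brain/context.md to the system prompt when use_brain is set.

    Hypothetical helper: the real server may order or separate things differently.
    """
    if not use_brain:
        return system
    context_file = (cwd or Path.cwd()) / ".brain" / "context.md"
    if not context_file.is_file():
        # No project context available: fall back to the plain system prompt.
        return system
    context = context_file.read_text(encoding="utf-8").strip()
    if not context:
        return system
    return f"{context}\n\n{system}" if system else context
```

With this shape, a repo without a .brain directory behaves exactly as if use_brain were off, so the flag is safe to leave enabled by default.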
Requirements
Python 3.11+
Ollama: download for your OS. Tested with Ollama on Windows native, talking over localhost:11434.
A coder model pulled locally:
ollama pull qwen2.5-coder:14b
~9 GB download. Fits in 12 GB VRAM at Q5. Other models work too (DeepSeek-Coder, Qwen3 if available); set via the CODEBRAIN_MODEL env var.
Claude Code CLI on the machine that will call the server (obviously).
Install
git clone <this repo> CodeBrain
cd CodeBrain
python -m venv .venv
.venv\Scripts\activate # on Windows
# source .venv/bin/activate # on macOS / Linux
pip install -e .
Configure Claude Code
Add CodeBrain to your Claude Code MCP config. On Windows, that's usually ~/.claude.json (adjust path to where you cloned):
{
"mcpServers": {
"codebrain": {
"command": "C:\\Users\\YOU\\Desktop\\CodeBrain\\.venv\\Scripts\\python.exe",
"args": ["-m", "codebrain"]
}
}
}
Restart any Claude Code session. The codebrain_* tools should now appear in the available-tools list.
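If you would rather not hand-edit the JSON, the entry can be merged programmatically. The sketch below assumes the config is a JSON object with a top-level mcpServers key, as in the snippet above. The function names add_mcp_server and update_config_file are hypothetical, and the interpreter path you pass in is your own.

```python
import json
from pathlib import Path


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of config with one MCP server entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, python_exe: str) -> None:
    """Read the config at path (if any), add the codebrain entry, write it back."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_mcp_server(config, "codebrain", python_exe, ["-m", "codebrain"])
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Other servers and unrelated settings in the file are preserved. Keeping a backup copy of the config before running something like this is a sensible precaution.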
Keep brain files in sync automatically
Once you've run codebrain_init on a repo and scanned it with codebrain_scan_repo, you probably want brain files to refresh automatically whenever Claude edits source. Two pieces wire that up:
1. Project CLAUDE.md snippet: tell Claude to read brain files before opening source:
## Brain files
This repo has per-file `.brain` summaries next to each source file.
Before reading a full source file, read its `<path>.brain` sibling first.
Only open the source when the brain file is insufficient for the task.
2. PostToolUse hook: regenerate the brain after every Edit/Write.
Add to .claude/settings.json in the repo root:
{
"hooks": {
"PostToolUse": [
{
"matcher": "Edit|Write",
"hooks": [
{
"type": "command",
"command": "python -c \"import asyncio, json, sys; from codebrain.brain_scanner import scan_file; d = json.load(sys.stdin); p = d.get('tool_input', {}).get('file_path'); p and p.endswith(('.py', '.ts', '.tsx', '.js', '.jsx', '.java', '.go', '.rs')) and print(asyncio.run(scan_file(p)))\""
}
]
}
]
}
}
The hook inspects the edited path, skips non-source files via the extension filter, and kicks off a scan. The scan is hash-gated, so unchanged files don't hit Qwen.
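The one-liner is dense, so here is the same logic spelled out as a sketch. It assumes, as the hook command itself does, that the hook payload arrives as JSON on stdin with the edited path under tool_input.file_path, and that codebrain.brain_scanner.scan_file is an async function. The names source_path_from_payload and main are illustrative.

```python
import asyncio
import json
import sys
from typing import Optional

# Same extension filter as the hook command.
SOURCE_EXTENSIONS = (".py", ".ts", ".tsx", ".js", ".jsx", ".java", ".go", ".rs")


def source_path_from_payload(payload: dict) -> Optional[str]:
    """Return the edited file path if it is a source file, else None."""
    path = payload.get("tool_input", {}).get("file_path")
    if isinstance(path, str) and path.endswith(SOURCE_EXTENSIONS):
        return path
    return None


def main() -> None:
    """Read the hook payload from stdin and rescan the edited source file."""
    path = source_path_from_payload(json.load(sys.stdin))
    if path is None:
        return  # Not a source file: nothing to refresh.
    # Imported lazily so the filter above works without codebrain installed.
    from codebrain.brain_scanner import scan_file

    print(asyncio.run(scan_file(path)))
```

Saved as a small script that calls main(), this could stand in for the python -c command in the hook and is easier to extend, for example with more extensions.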
Sanity check
Inside a Claude Code session, ask Claude:
Call codebrain_status and tell me what's installed.
If Ollama is running and the model is pulled, you'll get back qwen2.5-coder:14b in the list.
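The same check can be made outside Claude Code by asking Ollama directly. This sketch uses Ollama's GET /api/tags endpoint, which lists locally installed models. The helper names and the default base URL argument are assumptions for illustration.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]


def installed_models(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

If installed_models() raises a connection error, Ollama is probably not running. If it returns a list without your model, the pull has not completed.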
Configuration
Environment variables read by the backend:
| Variable | Default | What it does |
| --- | --- | --- |
|  |  | Point at a remote Ollama (e.g., an inference box on your LAN) |
| CODEBRAIN_MODEL |  | Switch to any model you've pulled |
|  |  | Seconds to wait for a single generation |
Project structure
CodeBrain/
├── codebrain/
│ ├── __init__.py
│ ├── __main__.py # `python -m codebrain` entry
│ ├── backend.py # Ollama HTTP client
│ ├── server.py # FastMCP server + tool definitions
│ ├── brain_scanner.py # scan_file / scan_repo + hash gate
│ ├── brain_init.py # one-shot .brain/context.md seeding
│ ├── verifier.py # deterministic output checks
│ └── prompts/
│ └── brain_few_shot.md # few-shot for brain-file generation
├── tests/ # 96 unit + integration tests
├── .spec/
│ ├── CURRENT.md # phase state
│ └── brain-file-format.md # brain-file format v1
├── pyproject.toml
├── LICENSE
└── README.md
Roadmap
Phase 1 — scaffold ✓
Ollama HTTP client with error handling
FastMCP server with stdio transport
Three core tools: generate, explain, status
Documented setup + Claude Code config
Verified in a real Claude Code session
Phase 2 — batch & context ✓
codebrain_batch_generate for mass content with one shared system prompt, index-stable errors
codebrain_polish for targeted transforms (shorten / rephrase / translate) instead of regeneration
.brain/context.md passthrough: cwd project context auto-prepended to every generation call
Dogfood: coding tasks solid, text-transform tasks revealed real limits (informs Phase 3)
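To make "index-stable errors" concrete: each prompt gets a result slot at its own index, and a failure fills that slot with an error instead of aborting the run. The sketch below assumes a synchronous generate callable standing in for the real Ollama call. BatchItem and batch_generate are hypothetical names, not the server's API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BatchItem:
    index: int
    output: Optional[str] = None
    error: Optional[str] = None


def batch_generate(
    prompts: list[str],
    system: str,
    generate: Callable[[str, str], str],
) -> list[BatchItem]:
    """Run prompts serially with one shared system message.

    results[i] always corresponds to prompts[i], even when some calls fail.
    """
    results: list[BatchItem] = []
    for index, prompt in enumerate(prompts):
        try:
            results.append(BatchItem(index, output=generate(system, prompt)))
        except Exception as exc:  # one failure must not abort the batch
            results.append(BatchItem(index, error=f"{type(exc).__name__}: {exc}"))
    return results
```

Because positions never shift, the caller can retry only the failed indices and splice the new outputs back in place.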
Phase 2.5 — brain system ✓
Per-file <source>.brain summaries sit next to each source file. Claude reads the brain first and only opens the source when the brain is insufficient.
codebrain_scan_file(path, force): generate or refresh one brain file
codebrain_scan_repo(root, force, extensions, exclude_dirs): bulk walk + scan
codebrain_init(root, force): seed .brain/context.md with stack detection
Hash-gated regeneration (SHA256): idempotent reruns
Programmatic frontmatter: deterministic source, source_hash, model; Qwen only writes the five sections
Defense-in-depth validation: fence-strip, skip-empty-sources (<10 chars), section-presence/order, retry-on-invalid
CLAUDE.md convention + PostToolUse hook snippet in this README
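The hash gate can be sketched as follows, assuming the brain file's frontmatter carries a line of the form source_hash: <hex digest>. That line format and the helper names are assumptions; the actual format is defined in .spec/brain-file-format.md.

```python
import hashlib
import re
from pathlib import Path

# Assumed frontmatter line; the real brain-file format may differ.
_HASH_LINE = re.compile(r"^source_hash:\s*([0-9a-f]{64})\s*$", re.MULTILINE)


def sha256_of(path: Path) -> str:
    """Hex SHA256 digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_rescan(source: Path, force: bool = False) -> bool:
    """True when the brain file is missing, stale, or a rescan is forced."""
    brain = source.with_name(source.name + ".brain")
    if force or not brain.is_file():
        return True
    match = _HASH_LINE.search(brain.read_text(encoding="utf-8"))
    return match is None or match.group(1) != sha256_of(source)
```

Rerunning a scan over an unchanged tree then costs one file read and one hash per source file, with no model calls.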
Phase 3 — VERIFIER loop ✓
Dogfood showed the local model drifts on text transforms. The verifier catches no-ops, length violations, and schema misses deterministically before they reach Claude.
detect_noop: whitespace-normalised equality check (auto-retries inside codebrain_polish)
check_word_count(min_words, max_words): bounded-window gate
check_regex_schema(pattern): structured-output check
codebrain_generate_verified(prompt, min_words, max_words, must_match, max_retries): loop with tightened retry instructions, returns [codebrain warning] ... if verification fails after retries
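A minimal sketch of these checks and the retry loop, written from the descriptions above. The check names match the list, but the function bodies, the Callable-based generate argument, the retry wording, and the exact warning text are assumptions.

```python
import re
from typing import Callable, Optional


def detect_noop(original: str, output: str) -> bool:
    """True when output equals the input after whitespace normalisation."""
    return " ".join(original.split()) == " ".join(output.split())


def check_word_count(text: str, min_words: int, max_words: int) -> bool:
    """True when the word count falls inside [min_words, max_words]."""
    return min_words <= len(text.split()) <= max_words


def check_regex_schema(text: str, pattern: str) -> bool:
    """True when the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None


def generate_verified(
    prompt: str,
    generate: Callable[[str], str],
    min_words: int = 0,
    max_words: int = 10_000,
    must_match: Optional[str] = None,
    max_retries: int = 2,
) -> str:
    """Generate, check deterministically, and retry with tighter instructions."""
    attempt_prompt = prompt
    output = ""
    for _ in range(max_retries + 1):
        output = generate(attempt_prompt)
        problems = []
        if not check_word_count(output, min_words, max_words):
            problems.append(f"use between {min_words} and {max_words} words")
        if must_match is not None and not check_regex_schema(output, must_match):
            problems.append(f"output must match the pattern {must_match!r}")
        if not problems:
            return output
        # Tighten the instruction with the specific violations for the next try.
        attempt_prompt = f"{prompt}\n\nStrict requirements: " + "; ".join(problems) + "."
    return f"[codebrain warning] verification failed after retries\n{output}"
```

The checks are plain string operations, so a failed verification costs no extra model call until the retry itself.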
Phase 4 — consensus decoding ✓
codebrain_consensus_generate(prompt, n): generate N candidates (clamped to [2,5]), Qwen picks the best verbatim. N+1 inference calls, tightens quality on high-variance tasks.
Multi-pass skeleton→logic→edges→polish: deferred (low measured value; individual tools already compose).
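The consensus step can be sketched like this, assuming the judge is asked to answer with the number of the best candidate and that an unparseable or out-of-range answer falls back to the first candidate. Both callables and the prompt wording are hypothetical; only the [2,5] clamp and the N+1 call count come from the description above.

```python
import re
from typing import Callable


def consensus_generate(
    prompt: str,
    n: int,
    generate: Callable[[str], str],
    judge: Callable[[str], str],
) -> str:
    """Generate n candidates (clamped to [2, 5]) and return the judge's pick verbatim."""
    n = max(2, min(5, n))
    candidates = [generate(prompt) for _ in range(n)]  # N inference calls
    numbered = "\n\n".join(f"[{i + 1}]\n{c}" for i, c in enumerate(candidates))
    judge_prompt = (
        f"Task:\n{prompt}\n\nCandidates:\n{numbered}\n\n"
        "Reply with only the number of the best candidate."
    )
    verdict = judge(judge_prompt)  # the +1 call
    match = re.search(r"\d+", verdict)
    choice = int(match.group()) if match else 1
    if not 1 <= choice <= n:
        choice = 1  # fall back to the first candidate on an out-of-range answer
    return candidates[choice - 1]
```

Selecting by index rather than asking the judge to repeat the text is one way to guarantee the winning candidate comes back unmodified.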
Phase 5 — RAG (deferred — not a bottleneck)
Brain files already act as an index; cross-file RAG only makes sense if future use actually shows that indexing is the blocker. No current signal for it, so not built.
License
MIT — see LICENSE.