mcp-brain
🚀 TL;DR
mcp-brain is a Model Context Protocol (MCP) server that gives Claude Code persistent, structured awareness of your project — without burning tokens on context rebuilding.
🧠 | Compressed awareness in ~100 tokens instead of ~2000 |
🎯 | 63.4% Hit@10 on SWE-bench Full (2294 real GitHub issues) — zero LLM cost |
⚡ | Sub-100ms file prediction (BM25 + code graph + optional semantic reranker) |
👥 | Team-aware: soft claims, conflict detection, ownership tracking |
🔄 | Self-healing: decision lifecycle, automatic staleness, feedback loop |
🛡️ | Local-first: SQLite, no cloud, no embeddings required, GDPR-friendly |
🚨 The Problem
Without persistent awareness, Claude Code operates blindly at the start of every session:
Without mcp-brain | With mcp-brain |
❌ No idea which files matter | ✅ Predicted files in top-K |
❌ Re-explores the repo every session | ✅ Compressed context in ~100 tokens |
❌ No visibility into teammates' WIP | ✅ Soft claims + conflict detection |
❌ Acts on outdated decisions | ✅ Decision lifecycle (active → stale) |
❌ Burns 2000–5000 tokens just to "orient" | ✅ One YAML block, ready to act |
Result without mcp-brain: wrong file exploration → outdated suggestions → merge conflicts → massive token waste.
⚡ What mcp-brain Changes
┌──────────────────────────────────────────────────────┐
│ │
│ Without: Claude → explores → guesses → retries │
│ → conflicts → high token usage │
│ │
│ With: Claude → predicts → verifies → acts │
│ → aligned → low token usage │
│ │
└──────────────────────────────────────────────────────┘
🧬 Core idea
Instead of giving Claude more context, we give it structured awareness of reality.
We track:
📌 what changed (signal extraction from git)
🎯 what matters (scoring + lifecycle)
👥 who's working on what (team claims)
🧭 where to act (issue → file prediction)
…and we deliver it in ~100 tokens.
⏱️ In 60 seconds
You drop a one-line ticket into Claude Code:
> work on ticket #42 — JWT login broken
Without mcp-brain, Claude starts grep-walking the repo: reading directory listings, opening the README, sampling files. It burns 2000+ tokens before producing the first useful sentence.
With mcp-brain, in <100ms Claude receives:
predictions:
  - file: src/auth.py
    confidence: high
    why: "path + symbol match: login, jwt"
  - file: src/middleware.py
    confidence: medium
    why: "imports auth (hop 1)"
  - file: src/jwt_utils.py
    confidence: medium
    why: "called_by auth.login"
team_claims:
  - { ticket: 39, author: dev-B, files: [middleware.py] }   # ⚠️ overlap
avoid:
  - "HS256 — vulnerable to key confusion. Migrated to RS256 in commit a1b2c3."
decisions:
  - "tokens stored in httpOnly cookie, never localStorage"
It's structured reality, not regenerated context. Claude can act on the first turn.
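The overlap warning in the block above can be pictured as a simple set intersection between predicted files and teammates' claims. The following is a minimal sketch, not the project's implementation: the `Claim` class, the `find_overlaps` helper, and the choice to match on file basename (because the claim lists `middleware.py` while the prediction lists `src/middleware.py`) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Claim:
    """Hypothetical in-memory form of one team_claims entry."""
    ticket: int
    author: str
    files: tuple


def find_overlaps(predicted, claims):
    """Return (ticket, author, file) for each predicted file a teammate has claimed.

    Assumption: files are compared by basename only.
    """
    overlaps = []
    for claim in claims:
        claimed_names = {PurePosixPath(f).name for f in claim.files}
        for path in predicted:
            if PurePosixPath(path).name in claimed_names:
                overlaps.append((claim.ticket, claim.author, path))
    return overlaps


predicted = ["src/auth.py", "src/middleware.py", "src/jwt_utils.py"]
claims = [Claim(ticket=39, author="dev-B", files=("middleware.py",))]
overlaps = find_overlaps(predicted, claims)
```

Here `overlaps` contains a single entry for ticket 39, which is the kind of record a soft claim would surface as a warning rather than a hard lock.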
🔑 How It Works
flowchart TD
subgraph Capture[Capture signals]
A[Git commit] -->|filtered signals| B[mcp-brain memory]
C[Session end] -->|structured snapshot| B
end
subgraph Predict[Predict where to act]
E[Ticket opened] --> F[File predictor]
F -->|top-K files + confidence + why| D[Claude Code]
end
subgraph Coordinate[Coordinate team work]
F -->|overlap check| G[Team claims]
G -->|conflict warnings| D
end
subgraph Learn[Learn from outcomes]
H[Outcome recorded] -->|precision / recall| I[Feedback loop]
I -->|demote noisy memories| B
I -->|supersede stale decisions| B
end
B -->|~100-token YAML context| D
Capture — git hooks promote only high-signal events (decisions, patterns, things to avoid). Ignored: docs, chore, tests, CI noise.
Compress — three-level memory (L1/L2/L3) auto-assigned by a scoring function (recency 35% + frequency 30% + impact 20% + explicit 15%).
Predict — issue title/body → ranked file list via BM25 + code graph expansion + optional semantic reranker.
Coordinate — soft claims warn before two devs touch the same files.
Self-correct — every closed ticket feeds precision/recall stats; noisy memories are auto-demoted.
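To make the Capture step concrete, here is a minimal sketch of commit filtering. It assumes commits follow a conventional `type(scope): subject` prefix and that the noise categories listed above map to the commit types in `NOISE_TYPES`; both the set and the `extract_signal` helper are hypothetical and not taken from the project's hook code.

```python
import re
from typing import Optional

# Assumption: commit types treated as noise, mirroring the "Ignored" list above.
NOISE_TYPES = {"docs", "chore", "test", "tests", "ci"}

# Matches prefixes such as "fix:", "chore(deps):" or "feat!:".
PREFIX = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?!?:\s*(?P<subject>.+)$")


def extract_signal(message: str) -> Optional[str]:
    """Return the commit subject if it looks high-signal, otherwise None."""
    stripped = message.strip()
    if not stripped:
        return None
    first_line = stripped.splitlines()[0]
    match = PREFIX.match(first_line)
    if match is None:
        # No recognisable prefix: keep the line, there is nothing to filter on.
        return first_line
    if match.group("type") in NOISE_TYPES:
        return None
    return match.group("subject")
```

For example, `extract_signal("refactor: JWT moved to RS256")` keeps the subject, while `extract_signal("docs: fix typo")` is dropped. This is also why commit hygiene matters: without a recognisable prefix, the filter has little to work with.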
🧠 Memory Hierarchy
Memories aren't dumped into one bag. They're scored and tiered, so the high-token slot in your prompt only carries what's signal-dense for this moment:
L1 — hot context loads automatically every session. Stack, conventions, current branch, recent commits, team claims, active high-confidence decisions. Capped at ~70 tokens.
L2 — warm context loads only on demand (brain_get_decisions). Historical reasoning, superseded patterns, the why behind a past trade-off.
L3 — cold archive is never sent to the model. Kept for audit, transparency, and the lifecycle's "undo" path.
The score is a transparent linear formula — no black-box embedding similarity. Every memory's level is reproducible and explainable.
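As an illustration of such a linear formula, the sketch below uses the weights stated earlier (recency 35%, frequency 30%, impact 20%, explicit 15%). It assumes each signal is already normalised to the range 0 to 1, and the two level cut-offs are placeholder values chosen for the example; the README does not state the real thresholds.

```python
# Weights as stated in this README.
WEIGHTS = {"recency": 0.35, "frequency": 0.30, "impact": 0.20, "explicit": 0.15}

# Assumption: placeholder cut-offs, not the project's actual thresholds.
L1_THRESHOLD = 0.7
L2_THRESHOLD = 0.4


def score(signals):
    """Weighted sum of signals, each assumed to lie in [0, 1]."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())


def level(value):
    """Map a score to a memory tier using the placeholder cut-offs."""
    if value >= L1_THRESHOLD:
        return "L1"
    if value >= L2_THRESHOLD:
        return "L2"
    return "L3"


recent_explicit = {"recency": 1.0, "frequency": 0.5, "impact": 0.5, "explicit": 1.0}
recent_score = score(recent_explicit)  # 0.35 + 0.15 + 0.10 + 0.15
recent_level = level(recent_score)
```

Because every term is visible, a memory's tier can be explained by reading off its four signal values.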
🔍 Prediction Pipeline
The predictor is three deterministic stages:
Stage | What it does | Cost |
1. BM25 + IDF | Tokenize issue, match against symbols / identifiers / paths in an inverted index | ~5 ms |
2. Graph expansion | Walk | ~10 ms |
3. Semantic rerank (optional) | MiniLM (80 MB, CPU/GPU) embeds query + candidates, blends 30% cosine sim with 70% BM25 | ~50 ms |
Every prediction comes back with a why field and a full breakdown, so you can audit why a file was suggested — no opaque ranking.
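The graph expansion stage can be sketched as a hop-limited breadth-first walk starting from the BM25 candidates. This is an illustrative version under assumptions: the adjacency-dict graph format, the edge direction, and the `expand` helper are hypothetical, and the hop limit simply mirrors the `--max-hops 2` flag used in the benchmark commands.

```python
from collections import deque


def expand(seeds, graph, max_hops=2):
    """Breadth-first walk from seed files.

    Returns {file: hop distance} for every file reachable within max_hops.
    Assumption: graph maps a file to the files it is linked to by imports or calls.
    """
    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not walk past the hop limit
        for neighbour in graph.get(current, []):
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return hops


graph = {
    "src/auth.py": ["src/middleware.py", "src/jwt_utils.py"],
    "src/middleware.py": ["src/app.py"],
    "src/app.py": ["src/cli.py"],
}
reached = expand(["src/auth.py"], graph, max_hops=2)
```

The hop distance is what makes a `why` string such as "imports auth (hop 1)" possible: each candidate carries the path length that brought it into the ranking.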
💡 Default ON. To run lean (CI / containers without PyTorch), set MCP_BRAIN_SEMANTIC=0 and the pipeline degrades gracefully to BM25 + graph.
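A minimal sketch of the 70/30 blend and its lean fallback follows. The weights and the `MCP_BRAIN_SEMANTIC` variable come from this README; how the variable is read, the max-normalisation of BM25 scores, and the `semantic_enabled` and `rerank` helpers are assumptions for illustration only.

```python
import os


def semantic_enabled(env=None):
    """Assumption: the reranker is on unless MCP_BRAIN_SEMANTIC is set to "0"."""
    env = os.environ if env is None else env
    return env.get("MCP_BRAIN_SEMANTIC", "1") != "0"


def rerank(bm25, cosine_sim, use_semantic):
    """Rank files by 70% normalised BM25 + 30% cosine, or BM25 alone when disabled."""
    top = max(bm25.values(), default=0.0) or 1.0  # avoid division by zero
    final = {}
    for path, raw in bm25.items():
        lexical = raw / top  # assumption: scale BM25 into [0, 1] by the best score
        if use_semantic:
            final[path] = 0.70 * lexical + 0.30 * cosine_sim.get(path, 0.0)
        else:
            final[path] = lexical
    return sorted(final, key=final.get, reverse=True)


bm25_scores = {"a.py": 10.0, "b.py": 9.0}
cosine_scores = {"a.py": 0.1, "b.py": 0.9}
with_semantic = rerank(bm25_scores, cosine_scores, use_semantic=True)
lean = rerank(bm25_scores, cosine_scores, use_semantic=False)
```

In this toy case the semantic signal flips a near-tie in favour of `b.py`, while the lean path keeps the pure BM25 order.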
🔄 Decision Lifecycle
Memories aren't immortal. mcp-brain assumes you'll change your mind and bakes the lifecycle in:
Age-based decay — after SUSPECT_DAYS a memory gets flagged for re-verification. After STALE_DAYS it's hidden from prompts.
Semantic supersession — write a new memory similar (cosine ≥ 0.85) to an old one and the old one is auto-marked superseded.
Feedback loop — when a memory is shown 3+ times before a reverted ticket, it gets demoted automatically. Noisy memories die fast.
This is what makes mcp-brain safe to leave running for months without manual cleanup. The L1 stays small and trustworthy; the L3 archives the audit trail.
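The first two lifecycle rules can be sketched in a few lines. The names `SUSPECT_DAYS` and `STALE_DAYS` and the 0.85 cosine threshold come from this README; the numeric day values below are placeholders (the defaults are not stated here), and the helper functions are hypothetical.

```python
import math
from datetime import date

# Assumption: placeholder values, the real defaults are not given in this README.
SUSPECT_DAYS = 30
STALE_DAYS = 90
SUPERSEDE_THRESHOLD = 0.85  # stated above


def age_status(created, today):
    """Classify a memory by age: active, suspect (re-verify) or stale (hidden)."""
    age = (today - created).days
    if age >= STALE_DAYS:
        return "stale"
    if age >= SUSPECT_DAYS:
        return "suspect"
    return "active"


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_superseded(old_vec, new_vec):
    """True when the new memory is similar enough to replace the old one."""
    return cosine(old_vec, new_vec) >= SUPERSEDE_THRESHOLD
```

With these placeholder values, a memory created on 1 January would read as `suspect` by mid-February and `stale` by June.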
🏗️ Architecture
flowchart TB
subgraph Client
CC[Claude Code]
end
subgraph Server[mcp-brain server]
T[MCP Tools layer<br/>brain_init, brain_get_context,<br/>brain_predict_files, ...]
R[Retriever<br/>+ Compressor]
P[File Predictor<br/>BM25 + Graph + Semantic]
F[Feedback Reconciler]
O[Observability<br/>p50/p95/p99]
end
subgraph Storage[Local storage ~/.mcp-brain/]
DB[(SQLite<br/>memories, sessions,<br/>projects, feedback)]
IDX[Inverted Index<br/>BM25]
G[Code Graph<br/>imports/calls]
Y[YAML claims]
end
CC <-->|MCP/stdio| T
T --> R
T --> P
T --> F
T --> O
R --> DB
P --> IDX
P --> G
F --> DB
O --> DB
Repo layout
mcp-brain/
├── src/
│ ├── brain/ # core logic: retriever, compressor, scorer, predictor
│ │ # code_graph, file_indexer, semantic_reranker,
│ │ # staleness, similarity, feedback loop, observability
│ ├── capture/ # git hook signal extraction
│ ├── storage/ # SQLite layer
│ └── tools/ # MCP tool definitions (FastMCP)
├── benchmark/ # SWE-bench Lite/Full, Bench4BL, BugLocator harness
├── tests/ # pytest suite (predictor, feedback, observability, ...)
└── assets/ # SVG diagrams used in this README
📊 Benchmark Results
We benchmark file localization — given a real GitHub issue, can mcp-brain rank the production files the accepted patch actually modified?
Dataset: SWE-bench Full
2294 real Python bug-fix tasks from major OSS projects (astropy, django, flask, matplotlib, pandas, pytest, requests, scikit-learn, sphinx, sympy, xarray)
Ground truth = files modified in the accepted reference patch (test files excluded by default — strict production-file evaluation)
Results — mcp-brain v1.4.0 (BM25 + graph + semantic)
Metric | @1 | @3 | @5 | @10 |
Hit | 24.5% | 43.4% | 53.7% | 63.4% |
Recall | 20.1% | 36.6% | 46.1% | 55.8% |
MAP | 24.5% | 28.4% | 30.4% | 31.8% |
Instances evaluated: 2294
Errors: 5 (0.2% failure rate)
Avg gold files per issue: 1.66
Avg predicted files: 9.98 (top-10)
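For readers unfamiliar with the metrics, here is a minimal sketch of how Hit@k, Recall@k and average precision at k are commonly computed for a single issue. The benchmark harness may define them differently; in particular, normalising average precision by `min(len(gold), k)` is an assumption, as are the function names.

```python
def hit_at_k(predicted, gold, k):
    """True if at least one gold file appears in the top-k predictions."""
    return any(path in gold for path in predicted[:k])


def recall_at_k(predicted, gold, k):
    """Fraction of gold files found in the top-k predictions."""
    if not gold:
        return 0.0
    return len(set(predicted[:k]) & gold) / len(gold)


def average_precision_at_k(predicted, gold, k):
    """Mean of precision values at each rank where a gold file is found.

    Assumption: normalised by min(len(gold), k).
    """
    if not gold:
        return 0.0
    hits = 0
    total = 0.0
    for rank, path in enumerate(predicted[:k], start=1):
        if path in gold:
            hits += 1
            total += hits / rank
    return total / min(len(gold), k)


predicted = ["a.py", "b.py", "c.py"]
gold = {"b.py", "d.py"}
```

Averaging these per-issue values over all evaluated instances gives the table rows; MAP is the mean of the per-issue average precision.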
Honest comparison vs. literature
System | Hit@10 (file loc.) | Cost per query | Notes |
BM25 baseline (vanilla) | ~45–55% | free | symbol search only |
mcp-brain v1.4.0 | 63.4% | free | BM25 + graph + semantic, zero LLM |
Agentless / SWE-agent | ~70–85% | $0.10–$2 | LLM-based, multi-step |
Reading the numbers:
Hit@5 = 53.7% → in more than half of real issues, the right production file is in the top-5 before Claude reads a single byte.
Hit@10 = 63.4% → expanded to top-10, almost 2 issues out of 3 have the right file ranked.
Hit@1 = 24.5% → the very first prediction is dead-on for 1 issue out of 4.
0.2% error rate over 2294 runs → robust pipeline.
Reproduce it yourself
# One-time online setup
pip install -e .
pip install -r benchmark/requirements-benchmark.txt
python -m benchmark.adapters.swebench --dataset-name princeton-nlp/SWE-bench \
--output benchmark/datasets/cache/swebench_full.jsonl
python -m benchmark.prepare_repos \
--dataset benchmark/datasets/cache/swebench_full.jsonl \
--repo-cache benchmark/repos
# Offline evaluation (full)
python -m benchmark.run_eval \
--dataset benchmark/datasets/cache/swebench_full.jsonl \
--repo-cache benchmark/repos \
--out benchmark/results/swebench_full.json \
--report-dir benchmark/reports \
--top-k 10 --max-hops 2 --use-semantic
Reports are emitted as Markdown + HTML in benchmark/reports/.
The harness also supports SWE-bench Lite (300 instances), SWE-bench Verified, Bench4BL, and BugLocator — see benchmark/README.md.
💰 Token Efficiency
The math
A typical Claude Code session without mcp-brain spends thousands of tokens just to orient itself:
Phase (no mcp-brain) | Action | ~Tokens |
Session start | List directory, read README, sample files | 800–2000 |
Issue handling | Grep symbols, follow imports, retry wrong files | 1000–3000 |
Context restore | Re-explain project conventions | 200–500 |
Total per session | 2000–5500 |
A session with mcp-brain:
Phase (with mcp-brain) | Action | ~Tokens |
Session start | | ~100 |
Issue handling | | ~250 |
Decision recall | | ~300 |
Total per session | ~650 |
Estimated saving
Without With mcp-brain Saving
Session start: 2000 ─────────► 100 tokens ~95%
Per session: 2000–5500 ──► 450–950 tokens 40–80%
Per developer*: ~1.2M/month ──► ~400k/month ~65%
*assuming 100 sessions/month/dev
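The percentages above are plain relative reductions. As a quick check, the sketch below recomputes two of them from the figures in this section; the `saving` helper is just an illustration.

```python
def saving(before, after):
    """Relative token reduction, as a fraction of the original cost."""
    return 1.0 - after / before


session_start = saving(2000, 100)            # 0.95, i.e. ~95%
per_developer = saving(1_200_000, 400_000)   # about 0.667, close to the ~65% quoted
```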
Why this works
✅ No embeddings required for retrieval (BM25 + code graph)
✅ No vector DB to query (zero round-trip cost)
✅ No history replay — context is reconstructed, not re-scrolled
✅ YAML compression with default_flow_style=True and empty-key stripping
✅ L1/L2 split — heavy memory only loaded on demand
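The YAML compression point can be illustrated with a short sketch. `yaml.safe_dump` with `default_flow_style=True` is a real PyYAML call; the `strip_empty` and `compact_yaml` helpers, and the exact notion of "empty" used here, are assumptions rather than the project's code. PyYAML is imported lazily so that the stripping logic can be tried without it.

```python
def strip_empty(value):
    """Recursively drop dict keys whose value is None, '', [] or {}."""
    if isinstance(value, dict):
        cleaned = {key: strip_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items()
                if item not in (None, "", [], {})}
    if isinstance(value, list):
        return [strip_empty(item) for item in value]
    return value


def compact_yaml(context):
    """Dump the stripped context as single-line flow-style YAML (requires PyYAML)."""
    import yaml  # imported here so strip_empty works without PyYAML installed
    return yaml.safe_dump(strip_empty(context), default_flow_style=True, sort_keys=False)


context = {
    "p": {"name": "my-api", "stack": []},
    "s": {},
    "avoid": ["HS256"],
    "next": None,
}
stripped = strip_empty(context)
```

Flow style drops the per-key newlines and indentation of block YAML, and stripping empty keys avoids spending tokens on fields that carry no information.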
💡 The semantic reranker (use_semantic=True) is on by default and runs locally on CPU/GPU. It does not add LLM cost. Disable with MCP_BRAIN_SEMANTIC=0 for lean CI.
🚀 Quick Start
Install — one command, batteries included
git clone https://github.com/PierfrancescoLijoi/mcp-brain.git
cd mcp-brain
pip install -e ".[all]"
The [all] extra installs:
language parsers (Python, JS, TS, Go, Rust, Java, C#) for the code graph
semantic reranker (sentence-transformers + numpy)
dev tooling (pytest, pytest-cov)
Lean install paths
If you want a smaller footprint, you can pick exactly what you need:
pip install -e . # core only — BM25 + graph (no semantic, no parsers)
pip install -e ".[parsers]" # + multi-language parsers
pip install -e ".[semantic]" # + semantic reranker (~700 MB w/ PyTorch)
pip install -e ".[dev]" # + dev tooling
Register with Claude Code
claude mcp add mcp-brain python /absolute/path/to/run.py
On Windows PowerShell:
claude mcp add mcp-brain python "C:\path\to\mcp-brain\run.py"
Initialize your project
mcp-brain init
That's it. Open Claude Code in your repo and the L1 context is automatically available via brain_get_context.
🧠 MCP Tools
Tool | Purpose | When Claude calls it |
| Register project, stack, conventions | Once per repo |
| Load L1 context (~70 tokens) | Every session start |
| Load L2 decisions on demand | When historical context needed |
| Store a memory; level auto-assigned | When user makes a decision |
| Save end-of-session snapshot | At session end |
| Issue → ranked file list with | When opening a ticket |
| Start ticket workflow + conflict check | Workflow orchestration |
| Log ticket outcome (completed/reverted/...) | After ticket closed |
| Precision/recall window | Health checks |
| Surface noisy memories | Debugging |
| Full unified dashboard (YAML) | Ops / CI |
Example L1 context output (~100 tokens)
p: {name: my-api, stack: [FastAPI, PostgreSQL]}
s: {branch: feat/auth, wip: "JWT refactor", next: "add refresh token"}
git:
  recent: ["refactor: JWT moved to RS256"]
  changed: [auth.py, middleware.py]
team_claims:
  - {ticket: 42, author: dev-B, files: [middleware.py]}
avoid:
  - "avoid: HS256 — vulnerable to key confusion"
decisions:
  - "decision: tokens stored httpOnly cookie, never localStorage"
👉 Claude already knows where to act before reading a single source file.
💼 Use Cases
🎯 Solo developer
Cuts session-start exploration: −90% tokens on the first turn
Remembers your "I always do it this way" patterns
Auto-supersedes decisions when you change your mind
👥 Small team (3–10 devs)
Conflict detection before two devs touch the same files
Shared decision log with lifecycle (no more "wait, didn't we decide…?")
File ownership inference from git history
🏢 Enterprise (with caveats)
Local-first, no data leaves the machine → GDPR / SOC2-friendly
Compatible with Managed Identity / on-prem deployments (no cloud calls)
Token saving compounds: 65% × 100 devs × 100 sessions/month → measurable infra savings
❓ FAQ
No, and on purpose. mcp-brain is a structured awareness layer, not a retrieval-over-embeddings layer. The core retrieval is BM25 + code graph expansion — fully deterministic, sub-100ms, no vector DB to maintain. The semantic reranker is an optional 30% blend on top, used only as a tiebreaker. This is why token cost stays predictable and infra is local-first.
A long context window doesn't fix the problem — it makes it cheaper to waste. The bottleneck isn't capacity, it's signal density. Pasting your whole repo into the context still leaves Claude searching for the right file linearly. mcp-brain pre-ranks reality so the model spends its attention on the right 3 files, not the wrong 30.
No. Storage is SQLite under ~/.mcp-brain/ (local) and <repo>/.brain/shared/ (versioned with git if you choose). No outbound network calls, no telemetry, no cloud component. The semantic model runs on your CPU/GPU. This makes mcp-brain compatible with GDPR-restricted and air-gapped environments.
Write a new memory that contradicts it. Semantic supersession (cosine ≥ 0.85) will auto-mark the old one as superseded. You can also manually demote via brain_memory_health or wait for age-based decay (SUSPECT_DAYS / STALE_DAYS). The lifecycle assumes you'll change your mind.
Yes for indexing/predicting (BM25 is language-agnostic). The code graph currently supports Python, JavaScript, TypeScript, Go, Rust, Java, C# via tree-sitter parsers. Adding a new language is a single registry entry — see src/brain/parsers.py.
Different layer of the stack. SWE-agent and similar tools are autonomous coders — they read, plan, and patch via LLM calls. mcp-brain is the awareness layer underneath them. You could pair it with Aider or any MCP-compatible client; it makes whatever LLM you use start from a smarter zero.
Honest answer: file prediction is heuristic. Hit@1 = 24.5% means 3 issues out of 4 still need Claude to validate the prediction before acting. mcp-brain orients, it doesn't replace exploration. That's also why it's free — it's a force multiplier, not an oracle.
⚠️ Trade-offs
I'm honest about what this is and isn't.
Strength | Limitation |
✅ Zero LLM cost for retrieval | ⚠️ Heuristic-based: edge cases with no symbol/path overlap can miss |
✅ Sub-100ms predictions | ⚠️ Requires good commit hygiene (semantic commit messages help) |
✅ Local-first, no cloud | ⚠️ No cross-machine sync out of the box (use git for |
✅ Deterministic (replays produce same output) | ⚠️ Hit@1 = 24.5% → orients, doesn't replace exploration |
✅ Works on any size repo | ⚠️ Best on medium/large repos (small repos don't benefit much) |
This is NOT:
❌ a vector DB memory
❌ a RAG system
❌ an SWE-agent / autonomous coder
❌ a checkpoint / replay tool
This IS:
✅ a repo-aware, team-aware, token-efficient awareness layer
✅ a force multiplier for Claude Code, not a replacement
🛣️ Roadmap
BM25 + code graph + semantic reranker
Decision lifecycle with semantic supersession
Feedback loop with precision/recall reconciliation
Observability dashboard
SWE-bench Full benchmark (2294 instances)
Multi-language code graph (Python, JS, TS, Go, Rust, Java, C#)
Cross-repo memory federation (opt-in)
Real-time conflict push (currently pull-based)
VS Code extension companion
Hosted shared .brain/ for distributed teams (still local-first per dev)
🧪 Run the test suite
pip install -e ".[dev]"
pytest tests/ -v
Expected: full pass on Python 3.10, 3.11, 3.12.
🤝 Contributing
PRs welcome. Before opening one:
pytest tests/ -v must pass
New behavior needs new tests
New MCP tools must be wrapped with @observed("brain_<name>")
Avoid heavy dependencies for the default install path — anything ML-flavored goes behind an optional extra
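For contributors wondering what an `@observed(...)` wrapper might look like, here is a hypothetical sketch of a latency-recording decorator with a nearest-rank percentile helper. It is not the project's implementation: the `LATENCIES_MS` store, the `percentile` function, and the `brain_echo` tool are invented for illustration, and only the decorator's name and call shape come from this README.

```python
import functools
import math
import time
from collections import defaultdict

# Hypothetical in-memory store: tool name -> list of call durations in milliseconds.
LATENCIES_MS = defaultdict(list)


def observed(name):
    """Decorator factory that records how long each call to a tool takes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even when the tool raises.
                LATENCIES_MS[name].append((time.perf_counter() - start) * 1000.0)
        return wrapper
    return decorator


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


@observed("brain_echo")
def brain_echo(text):
    """Hypothetical tool used only to exercise the decorator."""
    return text


result = brain_echo("ping")
```

Wrapping every tool the same way is what would let a dashboard report p50/p95/p99 per tool without each tool carrying its own timing code.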
📄 License
MIT — see LICENSE.