
mcp-brain


🚀 TL;DR

mcp-brain is a Model Context Protocol (MCP) server that gives Claude Code persistent, structured awareness of your project — without burning tokens on context rebuilding.

🧠

Compressed awareness in ~100 tokens instead of ~2000

🎯

63.4% Hit@10 on SWE-bench Full (2294 real GitHub issues) — zero LLM cost

Sub-100ms file prediction (BM25 + code graph + optional semantic reranker)

👥

Team-aware: soft claims, conflict detection, ownership tracking

🔄

Self-healing: decision lifecycle, automatic staleness, feedback loop

🛡️

Local-first: SQLite, no cloud, no embeddings required, GDPR-friendly




🚨 The Problem

Without persistent awareness, Claude Code operates blindly at the start of every session:

| Without mcp-brain | With mcp-brain |
|---|---|
| ❌ No idea which files matter | ✅ Predicted files in top-K |
| ❌ Re-explores the repo every session | ✅ Compressed context in ~100 tokens |
| ❌ No visibility into teammates' WIP | ✅ Soft claims + conflict detection |
| ❌ Acts on outdated decisions | ✅ Decision lifecycle (active → stale) |
| ❌ Burns 2000–5000 tokens just to "orient" | ✅ One YAML block, ready to act |

Result without mcp-brain: wrong file exploration → outdated suggestions → merge conflicts → massive token waste.


⚡ What mcp-brain Changes

┌──────────────────────────────────────────────────────┐
│                                                      │
│   Without:  Claude → explores → guesses → retries    │
│             → conflicts → high token usage           │
│                                                      │
│   With:     Claude → predicts → verifies → acts      │
│             → aligned → low token usage              │
│                                                      │
└──────────────────────────────────────────────────────┘

🧬 Core idea

Instead of giving Claude more context, we give it structured awareness of reality.

We track:

  • 📌 what changed (signal extraction from git)

  • 🎯 what matters (scoring + lifecycle)

  • 👥 who's working on what (team claims)

  • 🧭 where to act (issue → file prediction)

…and we deliver it in ~100 tokens.


⏱️ In 60 seconds

You drop a one-line ticket into Claude Code:

> work on ticket #42 — JWT login broken

Without mcp-brain, Claude starts grep-walking the repo, reading directory listings, opening README, sampling files — burning 2000+ tokens before producing the first useful sentence.

With mcp-brain, in <100ms Claude receives:

predictions:
  - file: src/auth.py
    confidence: high
    why: "path + symbol match: login, jwt"
  - file: src/middleware.py
    confidence: medium
    why: "imports auth (hop 1)"
  - file: src/jwt_utils.py
    confidence: medium
    why: "called_by auth.login"
team_claims:
  - { ticket: 39, author: dev-B, files: [middleware.py] }   # ⚠️ overlap
avoid:
  - "HS256 — vulnerable to key confusion. Migrated to RS256 in commit a1b2c3."
decisions:
  - "tokens stored in httpOnly cookie, never localStorage"

It's structured reality, not regenerated context. Claude can act on the first turn.
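The overlap warning in the example above amounts to intersecting the predicted files with the files claimed by teammates' open tickets. Below is a minimal sketch. It assumes claims are simple (ticket, author, files) records. `Claim` and `find_overlaps` are hypothetical names used for illustration, not mcp-brain's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A soft claim: a teammate's open ticket and the files it expects to touch."""
    ticket: int
    author: str
    files: frozenset[str]


def find_overlaps(predicted: list[str], claims: list[Claim], me: str) -> list[dict]:
    """Return one warning per teammate claim that shares files with the prediction."""
    predicted_set = set(predicted)
    warnings = []
    for claim in claims:
        if claim.author == me:
            continue  # your own claims never conflict with your work
        shared = sorted(predicted_set & claim.files)
        if shared:
            # Soft claim: warn, never block.
            warnings.append({"ticket": claim.ticket, "author": claim.author, "files": shared})
    return warnings


claims = [
    Claim(ticket=39, author="dev-B", files=frozenset({"src/middleware.py"})),
    Claim(ticket=40, author="dev-C", files=frozenset({"docs/index.md"})),
]
predicted = ["src/auth.py", "src/middleware.py", "src/jwt_utils.py"]
print(find_overlaps(predicted, claims, me="dev-A"))
# → [{'ticket': 39, 'author': 'dev-B', 'files': ['src/middleware.py']}]
```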


🔑 How It Works

flowchart TD
    subgraph Capture[Capture signals]
        A[Git commit] -->|filtered signals| B[mcp-brain memory]
        C[Session end] -->|structured snapshot| B
    end

    subgraph Predict[Predict where to act]
        E[Ticket opened] --> F[File predictor]
        F -->|top-K files + confidence + why| D[Claude Code]
    end

    subgraph Coordinate[Coordinate team work]
        F -->|overlap check| G[Team claims]
        G -->|conflict warnings| D
    end

    subgraph Learn[Learn from outcomes]
        H[Outcome recorded] -->|precision / recall| I[Feedback loop]
        I -->|demote noisy memories| B
        I -->|supersede stale decisions| B
    end

    B -->|~100-token YAML context| D

  1. Capture — git hooks promote only high-signal events (decisions, patterns, things to avoid). Ignored: docs, chore, tests, CI noise.

  2. Compress — three-level memory (L1/L2/L3) auto-assigned by a scoring function (recency 35% + frequency 30% + impact 20% + explicit 15%).

  3. Predict — issue title/body → ranked file list via BM25 + code graph expansion + optional semantic reranker.

  4. Coordinate — soft claims warn before two devs touch the same files.

  5. Self-correct — every closed ticket feeds precision/recall stats; noisy memories are auto-demoted.
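The capture step can be sketched as a filter over commit subjects. The sketch assumes Conventional Commits style messages (`type(scope)!: description`), in line with the note that semantic commit messages help. The regex, the ignored-type set, and `extract_signal` are illustrative assumptions, not the actual hook code.

```python
from __future__ import annotations

import re

# Conventional Commits subject line: type(scope)!: description
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<breaking>!)?:\s*(?P<desc>.+)$")

# Noise categories from the list above (docs, chore, tests, CI).
IGNORED_TYPES = {"docs", "chore", "test", "ci"}


def extract_signal(subject: str) -> dict | None:
    """Return a candidate signal for a commit subject, or None if it is noise."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None  # no conventional prefix: too little structure to score
    if match["type"] in IGNORED_TYPES:
        return None
    return {
        "type": match["type"],
        "breaking": match["breaking"] is not None,
        "text": match["desc"],
    }


commits = [
    "refactor(auth): JWT moved to RS256",
    "docs: fix typo in README",
    "ci: bump runner image",
    "feat!: drop localStorage token fallback",
    "wip",
]
signals = [s for s in map(extract_signal, commits) if s is not None]
# Only the refactor and the breaking feat survive the filter.
```

Dropping non-conventional subjects outright is a deliberately strict choice for the sketch. A real hook might fall back to keyword heuristics instead.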


🧠 Memory Hierarchy

Memories aren't dumped into one bag. They're scored and tiered, so the always-loaded slot in your prompt carries only what is signal-dense right now:

  • L1 — hot context loads automatically every session. Stack, conventions, current branch, recent commits, team claims, active high-confidence decisions. Capped at ~70 tokens.

  • L2 — warm context loads only on demand (brain_get_decisions). Historical reasoning, superseded patterns, the why behind a past trade-off.

  • L3 — cold archive is never sent to the model. Kept for audit, transparency, and the lifecycle's "undo" path.

The score is a transparent linear formula — no black-box embedding similarity. Every memory's level is reproducible and explainable.
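Because the score is linear, it fits in a few lines. The weights below are the ones listed above. The rest are hypothetical values chosen for the sketch:

- each feature is normalised to [0, 1], with a 14-day recency half-life and frequency saturating at 10 references;
- the L1/L2 cut-offs are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass

# Weights from the README.
WEIGHTS = {"recency": 0.35, "frequency": 0.30, "impact": 0.20, "explicit": 0.15}

# Hypothetical normalisation constants and tier cut-offs.
HALF_LIFE_DAYS = 14.0
FREQ_SATURATION = 10
L1_THRESHOLD = 0.60
L2_THRESHOLD = 0.30


@dataclass
class Memory:
    text: str
    age_days: float   # days since it was written or last confirmed
    references: int   # how often it has been hit
    impact: float     # 0..1, e.g. share of core files touched (assumption)
    explicit: bool    # True when the user asked to remember it


def score(m: Memory) -> float:
    """Transparent linear score in [0, 1]: each feature is scaled to [0, 1] first."""
    features = {
        "recency": 0.5 ** (m.age_days / HALF_LIFE_DAYS),
        "frequency": min(m.references / FREQ_SATURATION, 1.0),
        "impact": max(0.0, min(m.impact, 1.0)),
        "explicit": 1.0 if m.explicit else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())


def level(m: Memory) -> str:
    """Map the score to a tier: L1 hot, L2 warm, L3 cold archive."""
    s = score(m)
    if s >= L1_THRESHOLD:
        return "L1"
    if s >= L2_THRESHOLD:
        return "L2"
    return "L3"


fresh = Memory("tokens stored in httpOnly cookie", age_days=0, references=5, impact=0.8, explicit=True)
print(round(score(fresh), 2), level(fresh))  # → 0.81 L1
```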


🔍 Prediction Pipeline

The predictor runs in three deterministic stages:

| Stage | What it does | Cost |
|---|---|---|
| 1. BM25 + IDF | Tokenize issue, match against symbols / identifiers / paths in an inverted index | ~5 ms |
| 2. Graph expansion | Walk imports / imported_by / called_by from seeds. Score decays per hop (×0.5, ×0.25) | ~10 ms |
| 3. Semantic rerank (optional) | MiniLM (80 MB, CPU/GPU) embeds query + candidates, blends 30% cosine sim with 70% BM25 | ~50 ms |

Every prediction comes back with a why field and a full breakdown, so you can audit why a file was suggested — no opaque ranking.
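Here is a rough sketch of the first two stages. It assumes:

- one "document" per file, made of its path plus its symbol names;
- an undirected adjacency map that merges imports / imported_by / called_by edges.

It uses textbook Okapi BM25 (k1 = 1.5, b = 0.75, Lucene-style non-negative IDF) and the ×0.5 / ×0.25 hop decay from the table. `BM25Index` and `expand` are hypothetical names, and mcp-brain's real index fields and constants may differ.

```python
from __future__ import annotations

import math
import re
from collections import Counter, deque


def tokenize(text: str) -> list[str]:
    """Split paths, snake_case and camelCase identifiers into lowercase terms."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Okapi BM25 over one document per file (its path plus its symbol names)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {path: Counter(tokenize(f"{path} {text}")) for path, text in docs.items()}
        self.length = {path: sum(tf.values()) for path, tf in self.tf.items()}
        self.avg_len = sum(self.length.values()) / max(len(self.length), 1)
        df = Counter(term for tf in self.tf.values() for term in tf)
        n = len(self.tf)
        # Non-negative IDF, as in Lucene's BM25.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def search(self, query: str) -> dict[str, float]:
        terms = set(tokenize(query))
        scores = {}
        for path, tf in self.tf.items():
            norm = self.k1 * (1 - self.b + self.b * self.length[path] / self.avg_len)
            s = sum(self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm) for t in terms.intersection(tf))
            if s > 0:
                scores[path] = s
        return scores


def expand(seeds: dict[str, float], graph: dict[str, set[str]], max_hops: int = 2):
    """Breadth-first walk from each seed; a hop-h neighbour gets seed_score * 0.5**h."""
    ranked = {path: (s, "symbol/path match") for path, s in seeds.items()}
    for seed, seed_score in seeds.items():
        frontier, seen = deque([(seed, 0)]), {seed}
        while frontier:
            node, hop = frontier.popleft()
            if hop == max_hops:
                continue
            for nb in graph.get(node, ()):
                if nb in seen:
                    continue
                seen.add(nb)
                s = seed_score * 0.5 ** (hop + 1)  # x0.5 at hop 1, x0.25 at hop 2
                if s > ranked.get(nb, (0.0, ""))[0]:
                    ranked[nb] = (s, f"linked to {seed} (hop {hop + 1})")
                frontier.append((nb, hop + 1))
    return sorted(ranked.items(), key=lambda kv: kv[1][0], reverse=True)


docs = {
    "src/auth.py": "login logout verify_jwt",
    "src/middleware.py": "AuthMiddleware dispatch",
    "src/jwt_utils.py": "encode decode",
    "src/billing.py": "invoice charge",
}
graph = {
    "src/auth.py": {"src/middleware.py", "src/jwt_utils.py"},
    "src/middleware.py": {"src/auth.py"},
    "src/jwt_utils.py": {"src/auth.py"},
}
for path, (score, why) in expand(BM25Index(docs).search("JWT login broken"), graph)[:3]:
    print(f"{path}: {score:.2f} ({why})")
```

The sketch keeps the best score per file rather than summing contributions. This guarantees a neighbour never outranks the seed that pulled it in. It is a design choice for the sketch, not necessarily mcp-brain's.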

💡 The semantic reranker is on by default. To run lean (CI / containers without PyTorch), set MCP_BRAIN_SEMANTIC=0 and the pipeline degrades gracefully to BM25 + graph.
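The final blend and the environment-variable switch might look like the sketch below. It assumes:

- candidate and query embeddings were already computed by some sentence-embedding model, for example a MiniLM checkpoint loaded with sentence-transformers;
- first-stage scores are positive and are scaled by their maximum before blending.

`semantic_enabled` and `rerank` are hypothetical helpers.

```python
from __future__ import annotations

import math
import os

SEMANTIC_WEIGHT = 0.3  # 30% cosine similarity, 70% first-stage score


def semantic_enabled() -> bool:
    """MCP_BRAIN_SEMANTIC=0 switches the reranker off; unset or any other value keeps it on."""
    return os.environ.get("MCP_BRAIN_SEMANTIC", "1") != "0"


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(
    first_stage: dict[str, float],
    query_vec: list[float],
    file_vecs: dict[str, list[float]],
    weight: float = SEMANTIC_WEIGHT,
) -> list[tuple[str, float]]:
    """Blend max-scaled first-stage scores with cosine similarity, or skip it when disabled."""
    if not first_stage:
        return []
    top = max(first_stage.values())
    scaled = {path: s / top for path, s in first_stage.items()}
    if semantic_enabled():
        scaled = {
            path: (1 - weight) * s
            + weight * (cosine(query_vec, file_vecs[path]) if path in file_vecs else 0.0)
            for path, s in scaled.items()
        }
    return sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)


# Toy 2-d "embeddings": b.py is lexically weaker but semantically closer.
first_stage = {"a.py": 2.0, "b.py": 1.8}
query_vec = [1.0, 0.0]
file_vecs = {"a.py": [0.0, 1.0], "b.py": [1.0, 0.0]}
print([path for path, _ in rerank(first_stage, query_vec, file_vecs)])
# ['b.py', 'a.py'] with the reranker on; ['a.py', 'b.py'] with MCP_BRAIN_SEMANTIC=0
```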


🔄 Decision Lifecycle

Memories aren't immortal. mcp-brain assumes you'll change your mind and bakes the lifecycle in:

  • Age-based decay — after SUSPECT_DAYS a memory gets flagged for re-verification. After STALE_DAYS it's hidden from prompts.

  • Semantic supersession — write a new memory similar (cosine ≥ 0.85) to an old one and the old one is auto-marked superseded.

  • Feedback loop — when a memory is shown 3+ times before a reverted ticket, it gets demoted automatically. Noisy memories die fast.

This is what makes mcp-brain safe to leave running for months without manual cleanup. The L1 stays small and trustworthy; the L3 archives the audit trail.
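The three rules can be sketched as plain functions over a decision record. The README names SUSPECT_DAYS and STALE_DAYS but not their values, so 30 and 90 days are placeholders. The 0.85 cosine threshold and the 3-strike demotion come from the text above. The sketch reads "shown 3+ times before a reverted ticket" as "shown for three reverted tickets", which is an interpretation. All names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from datetime import date

SUSPECT_DAYS = 30        # placeholder value
STALE_DAYS = 90          # placeholder value
SUPERSEDE_COSINE = 0.85  # from the README
DEMOTE_AFTER = 3         # reverted tickets before demotion


@dataclass
class Decision:
    text: str
    embedding: list[float]
    verified_on: date
    reverted_while_shown: int = 0
    status: str = "active"   # active | superseded | demoted


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def lifecycle_state(d: Decision, today: date) -> str:
    """Explicit states win; otherwise age decides active / suspect / stale."""
    if d.status != "active":
        return d.status
    age = (today - d.verified_on).days
    if age >= STALE_DAYS:
        return "stale"
    if age >= SUSPECT_DAYS:
        return "suspect"
    return "active"


def add_decision(log: list[Decision], new: Decision) -> None:
    """Semantic supersession: near-duplicates of the new decision are retired."""
    for old in log:
        if old.status == "active" and cosine(old.embedding, new.embedding) >= SUPERSEDE_COSINE:
            old.status = "superseded"
    log.append(new)


def record_revert(shown: list[Decision]) -> None:
    """Feedback loop: decisions shown for DEMOTE_AFTER reverted tickets are demoted."""
    for d in shown:
        d.reverted_while_shown += 1
        if d.reverted_while_shown >= DEMOTE_AFTER:
            d.status = "demoted"


def in_prompt(d: Decision, today: date) -> bool:
    """Suspect memories are still shown (flagged); stale, superseded, demoted are hidden."""
    return lifecycle_state(d, today) in ("active", "suspect")


today = date(2024, 6, 1)
log = [Decision("tokens in localStorage", [1.0, 0.0], date(2024, 5, 20))]
add_decision(log, Decision("tokens in httpOnly cookie", [0.95, 0.1], today))
print([(d.text, lifecycle_state(d, today)) for d in log])
# → [('tokens in localStorage', 'superseded'), ('tokens in httpOnly cookie', 'active')]
```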


🏗️ Architecture

flowchart TB
    subgraph Client
        CC[Claude Code]
    end
    subgraph Server[mcp-brain server]
        T[MCP Tools layer<br/>brain_init, brain_get_context,<br/>brain_predict_files, ...]
        R[Retriever<br/>+ Compressor]
        P[File Predictor<br/>BM25 + Graph + Semantic]
        F[Feedback Reconciler]
        O[Observability<br/>p50/p95/p99]
    end
    subgraph Storage[Local storage ~/.mcp-brain/]
        DB[(SQLite<br/>memories, sessions,<br/>projects, feedback)]
        IDX[Inverted Index<br/>BM25]
        G[Code Graph<br/>imports/calls]
        Y[YAML claims]
    end
    CC <-->|MCP/stdio| T
    T --> R
    T --> P
    T --> F
    T --> O
    R --> DB
    P --> IDX
    P --> G
    F --> DB
    O --> DB

Repo layout

mcp-brain/
├── src/
│   ├── brain/         # core logic: retriever, compressor, scorer, predictor
│   │                  # code_graph, file_indexer, semantic_reranker,
│   │                  # staleness, similarity, feedback loop, observability
│   ├── capture/       # git hook signal extraction
│   ├── storage/       # SQLite layer
│   └── tools/         # MCP tool definitions (FastMCP)
├── benchmark/         # SWE-bench Lite/Full, Bench4BL, BugLocator harness
├── tests/             # pytest suite (predictor, feedback, observability, ...)
└── assets/            # SVG diagrams used in this README

📊 Benchmark Results

We benchmark file localization: given a real GitHub issue, can mcp-brain rank the production files that the accepted patch actually modified?

Dataset: SWE-bench Full

  • 2294 real GitHub issues from 12 major Python OSS projects (astropy, django, flask, matplotlib, pylint, pytest, requests, scikit-learn, seaborn, sphinx, sympy, xarray)

  • Ground truth = files modified in the accepted reference patch (test files excluded by default — strict production-file evaluation)

Results — mcp-brain v1.4.0 (BM25 + graph + semantic)

| Metric | @1 | @3 | @5 | @10 |
|---|---|---|---|---|
| Hit | 24.5% | 43.4% | 53.7% | 63.4% |
| Recall | 20.1% | 36.6% | 46.1% | 55.8% |
| MAP | 24.5% | 28.4% | 30.4% | 31.8% |

  • Instances evaluated: 2294

  • Errors: 5 (0.2% failure rate)

  • Avg gold files per issue: 1.66

  • Avg predicted files: 9.98 (top-10)

Honest comparison vs. literature

| System | Hit@10 (file loc.) | Cost per query | Notes |
|---|---|---|---|
| BM25 baseline (vanilla) | ~45–55% | free | symbol search only |
| mcp-brain v1.4.0 | 63.4% | free | BM25 + graph + semantic, zero LLM |
| Agentless / SWE-agent | ~70–85% | $0.10–$2 | LLM-based, multi-step |

Reading the numbers:

  • Hit@5 = 53.7% → in more than half of real issues, at least one production file touched by the accepted patch is in the top 5 before Claude reads a single byte.

  • Hit@10 = 63.4% → expanded to the top 10, nearly 2 issues out of 3 have a gold file in the list.

  • MAP@1 = 24.5% → the very first prediction is a gold file for roughly 1 issue out of 4.

  • 0.2% error rate over 2294 runs → robust pipeline.
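The section reports Hit, Recall and MAP at k without spelling out the formulas. The standard definitions below are an assumption about how the harness computes them. With average precision normalised by min(k, |gold|), MAP@1 equals Hit@1. That matches the table, which shows 24.5% for both.

```python
from __future__ import annotations


def hit_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """1.0 if any gold file is in the top k, else 0.0."""
    return float(any(path in gold for path in ranked[:k]))


def recall_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """Fraction of gold files that appear in the top k."""
    return len(set(ranked[:k]) & gold) / len(gold)


def ap_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """Average precision at k, normalised by min(k, |gold|)."""
    hits, total = 0, 0.0
    for rank, path in enumerate(ranked[:k], start=1):
        if path in gold:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / min(k, len(gold))


def mean_at_k(metric, runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average a per-issue metric over (ranked predictions, gold files) pairs."""
    return sum(metric(ranked, gold, k) for ranked, gold in runs) / len(runs)


runs = [
    (["src/a.py", "src/b.py", "src/c.py"], {"src/b.py"}),
    (["src/x.py", "src/y.py", "src/z.py"], {"src/x.py", "src/q.py"}),
]
for name, metric in [("Hit", hit_at_k), ("Recall", recall_at_k), ("MAP", ap_at_k)]:
    print(name, [round(mean_at_k(metric, runs, k), 3) for k in (1, 3)])
# Hit [0.5, 1.0]
# Recall [0.25, 0.75]
# MAP [0.5, 0.5]
```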

Reproduce it yourself

# One-time online setup
pip install -e .
pip install -r benchmark/requirements-benchmark.txt
python -m benchmark.adapters.swebench --dataset-name princeton-nlp/SWE-bench \
  --output benchmark/datasets/cache/swebench_full.jsonl
python -m benchmark.prepare_repos \
  --dataset benchmark/datasets/cache/swebench_full.jsonl \
  --repo-cache benchmark/repos

# Offline evaluation (full)
python -m benchmark.run_eval \
  --dataset benchmark/datasets/cache/swebench_full.jsonl \
  --repo-cache benchmark/repos \
  --out benchmark/results/swebench_full.json \
  --report-dir benchmark/reports \
  --top-k 10 --max-hops 2 --use-semantic

Reports are emitted as Markdown + HTML in benchmark/reports/.

The harness also supports SWE-bench Lite (300 instances), SWE-bench Verified, Bench4BL, and BugLocator — see benchmark/README.md.


💰 Token Efficiency

The math

A typical Claude Code session without mcp-brain spends thousands of tokens just to orient itself:

| Phase (no mcp-brain) | Action | ~Tokens |
|---|---|---|
| Session start | List directory, read README, sample files | 800–2000 |
| Issue handling | Grep symbols, follow imports, retry wrong files | 1000–3000 |
| Context restore | Re-explain project conventions | 200–500 |
| Total per session | | 2000–5500 |

A session with mcp-brain:

| Phase (with mcp-brain) | Action | ~Tokens |
|---|---|---|
| Session start | brain_get_context returns compressed L1 YAML | ~100 |
| Issue handling | brain_predict_files returns ranked top-K + why | ~250 |
| Decision recall | brain_get_decisions (only when needed) | ~300 |
| Total per session | | ~650 |

Estimated saving

                        Without              With mcp-brain       Saving
  Session start:    2000 ─────────►          100 tokens           ~95%
  Per session:      2000–5500 ──►            450–950 tokens       ~50–90%
  Per developer*:   200k–550k/month ──►      45k–95k/month        ~50–90%

*assuming 100 sessions/month/dev
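The percentages follow directly from the ranges above. The quick check below pairs the cheapest baseline with the most expensive mcp-brain session for the worst case, and the reverse for the best case.

```python
from __future__ import annotations


def saving(before: float, after: float) -> float:
    """Fractional token saving when `before` tokens shrink to `after`."""
    return 1 - after / before


def saving_range(before: tuple[int, int], after: tuple[int, int]) -> tuple[float, float]:
    """Worst case: cheapest baseline vs priciest mcp-brain session; best case: the reverse."""
    return saving(before[0], after[1]), saving(before[1], after[0])


SESSIONS_PER_MONTH = 100  # the footnote's assumption

start = saving(2000, 100)
per_session = saving_range((2000, 5500), (450, 950))
monthly_without = tuple(SESSIONS_PER_MONTH * t for t in (2000, 5500))
monthly_with = tuple(SESSIONS_PER_MONTH * t for t in (450, 950))

print(f"session start: {start:.1%}")
print(f"per session:   {per_session[0]:.1%} to {per_session[1]:.1%}")  # roughly 52% to 92%
print(f"per month:     {monthly_without} -> {monthly_with} tokens")
```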

Why this works

  • No embeddings required for retrieval (BM25 + code graph)

  • No vector DB to query (zero round-trip cost)

  • No history replay — context is reconstructed, not re-scrolled

  • YAML compression with default_flow_style=True and empty-key stripping

  • L1/L2 split — heavy memory only loaded on demand
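The two compression tricks named above, flow-style YAML and empty-key stripping, can be sketched as below. `strip_empty` and `to_compact_yaml` are hypothetical helpers. The serialization assumes PyYAML's `safe_dump`, which accepts `default_flow_style` and `sort_keys`. The import is guarded so the stripping logic still runs without PyYAML.

```python
from __future__ import annotations

from typing import Any

try:
    import yaml  # PyYAML
except ImportError:  # keep the sketch importable without PyYAML
    yaml = None

EMPTY = (None, "", [], {})


def strip_empty(value: Any) -> Any:
    """Recursively drop dict entries and list items that are empty after cleaning."""
    if isinstance(value, dict):
        cleaned = {k: strip_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in EMPTY}
    if isinstance(value, list):
        cleaned_items = [strip_empty(v) for v in value]
        return [v for v in cleaned_items if v not in EMPTY]
    return value


def to_compact_yaml(context: dict) -> str:
    """Strip empty keys, then emit flow-style YAML ({a: b, c: [d]}) in insertion order."""
    if yaml is None:
        raise RuntimeError("PyYAML is required for YAML output")
    return yaml.safe_dump(strip_empty(context), default_flow_style=True, sort_keys=False)


context = {
    "p": {"name": "my-api", "stack": ["FastAPI", "PostgreSQL"]},
    "s": {"branch": "feat/auth", "wip": "", "next": None},
    "team_claims": [],
    "avoid": ["HS256: vulnerable to key confusion"],
}
print(strip_empty(context))
```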

💡 The semantic reranker (use_semantic=True) is on by default and runs locally on CPU/GPU. It does not add LLM cost. Disable with MCP_BRAIN_SEMANTIC=0 for lean CI.


🚀 Quick Start

Install — one command, batteries included

git clone https://github.com/PierfrancescoLijoi/mcp-brain.git
cd mcp-brain
pip install -e ".[all]"

The [all] extra installs:

  • language parsers (Python, JS, TS, Go, Rust, Java, C#) for the code graph

  • semantic reranker (sentence-transformers + numpy)

  • dev tooling (pytest, pytest-cov)

Lean install paths

If you want a smaller footprint, you can pick exactly what you need:

pip install -e .                      # core only — BM25 + graph (no semantic, no parsers)
pip install -e ".[parsers]"           # + multi-language parsers
pip install -e ".[semantic]"          # + semantic reranker (~700 MB w/ PyTorch)
pip install -e ".[dev]"               # + dev tooling

Register with Claude Code

claude mcp add mcp-brain python /absolute/path/to/run.py

On Windows PowerShell:

claude mcp add mcp-brain python "C:\path\to\mcp-brain\run.py"

Initialize your project

mcp-brain init

That's it. Open Claude Code in your repo and the L1 context is automatically available via brain_get_context.


🧠 MCP Tools

| Tool | Purpose | When Claude calls it |
|---|---|---|
| brain_init | Register project, stack, conventions | Once per repo |
| brain_get_context | Load L1 context (~70 tokens) | Every session start |
| brain_get_decisions | Load L2 decisions on demand | When historical context needed |
| brain_remember | Store a memory; level auto-assigned | When user makes a decision |
| brain_save_session | Save end-of-session snapshot | At session end |
| brain_predict_files | Issue → ranked file list with why | When opening a ticket |
| brain_start_ticket | Start ticket workflow + conflict check | Workflow orchestration |
| brain_record_outcome | Log ticket outcome (completed/reverted/...) | After ticket closed |
| brain_feedback_stats | Precision/recall window | Health checks |
| brain_memory_health | Surface noisy memories | Debugging |
| brain_observability | Full unified dashboard (YAML) | Ops / CI |

Example L1 context output (~100 tokens)

p: {name: my-api, stack: [FastAPI, PostgreSQL]}
s: {branch: feat/auth, wip: "JWT refactor", next: "add refresh token"}

git:
  recent: ["refactor: JWT moved to RS256"]
  changed: [auth.py, middleware.py]

team_claims:
  - {ticket: 42, author: dev-B, files: [middleware.py]}

avoid:
  - "avoid: HS256 — vulnerable to key confusion"

decisions:
  - "decision: tokens stored httpOnly cookie, never localStorage"

👉 Claude already knows where to act before reading a single source file.


💼 Use Cases

🎯 Solo developer

  • Cuts session-start exploration: −90% tokens on the first turn

  • Remembers your "I always do it this way" patterns

  • Auto-supersedes decisions when you change your mind

👥 Small team (3–10 devs)

  • Conflict detection before two devs touch the same files

  • Shared decision log with lifecycle (no more "wait, didn't we decide…?")

  • File ownership inference from git history

🏢 Enterprise (with caveats)

  • Local-first, no data leaves the machine → GDPR / SOC2-friendly

  • Compatible with Managed Identity / on-prem deployments (no cloud calls)

  • Token savings compound: per-session savings × 100 devs × 100 sessions/month → measurable infra savings


❓ FAQ

Is mcp-brain a vector-DB / RAG memory?

No, and on purpose. mcp-brain is a structured awareness layer, not a retrieval-over-embeddings layer. The core retrieval is BM25 + code graph expansion — fully deterministic, sub-100ms, no vector DB to maintain. The semantic reranker is an optional 30% blend on top, used only as a tiebreaker. This is why token cost stays predictable and infra is local-first.

Why not just rely on a long context window?

A long context window doesn't fix the problem — it makes it cheaper to waste. The bottleneck isn't capacity, it's signal density. Pasting your whole repo into the context still leaves Claude searching for the right file linearly. mcp-brain pre-ranks reality so the model spends its attention on the right 3 files, not the wrong 30.

Does any data leave my machine?

No. Storage is SQLite under ~/.mcp-brain/ (local) and <repo>/.brain/shared/ (versioned with git if you choose). No outbound network calls, no telemetry, no cloud component. The semantic model runs on your CPU/GPU. This makes mcp-brain compatible with GDPR-restricted and air-gapped environments.

How do I retire an outdated decision?

Write a new memory that contradicts it. Semantic supersession (cosine ≥ 0.85) will auto-mark the old one as superseded. You can also manually demote via brain_memory_health or wait for age-based decay (SUSPECT_DAYS / STALE_DAYS). The lifecycle assumes you'll change your mind.

Does it work with languages other than Python?

Yes for indexing/predicting (BM25 is language-agnostic). The code graph currently supports Python, JavaScript, TypeScript, Go, Rust, Java, C# via tree-sitter parsers. Adding a new language is a single registry entry — see src/brain/parsers.py.

How does it compare to SWE-agent and similar tools?

Different layer of the stack. SWE-agent and similar tools are autonomous coders — they read, plan, and patch via LLM calls. mcp-brain is the awareness layer underneath them. You could pair it with Aider or any MCP-compatible client; it makes whatever LLM you use start from a smarter zero.

How reliable are the file predictions?

Honest answer: file prediction is heuristic. Hit@1 = 24.5% means 3 issues out of 4 still need Claude to validate the prediction before acting. mcp-brain orients, it doesn't replace exploration. That's also why it's free — it's a force multiplier, not an oracle.


⚠️ Trade-offs

I'm honest about what this is and isn't.

| Strength | Limitation |
|---|---|
| ✅ Zero LLM cost for retrieval | ⚠️ Heuristic-based: edge cases with no symbol/path overlap can miss |
| ✅ Sub-100ms predictions | ⚠️ Requires good commit hygiene (semantic commit messages help) |
| ✅ Local-first, no cloud | ⚠️ No cross-machine sync out of the box (use git for .brain/shared/) |
| ✅ Deterministic (replays produce same output) | ⚠️ Hit@1 = 24.5% → orients, doesn't replace exploration |
| ✅ Works on any size repo | ⚠️ Best on medium/large repos (small repos don't benefit much) |

This is NOT:

  • ❌ a vector DB memory

  • ❌ a RAG system

  • ❌ an SWE-agent / autonomous coder

  • ❌ a checkpoint / replay tool

This IS:

  • ✅ a repo-aware, team-aware, token-efficient awareness layer

  • ✅ a force multiplier for Claude Code, not a replacement


🛣️ Roadmap

Shipped:

  • BM25 + code graph + semantic reranker

  • Decision lifecycle with semantic supersession

  • Feedback loop with precision/recall reconciliation

  • Observability dashboard

  • SWE-bench Full benchmark (2294 instances)

  • Multi-language code graph (Python, JS, TS, Go, Rust, Java, C#)

Planned:

  • Cross-repo memory federation (opt-in)

  • Real-time conflict push (currently pull-based)

  • VS Code extension companion

  • Hosted shared .brain/ for distributed teams (still local-first per dev)


🧪 Run the test suite

pip install -e ".[dev]"
pytest tests/ -v

Expected: full pass on Python 3.10, 3.11, 3.12.


🤝 Contributing

PRs welcome. Before opening one:

  1. pytest tests/ -v must pass

  2. New behavior needs new tests

  3. New MCP tools must be wrapped with @observed("brain_<name>")

  4. Avoid heavy dependencies for the default install path — anything ML-flavored goes behind an optional extra
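For contributors, a decorator in the spirit of `@observed("brain_<name>")` could look like the sketch below. It is an assumption, not the project's implementation:

- it keeps latencies in memory, whereas the architecture diagram routes observability to SQLite;
- it handles only synchronous tools;
- it reports p50/p95/p99 with the standard library's `statistics.quantiles`.

`brain_ping` is a hypothetical tool used only to exercise the decorator.

```python
from __future__ import annotations

import functools
import statistics
import time
from collections import defaultdict

# In-memory store for the sketch: tool name -> latencies in milliseconds.
LATENCIES_MS: dict[str, list[float]] = defaultdict(list)


def observed(name: str):
    """Decorator factory: time every call of a sync tool under `name`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Recorded even when the tool raises, so errors show up in the tail.
                LATENCIES_MS[name].append((time.perf_counter() - start) * 1000)
        return wrapper
    return decorator


def percentiles(name: str) -> dict[str, float]:
    """p50/p95/p99 for one tool; call it once a few samples have been recorded."""
    cuts = statistics.quantiles(LATENCIES_MS[name], n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


@observed("brain_ping")
def brain_ping() -> str:
    """Hypothetical no-op tool used only to exercise the decorator."""
    return "pong"


for _ in range(20):
    brain_ping()
print(percentiles("brain_ping"))
```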


📄 License

MIT — see LICENSE.

