knitbrain
Optimizes OpenAI API requests through a proxy that compresses token usage, especially for past turns and bulk content, while keeping instructions verbatim and enabling retrieval of originals.
knitbrain is a local-first MCP server (34 tools) that any coding agent can connect to — Claude Code, Cursor, Codex, Copilot, Windsurf, Cline, and others. It does three things, plus the supporting pieces that make them work:
Token optimization — losslessly compress large tool output so your context window lasts longer.
A wiki — a compounding markdown knowledge base the agent maintains, instead of re-deriving context every session.
A closed loop — drive a goal to a verified result through repeated judge → iterate → grade → review cycles.
Pure Node, three runtime dependencies, no Python, no ML runtime. Everything runs locally under
~/.knitbrain; the proxy, hub, and dashboard bind 127.0.0.1. Every number below comes from a command
you can run on your own data.
Install
npm install -g knitbrain # or: npx knitbrain <command>
knitbrain profile # measure compression on YOUR transcripts, before changing anything
knitbrain setup # wire it into your agent(s): MCP config, rules, AGENTS.md
Requires Node ≥ 18. setup writes native config per platform and, on Claude Code, the lifecycle hooks.
1. Token optimization
Tool results (code, logs, diffs, JSON, prose) are routed to a structure-preserving skeletonizer
(tree-sitter AST plus deterministic handlers). The exact original is kept in a content-addressed recall
store; the agent sees a skeleton plus a ⟨recall:HASH⟩ handle and pages the original back when it needs
it. Small or incompressible payloads pass through unchanged.
Two properties are enforced by the build, not asserted:
Lossless — every elision recovers byte-for-byte. knitbrain evals gates it on real transcripts: round-trip 100%, identifier-fidelity 100%, error and summary lines never dropped.
Never-expand — the skeleton is never larger than the input.
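As a rough illustration of the mechanism described above, the sketch below pairs a content-addressed store with a recall handle. It is not knitbrain's implementation: the in-memory `recallStore`, the `MIN_CHARS` cutoff, and the head-and-tail `skeletonize` stand-in (the real skeletonizer is described as tree-sitter based) are all assumptions made for the example, and size is measured in characters rather than tokens.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory recall store: content hash -> exact original text.
const recallStore = new Map<string, string>();

// Illustrative pass-through cutoff; the real threshold is not stated in this README.
const MIN_CHARS = 400;

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Toy stand-in for the skeletonizer: keep the first and last lines, elide the middle.
function skeletonize(text: string, keep = 3): string {
  const lines = text.split("\n");
  if (lines.length <= keep * 2) return text;
  const elided = lines.length - keep * 2;
  return [
    ...lines.slice(0, keep),
    `... ${elided} lines elided ...`,
    ...lines.slice(-keep),
  ].join("\n");
}

function compress(original: string): { output: string; handle: string | null } {
  // Small payloads pass through unchanged.
  if (original.length < MIN_CHARS) return { output: original, handle: null };
  const hash = sha256(original);
  const candidate = `${skeletonize(original)}\n⟨recall:${hash}⟩`;
  // Never-expand: if the skeleton is not smaller, send the original instead.
  if (candidate.length >= original.length) return { output: original, handle: null };
  // Lossless: the exact original stays retrievable under its hash.
  recallStore.set(hash, original);
  return { output: candidate, handle: hash };
}

function recall(hash: string): string | undefined {
  return recallStore.get(hash);
}
```

Keying the store by a hash of the content means identical outputs share one entry, and the handle alone is enough to page the original back.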
Measured reach (run knitbrain profile for your own numbers, npm run bench for the benchmark):
Measurement | Result | Reproduce |
Average reduction over ~3M real tool-result tokens | ~46% (≈55% on blocks ≥ 400 chars) | |
Weighted real-shape benchmark (code · logs · JSON · diffs · prose) | 68% | |
Answer-preservation (round-trip · identifier · summary) | 100% | |
These are the ceiling: what you save when output flows through the optimizer. How much of your
live traffic that covers depends on your setup (see "How compression reaches your traffic");
your realized number is the live meter (knitbrain dashboard), which counts only what actually
passed through.
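To make the difference between the ceiling and the realized number concrete, here is a small arithmetic sketch. The `Block` shape and the sample token counts are hypothetical and are not taken from knitbrain's meter; the point is only that traffic which bypasses the optimizer dilutes the overall reduction.

```typescript
interface Block {
  tokensIn: number;  // tokens before optimization
  tokensOut: number; // tokens actually sent (equals tokensIn when the block bypassed the optimizer)
}

// Token-weighted reduction: total tokens saved divided by total tokens in.
function reductionPct(blocks: Block[]): number {
  const totalIn = blocks.reduce((sum, b) => sum + b.tokensIn, 0);
  const totalOut = blocks.reduce((sum, b) => sum + b.tokensOut, 0);
  return totalIn === 0 ? 0 : (1 - totalOut / totalIn) * 100;
}

// Hypothetical traffic: two blocks went through the optimizer, one bypassed it.
const traffic: Block[] = [
  { tokensIn: 1000, tokensOut: 500 },
  { tokensIn: 1000, tokensOut: 500 },
  { tokensIn: 2000, tokensOut: 2000 }, // bypassed
];
const covered = traffic.filter((b) => b.tokensOut < b.tokensIn);

console.log(reductionPct(covered)); // 50 (ceiling, on covered traffic only)
console.log(reductionPct(traffic)); // 25 (realized, across all traffic)
```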
2. The wiki
A small, local, per-project wiki — interlinked markdown notes and a session log, not an encyclopedia.
Most agent setups re-read and re-derive context every session; this is the opposite: knowledge is filed
once into linked pages and kept current rather than rebuilt on every query. knitbrain maintains the
bookkeeping reliably (index, cross-links, log); the depth of each page is whatever the agent writes
into it via wiki_ingest.
It lives at ~/.knitbrain/projects/<id>/wiki/:
pages/ — one terse page per entity, concept, summary, or session.
index.md — a catalog the agent reads first to find the right page.
log.md — an append-only chronicle (## [date] event | title), which doubles as the per-session log.
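The log heading format above (## [date] event | title) can be written and read back with a few lines of code. This is a sketch only: the `LogEntry` shape, the function names, and the ISO-style date in the usage line are assumptions, since the README shows just the heading pattern.

```typescript
interface LogEntry {
  date: string;  // assumed to be an ISO-style date such as 2025-01-31
  event: string;
  title: string;
}

// Render one entry as a log.md heading line.
function formatLogEntry(entry: LogEntry): string {
  return `## [${entry.date}] ${entry.event} | ${entry.title}`;
}

// Parse a heading line back into its parts; returns null for any other line.
function parseLogEntry(line: string): LogEntry | null {
  const match = /^## \[([^\]]+)\] (.+?) \| (.+)$/.exec(line);
  return match ? { date: match[1], event: match[2], title: match[3] } : null;
}

const heading = formatLogEntry({ date: "2025-01-31", event: "ingest", title: "Auth flow" });
console.log(heading); // ## [2025-01-31] ingest | Auth flow
```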
Three operations, exposed as MCP tools:
ingest (knitbrain_wiki_ingest) — write or update a page, rebuild the index, append the log, and stub any cross-referenced page.
query (knitbrain_wiki_query) — read the index and recent log to find the pages to drill into.
lint (knitbrain_wiki_lint) — flag claim contradictions across pages and orphan pages nothing links to.
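The orphan half of the lint step amounts to a reachability check over page links. The sketch below assumes pages cross-reference each other with `[[page-name]]` wiki links; that syntax is an assumption for illustration, as the README does not say how links are written, and `findOrphans` is a hypothetical helper rather than the tool's actual code.

```typescript
// pages: page name -> markdown body. The [[name]] link syntax is an assumption.
function findOrphans(pages: Record<string, string>): string[] {
  const linked = new Set<string>();
  for (const name of Object.keys(pages)) {
    const targets = Array.from(pages[name].matchAll(/\[\[([^\]]+)\]\]/g), (m) => m[1]);
    for (const target of targets) {
      if (target !== name) linked.add(target); // a page linking to itself does not count
    }
  }
  // An orphan is a page that no other page links to.
  return Object.keys(pages).filter((name) => !linked.has(name)).sort();
}

const orphans = findOrphans({
  "auth-flow": "Tokens are issued by [[session-store]].",
  "session-store": "Backed by SQLite.",
  "old-notes": "Superseded by [[auth-flow]].",
});
console.log(orphans); // [ 'old-notes' ]
```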
On Claude Code each turn is appended to the log automatically; knitbrain_load_session surfaces recent
entries so a fresh session inherits what prior sessions did. A live dashboard panel renders the wiki and
its links.
3. The closed loop
knitbrain orchestrate <goal> drives a goal file to a verified result:
goal → judge → iterate → grade → review → repeat (until met, or a hard cycle cap)
judge — is the goal clear enough to attempt?
iterate — one orchestrated pass; the work scales with project intensity (a matched skill for small tasks, the skill plus briefed agent guardrails for complex ones).
grade — a real verify command runs; exit 0 or not. A failing grade is never reported as met.
review — the result is scored against a rubric.
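The four steps above can be sketched as a bounded loop. Everything named here (`Hooks`, `Verify`, `runLoop`, and the rule that the goal counts as met once the verify command exits 0) is an assumption for illustration; the README does not show how knitbrain combines the grade and the review score.

```typescript
import { spawnSync } from "node:child_process";

interface Verify {
  command: string;
  args: string[];
}

interface Hooks {
  judge: (goal: string) => boolean;               // is the goal clear enough to attempt?
  iterate: (goal: string, cycle: number) => void; // one pass of work
  review: (goal: string) => number;               // rubric score for the current result
}

interface Outcome {
  met: boolean;
  cycles: number;
  scores: number[];
}

// grade: run the verify command; only exit code 0 counts as a pass.
function grade(verify: Verify): boolean {
  const result = spawnSync(verify.command, verify.args, { stdio: "ignore" });
  return result.status === 0;
}

function runLoop(goal: string, hooks: Hooks, verify: Verify, maxCycles: number): Outcome {
  if (!hooks.judge(goal)) return { met: false, cycles: 0, scores: [] };
  const scores: number[] = [];
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    hooks.iterate(goal, cycle);
    const passed = grade(verify);
    scores.push(hooks.review(goal));
    // A failing grade is never reported as met; stop only on a real pass.
    if (passed) return { met: true, cycles: cycle, scores };
  }
  return { met: false, cycles: maxCycles, scores }; // hard cycle cap reached
}
```

Running the verify command as a separate process keeps the pass or fail decision tied to an exit code rather than to anything the worker reports about itself.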
Every cycle is token-metered and written to the wiki as an audit trail. The loop never commits, pushes, or deploys — that stays with you. There is also a simpler outer loop for a checkbox task queue:
knitbrain loop goal.md --verify "npm test" # one worker, verify-gated
knitbrain fan goal.md --workers 4 --verify "npm test" # N workers, each in its own git worktree
A task is marked done only after verify passes (no false green), and parallel workers leave their branches for you to review.
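A minimal sketch of the "no false green" rule for a checkbox goal file follows. It assumes GitHub-style task list items (`- [ ] task`), which the README does not spell out, and the `parseTasks` and `markDone` helpers are hypothetical names, not knitbrain's API.

```typescript
interface Task {
  line: number; // zero-based line index in the goal file
  text: string;
  done: boolean;
}

// Collect checkbox items, assuming the "- [ ] task" / "- [x] task" convention.
function parseTasks(markdown: string): Task[] {
  const tasks: Task[] = [];
  markdown.split("\n").forEach((raw, line) => {
    const match = /^\s*[-*] \[( |x|X)\] (.+)$/.exec(raw);
    if (match) tasks.push({ line, text: match[2], done: match[1] !== " " });
  });
  return tasks;
}

// Tick a task only when verify reported success; otherwise return the file unchanged.
function markDone(markdown: string, task: Task, verifyPassed: boolean): string {
  if (!verifyPassed) return markdown;
  const lines = markdown.split("\n");
  lines[task.line] = lines[task.line].replace("[ ]", "[x]");
  return lines.join("\n");
}

const goalFile = "# Goal\n- [ ] add parser\n- [x] write docs";
const pending = parseTasks(goalFile).filter((t) => !t.done);
console.log(pending.map((t) => t.text)); // [ 'add parser' ]
```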
Also included
These support the three pillars:
Per-project memory — learnings ranked by outcome (one reported wrong is discredited and sinks), an imports/exports/dependents knowledge graph, and session handoffs. Kept fresh: stale handoffs auto-clear, deleted files drop from the graph, classifier signals decay.
Skills and agents from your setup — setup scans your existing .claude/skills and .claude/agents, registers them (deduped), and can compose new ones in your own style (knitbrain_compose_skill, knitbrain_create_agent).
Tier-routed workflow — a deterministic classifier sizes each task (inquiry → trivial → standard → complex) and routes the right depth, including plan-mode for complex work.
Live dashboard — knitbrain dashboard shows the optimization meter, knowledge graph, wiki, and a per-agent activity feed. Zero config; auto-detects platform and plan from the MCP handshake.
How compression reaches your traffic
The optimizer is the same in both cases; what differs is reach.
API key — a loopback proxy (knitbrain wrap <agent>) compresses every request on the wire: all tool results in the transcript, automatically.
Subscription (OAuth traffic can't be intercepted, which holds for any tool in this space) — knitbrain compresses through the MCP and hook surface instead: knitbrain_read for files, and on Claude Code a PostToolUse hook skeletonizes Bash, Grep, Glob, and WebFetch output inline (via updatedToolOutput), the original kept in the recall store. No API key, no proxy.
The proxy covers everything; the hook path covers the host tools your platform lets a hook rewrite (full on Claude Code, narrower elsewhere). The dashboard meter shows your realized number either way.
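For the API-key path, the core of a proxy rewrite might look like the sketch below: only tool results in the outgoing transcript are touched, and every other message is forwarded verbatim. The `ChatMessage` type is a simplified subset of a chat-style request body (real messages carry more fields and may use structured content), and `compressToolResults` is a hypothetical name; knitbrain's proxy internals are not shown in this README.

```typescript
// Simplified message shape; an assumption for illustration.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

function compressToolResults(
  messages: ChatMessage[],
  compress: (text: string) => string,
): ChatMessage[] {
  return messages.map((message) => {
    // Instructions and dialogue stay verbatim; only tool results are candidates.
    if (message.role !== "tool") return message;
    const shorter = compress(message.content);
    // Never-expand: keep the original if compression did not help.
    return shorter.length < message.content.length
      ? { ...message, content: shorter }
      : message;
  });
}

// Usage with a stand-in compressor that truncates to 20 characters.
const rewritten = compressToolResults(
  [
    { role: "system", content: "Follow the project rules exactly." },
    { role: "tool", content: "x".repeat(500) },
  ],
  (text) => text.slice(0, 20),
);
console.log(rewritten.map((m) => m.content.length)); // [ 33, 20 ]
```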
Commands
Command | What it does |
| Start the MCP server on stdio — what your editor invokes. |
| Wire into your agent(s): MCP config, rules, slash commands, |
| Measure compression on your real transcripts. |
| Answer-preservation gates on your transcripts (exit 1 on failure). |
| The closed loop: judge → iterate → grade → review → repeat, verify-gated. |
| Single-worker loop over a checkbox goal file. |
| Parallel loop — N workers in isolated git worktrees. |
| Live local dashboard ( |
| Launch an agent through the optimizer proxy (API-key setups). |
| Terse-rewrite a memory file (e.g. |
| Mine past sessions for failure → success corrections. |
| Optional team hub — shared sessions over one URL and token. |
| Tokens-saved badge for your editor's status line. |
| Print the operating prompt (for non-MCP platforms). |
Guarantees (gated by tests, not promises)
Lossless — every compressed payload recovers byte-for-byte; the round-trip test gates the build.
Never-expand — output tokens ≤ input tokens, always.
Answers survive — error lines, result summaries, and top-level declarations are never elided (knitbrain evals, 100% on real transcripts).
No false green — the loop marks a task done only after a real verify passes.
Local-first — proxy, hub, and dashboard bind 127.0.0.1; nothing leaves your machine.
Reproducible — every number above comes from a command you can run on your own data.
Use as a library
import { createOptimizer } from "knitbrain";
const kb = createOptimizer(); // recall store under ~/.knitbrain
const { skeleton, savedPct } = kb.compress(largeToolOutput);
// original always recoverable: kb.retrieve(handle)
Development
npm run typecheck && npm run lint && npm run test && npm run build && npm run consistency && npm run bench
npm run e2e # all tools over a real stdio MCP session
node scripts/production-audit.mjs # cold-start: clone → install → pack → drive everything
All gates pass before a commit or release.
License
MIT