Leptin
🧬 Leptin
Token-budgeted MCP memory for AI coding agents.
The satiety hormone for agent memory: a drop-in MCP server that puts your AI agent's long-term memory on a token budget, shows you the receipts, and guarantees it never silently forgot anything that mattered.
Quickstart · Why I built this · How it works · Benchmark · Self-tuning · Security
Persistent-memory MCP servers fixed "the agent forgets between sessions." But for many setups they introduce a quieter problem: the memory store can inflate every prompt and bill you for it. As the store grows, a top-k recall tends to inject more matched memories into context — eating the very window it was meant to protect — and it's often hard to see what that's costing. Turning on forgetting helps, but a plain decay scanner can drop a fact you'll want next week with no easy way to check or undo it.
Plenty of good tools address pieces of this. Leptin's bet is to put the whole loop — diet + scale + safety net — into one lean, local sidecar. Like the hormone it's named after, it tells your memory store when it's had enough, so it stops hoarding.
A memory layer can quietly grow forever and bill you in the dark,
or forget things and hope you don't notice.
Leptin puts your memory on a budget, shows you the receipts,
and proves it didn't forget anything that mattered.
Why I built this
(This is the project's origin story. If you're reading the code, this is the "why".)
I build with AI coding agents every day. Like everyone, I got tired of my agent forgetting everything between sessions — the stack, my preferences, the decisions we'd already made — so I bolted on a persistent-memory MCP server. It worked. For about a week.
Then sessions started getting slower and more expensive. As the store grew, recall injected a bigger pile of matched memories into the context window on most turns. A top-k recall of ~10 memories at a few hundred tokens each adds up to thousands of tokens per query — paid over and over, brushing against context limits, reaching for /compact. The thing I added to help my agent was now quietly competing with it for context, and I didn't have an easy way to see what it was costing me.
The agent-memory ecosystem is genuinely good and moving fast, and several tools already tackle parts of this well — some add token controls, some ship dashboards, some prune old memories. I'm not claiming nobody has solved it. What I couldn't find was one drop-in piece that combined the specific things I wanted, for the way I actually work: a solo dev running coding agents all day against a local store I'd rather not migrate off of.
The combination I wanted:
Keep memory lean automatically (dedup, merge, decay).
Show me the bill — tokens and dollars, on my own data, not a benchmark slide.
Forget safely — never silently lose something I'd need, and let me undo anything.
So I built Leptin to scratch that itch — a focused, local-first, zero-dependency take aimed at people whose setup looks like mine. It's not trying to replace the bigger memory platforms; it's the lean sidecar I wanted. I cleaned it up, wrote tests, made the headline reproducible with one command, and published it in case it helps you too.
— @lionellau. PRs, issues, and "this saved me X tokens" stories all welcome.
The headline (reproduce it yourself)
leptin bench
Leptin benchmark — naive top-k store vs. Leptin (offline, deterministic)
----------------------------------------------------------------
corpus : 49 inserts, 24 probes
active memories : naive 47 leptin 39 (dedup kept 8 out)
recall budget : 1500 tokens | naive dumps top-10
----------------------------------------------------------------
memory tokens : naive 3396 leptin 1147
TOKEN REDUCTION : 66.2% (target ≥ 60%)
recall : naive 1.000 leptin 1.000
RECALL LOSS : 0.0% (target ≤ 2%)
est. $ saved : $0.006966 (priced at claude-sonnet-4-6)
----------------------------------------------------------------
HEADLINE : PASS ✅ ≥60% fewer memory tokens at ≤2% recall loss
≥60% fewer memory tokens at ≤2% recall loss: runs fully offline, no API key, deterministic. The corpus, prompts, and models are pinned in code so the number is the same on your machine as on ours.
The baseline is a naive top-k dump — exactly what stock persistent-memory MCP servers do today. That's the real status quo Leptin competes against, not a strawman.
Savings come from two real mechanisms: mostly budgeted, relevance-packed recall, plus write-time dedup/merge. The output shows the dedup contribution separately (dedup kept N out).
The corpus is synthetic and illustrative: a bundled, deterministic LoCoMo-style set. To measure your own numbers on real LoCoMo with hosted embeddings:
leptin bench --dataset locomo.json --embedding-model text-embedding-3-small
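The headline percentage can be checked by hand from the two token counts in the output above. A minimal sketch of that arithmetic, using only numbers copied from the benchmark printout:

```python
# Token counts and active-memory counts copied from the benchmark output above.
naive_tokens = 3396
leptin_tokens = 1147
naive_active, leptin_active = 47, 39

# Reduction = share of the naive token spend that was avoided.
reduction = 1 - leptin_tokens / naive_tokens
print(f"token reduction: {reduction:.1%}")  # token reduction: 66.2%

# Dedup contribution: memories kept out of the store at write time.
dedup_kept_out = naive_active - leptin_active
print(f"dedup kept {dedup_kept_out} out")   # dedup kept 8 out
```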
Quickstart
1. Install
pip install leptin-mcp # once published to PyPI
uvx leptin-mcp serve # zero-install run (uv)
# from source today:
pip install "git+https://github.com/lionellau/leptin"
# optional: hosted embeddings + LLM merge (OpenAI / Voyage / Claude)
pip install "leptin-mcp[hosted]"
2. Connect it to Claude Code / Codex
leptin init
That prints a ready-to-paste MCP config block:
{
  "mcpServers": {
    "leptin": { "command": "leptin", "args": ["serve", "--db", "~/.leptin/memory.db"] }
  }
}
Restart the client. The agent now has 8 memory tools. Ask it to "remember that I prefer dark mode", then later "what are my preferences?", and run leptin report to see the tokens and dollars saved.
Savings show up once your store has overlap (so dedup fires) or recall hits the budget. On a brand-new store, report honestly says it hasn't saved anything yet.
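To see why a fresh store reports zero, here is a rough sketch of how a savings ledger of this kind could be summed. The `LedgerEntry` class, the `summarize` helper, the model name, and the price are all hypothetical placeholders for illustration; only the `baseline_tokens` and `actual_tokens` fields come from the description in this README.

```python
from dataclasses import dataclass

# Hypothetical price table (USD per million input tokens). Placeholder value,
# not a real quote for any model.
PRICE_PER_MTOK = {"example-model": 3.00}

@dataclass
class LedgerEntry:
    op: str
    baseline_tokens: int  # what a naive top-k store would have injected
    actual_tokens: int    # what was actually injected

def summarize(entries, model):
    """Sum token savings across ledger entries and convert them to dollars."""
    saved = sum(e.baseline_tokens - e.actual_tokens for e in entries)
    return {"tokens_saved": saved, "usd_saved": saved * PRICE_PER_MTOK[model] / 1_000_000}

# Brand-new store: recall fits under the budget and nothing was deduplicated,
# so baseline and actual match and the summary shows no savings.
fresh = [LedgerEntry("recall", 420, 420)]
# Grown store: the naive baseline exceeds what the budgeted recall injected.
grown = fresh + [LedgerEntry("recall", 3000, 1200)]

print(summarize(fresh, "example-model"))
print(summarize(grown, "example-model"))
```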
3. See the receipts
leptin dashboard # local savings dashboard at http://127.0.0.1:8765
leptin doctor # health check: store, schema, models, hosted readiness
leptin report # or print the ledger as JSON
How it works
Five mechanisms, all behind the MCP interface:
Write-time dedup / merge. On remember, near-duplicates within a subject are merged into one canonical memory; contradictions supersede the older fact. The store stops accumulating restatements.
Time-decay forgetting. Each memory has a strength that decays exponentially (Ebbinghaus-style, configurable half-life) and is boosted on access. Weak, unused memories become prune-eligible.
Budgeted, packed recall. Candidates are ranked by similarity × strength, then greedy-packed under a hard token budget with a relevance gate, so off-topic padding never makes the cut.
Savings ledger. Every op logs baseline_tokens (what a naive store would have injected) vs. actual_tokens, converted to $ via a per-model price table.
Recall guardrail. Before any prune commits, a probe set (question → expected_fact) is re-run against the post-diet store inside a transaction; if recall would drop past a threshold, the whole prune is rolled back.
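The recall guardrail is the least familiar of the five, so here is a minimal sketch of the idea using the standard-library sqlite3 module. The table layout, the `guarded_prune` and `recall_score` names, and the substring match used as a stand-in for real recall are all assumptions made for illustration, not Leptin's actual schema or code.

```python
import sqlite3

def recall_score(conn, probes):
    """Fraction of probes whose expected fact is still active in the store."""
    if not probes:
        return 1.0
    hits = 0
    for expected_fact in probes.values():
        row = conn.execute(
            "SELECT 1 FROM memories WHERE status = 'active' AND text LIKE ?",
            (f"%{expected_fact}%",),
        ).fetchone()
        hits += row is not None
    return hits / len(probes)

def guarded_prune(conn, probes, min_strength, max_drop):
    """Quarantine weak memories; roll back if probe recall drops too far."""
    before = recall_score(conn, probes)
    conn.execute("BEGIN")
    conn.execute(
        "UPDATE memories SET status = 'quarantined' "
        "WHERE status = 'active' AND strength < ?",
        (min_strength,),
    )
    if before - recall_score(conn, probes) > max_drop:
        conn.execute("ROLLBACK")  # the prune would have hurt recall: undo all of it
        return False
    conn.execute("COMMIT")
    return True

# isolation_level=None puts the connection in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK statements above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE memories (text TEXT, strength REAL, status TEXT DEFAULT 'active')")
conn.executemany(
    "INSERT INTO memories (text, strength) VALUES (?, ?)",
    [("deploys go through the staging cluster", 0.05),
     ("editor theme is solarized", 0.04),
     ("backend uses FastAPI", 0.90)],
)

def active_count(conn):
    return conn.execute("SELECT COUNT(*) FROM memories WHERE status = 'active'").fetchone()[0]

# A probe protects the rarely used deploy fact, so the prune is rolled back.
blocked = guarded_prune(conn, {"where do deploys go?": "staging cluster"}, 0.1, 0.0)
after_blocked = active_count(conn)
# With a probe that only covers a strong memory, the same prune commits.
allowed = guarded_prune(conn, {"which backend?": "FastAPI"}, 0.1, 0.0)
after_allowed = active_count(conn)
total_rows = conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
```

Pruned rows are only marked as quarantined in this sketch, so they stay in the table and could be restored later.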
The theory, and why it matters for your projects
Context windows are a budget, and memory spends it silently. Every token a memory layer injects is a token your agent can't use for code, and a token you pay for on every turn. The naive design — embed the query, return the top-k matches, inject all of them — has a brutal failure mode: as the store grows, "top-k" is drawn from an ever-larger pool, so the matches get bigger and less precise, and you re-pay for them on every recall.
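The difference between the naive design and a budgeted pack can be sketched in a few lines. This is an illustration under assumptions: the `Candidate` fields, the function names, and the 0.3 relevance gate are hypothetical, and the ranking uses the similarity × strength score described in "How it works".

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    similarity: float  # query-to-memory similarity in [0, 1]
    strength: float    # decayed memory strength in [0, 1]
    tokens: int        # token cost of injecting this memory

def naive_top_k(candidates, k=10):
    """Naive design: take the k most similar memories and inject all of them."""
    top = sorted(candidates, key=lambda c: c.similarity, reverse=True)[:k]
    return top, sum(c.tokens for c in top)

def pack_recall(candidates, budget_tokens, min_similarity=0.3):
    """Greedy-pack the best-scoring memories under a hard token budget."""
    # Relevance gate: off-topic memories are dropped before ranking.
    relevant = [c for c in candidates if c.similarity >= min_similarity]
    ranked = sorted(relevant, key=lambda c: c.similarity * c.strength, reverse=True)
    packed, used = [], 0
    for c in ranked:
        if used + c.tokens <= budget_tokens:  # skip anything that would overflow
            packed.append(c)
            used += c.tokens
    return packed, used

candidates = [
    Candidate("auth uses JWT with refresh tokens", 0.9, 0.8, 600),
    Candidate("sessions expire after 24h", 0.7, 0.9, 500),
    Candidate("legacy auth notes from the old service", 0.6, 0.5, 700),
    Candidate("CI runs on every push", 0.1, 1.0, 100),
]
naive, naive_tokens = naive_top_k(candidates)
packed, packed_tokens = pack_recall(candidates, budget_tokens=1500)
print(naive_tokens, packed_tokens)  # 1900 1100
```

The naive path injects every match, including the off-topic CI note, while the packed path stops at the two strongest on-topic memories because the third would exceed the budget.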
Concretely, the kind of thing that bites real projects:
A months-long coding project. By month three your agent has "remembered" hundreds of overlapping facts about the codebase. A single "how does auth work?" recall now injects 2–3k tokens of half-relevant history every turn. Multiply by hundreds of turns a day. Leptin's dedup collapses the restatements and its budgeted recall injects only the on-topic few — the cost stops growing with the store.
Preferences and decisions that change. You said "use pnpm" in week one and "actually, use bun" in week six. A naive store now holds both and may inject the stale one. Leptin's supersede keeps the newer fact active and the older one auditable-but-out-of-context.
Multi-agent / multi-session setups. Several agents hammering one memory store re-inject the same boilerplate constantly. Dedup + a token ceiling caps the blast radius.
"Just turn on forgetting." Decay alone is dangerous — the fact you query once a month is exactly the one a dumb decay scanner deletes. Leptin only prunes behind the recall guardrail, and quarantines (never hard-deletes) within a retention window, so forgetting is safe and reversible.
The decay model is the classic Ebbinghaus forgetting curve (strength(t) = strength₀ · e^(−λt), reinforced on access) — the same spacing-and-recency intuition human memory uses, applied to keep the useful facts strong and let genuinely-cold ones fade. The budgeted packer is a greedy knapsack on relevance-per-token. The recall guardrail is the piece I most wanted and rarely saw: it turns "forgetting" from a leap of faith into a checked, reversible operation.
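For the decay curve, setting strength(t½) = strength₀ / 2 in the formula above gives λ = ln 2 / t½, which ties the decay rate to the configurable half-life. A small sketch of that relationship; the 30-day half-life and the additive `reinforce` boost are assumed values for illustration, not Leptin's defaults.

```python
import math

def decay_rate(half_life_days):
    """Return λ such that strength halves every half_life_days."""
    return math.log(2) / half_life_days

def strength_at(strength0, days_since_access, half_life_days):
    """strength(t) = strength0 * e^(-λt)."""
    return strength0 * math.exp(-decay_rate(half_life_days) * days_since_access)

def reinforce(strength, boost=0.2):
    """Hypothetical access boost, capped at full strength."""
    return min(1.0, strength + boost)

HALF_LIFE_DAYS = 30  # assumed value for this example
after_one = strength_at(1.0, 30, HALF_LIFE_DAYS)   # one half-life: about 0.5
after_two = strength_at(1.0, 60, HALF_LIFE_DAYS)   # two half-lives: about 0.25
boosted = reinforce(after_two)                     # an access lifts it back up
```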
🧬 Self-tuning — Leptin learns its own diet
Leptin doesn't just measure itself — it evolves. The self-tuning loop replays your own data under candidate policies and commits a change only when held-out evals prove it's a net win (more savings, no recall loss), else it leaves the config alone. Same trust DNA as the guardrail, applied to the policy itself.
leptin tune --dry-run # preview the proposed change
leptin tune # apply it (only if it's a proven net win)
leptin tune --history # the evolution ledger
leptin tune --rollback # undo the last change, exactly
Held-out gate + dual-metric accept: no overfitting the eval, no recall regressions.
Locked safety rails: the optimizer can tune recall/decay knobs but can never touch guardrail_max_drop.
Reversible: every change is an evolution-ledger row; roll back to any prior config.
Token/context efficient by construction: read-only evals on a bounded sample, zero LLM calls offline, cadence-triggered, tiny scorecard output. Opt-in (self_tune_enabled); manual leptin tune always works.
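The dual-metric accept and the locked rail can be pictured as a single gate function. The sketch below is an assumption about the shape of such a gate, not Leptin's code: `Scorecard`, `accept`, and the `recall_budget_tokens` key are hypothetical names, the config values are made up, and only `guardrail_max_drop` is taken from this README.

```python
from dataclasses import dataclass

# Keys the optimizer is never allowed to change.
LOCKED_KEYS = frozenset({"guardrail_max_drop"})

@dataclass(frozen=True)
class Scorecard:
    tokens_saved: int  # savings measured on the held-out sample
    recall: float      # probe recall measured on the held-out sample

def accept(current_cfg, candidate_cfg, current, candidate):
    """Commit only when held-out savings improve and recall does not regress."""
    keys = set(current_cfg) | set(candidate_cfg)
    changed = {k for k in keys if current_cfg.get(k) != candidate_cfg.get(k)}
    if changed & LOCKED_KEYS:
        return False  # locked safety rail: reject regardless of the scores
    return candidate.tokens_saved > current.tokens_saved and candidate.recall >= current.recall

current_cfg = {"recall_budget_tokens": 1500, "guardrail_max_drop": 0.02}
tighter = {**current_cfg, "recall_budget_tokens": 1200}
unsafe = {**tighter, "guardrail_max_drop": 0.10}
baseline = Scorecard(tokens_saved=2000, recall=1.0)

win = accept(current_cfg, tighter, baseline, Scorecard(2400, 1.0))
regression = accept(current_cfg, tighter, baseline, Scorecard(2600, 0.96))
rail_hit = accept(current_cfg, unsafe, baseline, Scorecard(3000, 1.0))
```

Here only the first candidate is accepted: the second saves more tokens but loses recall, and the third tries to loosen the locked guardrail.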
The 8 MCP tools
Tool | What it does |
| Store a fact. Write-time dedup/merge; contradictions supersede the older fact (kept, not deleted). |
| Retrieve under a token budget — packed for relevance, with |
| Guardrailed decay-prune + merge + supersede. Auto-rolls-back any prune that hurts recall. |
| Soft-delete by id or query → quarantine (reversible), never a hard delete. |
| Bring a forgotten/quarantined memory back. |
| Full provenance, current strength, and event history for any memory. |
| The "show me the receipts" tool: tokens & $ saved, op breakdown, guardrail status. |
| Self-evolve the memory policy — commit only on a proven net win, else revert. Offline, zero LLM calls. |
Where Leptin fits
Other memory tools are good at what they do — this isn't a teardown, it's about which combination Leptin focuses on. Compared to the common approaches (not any one product):
Capability | Top-k memory store | Hosted memory platform | Decay-based "forgetting" | Leptin |
Persistent memory across sessions | ✅ | ✅ | ✅ | ✅ |
Hard token budget on recall | usually no | sometimes | usually no | ✅ |
Savings ledger (tokens & $ on your data) | rare | rare | rare | ✅ |
Forgetting with a recall guardrail + rollback | n/a | rare | rare | ✅ |
Self-tuning policy | no | no | no | ✅ |
Local sidecar, no migration, zero infra | partial | no | varies | ✅ |
Runs fully offline, zero deps | varies | no | varies | ✅ |
If you need a full managed memory platform, use one. Leptin is the lean, local, auditable sidecar for when you don't.
Design
Zero core dependencies. The engine, MCP server, ledger, guardrail, dashboard, benchmark, and self-tuner run on the Python standard library alone. pip install is instant; uvx leptin-mcp just works.
Offline by default, hosted by upgrade. Default embedder is a deterministic hashing vectorizer; merges are heuristic, so everything (including the benchmark) runs with no API key and is reproducible. Install leptin-mcp[hosted] for real OpenAI/Voyage embeddings + Claude/GPT merging.
Graceful degradation. If the embedding/LLM API is unreachable, remember/recall retry then fall back to local; they never throw to the agent.
Glass box, reversible. Every merge/decay/forget/tune is logged with a reason; nothing is hard-deleted within the retention window.
⚠️ Offline-mode caveat: the default hashing embedder merges near-lexical duplicates well, but not deep paraphrases ("dark mode" vs "night theme"). For semantic dedup, configure hosted embeddings. The conservative defaults err toward keeping data — consistent with "never silently forget."
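To make the caveat concrete, here is a minimal sketch of a deterministic hashing vectorizer of the general kind described above. It is a generic bag-of-words version written for illustration, not Leptin's embedder; the function names and the 1024-dimension size are assumptions.

```python
import hashlib
import math
import re

def hash_embed(text, dims=1024):
    """Deterministic bag-of-words hashing vectorizer (no model, no API key)."""
    vec = [0.0] * dims
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # sha256 keeps bucket assignment stable across runs and machines,
        # unlike the built-in hash(), which is salted per process.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dims] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

base = hash_embed("I prefer dark mode")
near_lexical = cosine(base, hash_embed("I prefer the dark mode"))
paraphrase = cosine(base, hash_embed("I prefer night theme"))
print(f"near-lexical: {near_lexical:.2f}  paraphrase: {paraphrase:.2f}")
```

Because the vector only counts shared words, the restatement scores well above the paraphrase, which overlaps only on "I prefer". A merge threshold tuned for the former would leave the latter as a separate memory.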
Running it in production
Capability | What it gives you |
| One-command health check (store, schema version, models, hosted SDK/key readiness). Non-zero exit if unhealthy. |
Schema migrations | Versioned on-disk schema; older stores upgrade in place, data preserved. |
Concurrency | WAL + |
Scale | Parsed-embedding cache keeps recall in the low-ms over thousands of memories. |
Hardened hosted mode | Retries transient API errors with backoff before degrading; caches embeddings to avoid re-billing. |
Structured logging |
|
Configuration
Every tunable has a sane default (env LEPTIN_*, the config table, or a Config object):
Key | Default | Meaning |
|
| Hard token ceiling per recall |
|
| Cosine τ for near-duplicate merge |
|
| Strength halving time |
|
| Max tolerated recall drop before rollback |
|
| or |
|
| or |
|
| Run the self-tuning loop automatically after compaction |
Security
Leptin is local-first and designed to be safe by default:
The MCP server speaks JSON-RPC over stdio: no network listener.
The dashboard binds to 127.0.0.1 only and rejects non-localhost Host headers (DNS-rebinding mitigation). It has no auth and is for single-user local use, so don't expose it.
Memory content is treated as data, never executed.
Hosted embedding/LLM calls (opt-in [hosted]) send memory text to the configured provider; review their data policy first. API keys are read from env vars, never stored.
A user's memory database is never committable (.gitignore excludes *.db/*.sqlite).
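The Host-header check mentioned above is a common DNS-rebinding mitigation. A minimal sketch of how such a check could look with the standard-library http.server module; the handler class, the allowed-host set, and the response body are hypothetical and not taken from Leptin's dashboard code.

```python
from http.server import BaseHTTPRequestHandler

ALLOWED_HOSTS = {"127.0.0.1", "localhost"}

def is_local_host_header(host_header):
    """Return True only if the Host header names an allowed loopback host."""
    if not host_header:
        return False
    host = host_header.strip().lower()
    # Drop an optional ":port" suffix. IPv6 literals are deliberately not
    # handled, so they are rejected.
    hostname = host.rsplit(":", 1)[0] if ":" in host else host
    return hostname in ALLOWED_HOSTS

class DashboardHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that refuses requests carrying a foreign Host."""

    def do_GET(self):
        if not is_local_host_header(self.headers.get("Host")):
            self.send_error(403, "Forbidden host")
            return
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

A rebinding attack makes a browser send requests to 127.0.0.1 while the Host header still carries the attacker's domain, so comparing the header against a fixed allow-list rejects it even though the socket is local.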
Found a vulnerability? See SECURITY.md — please report privately.
Testing
uv venv && uv pip install -e ".[dev]" && pytest
112 tests cover the PRD acceptance criteria: budget guarantees, savings-ledger math, dedup/merge/supersede, decay, the guardrail rollback/commit invariants, self-tuning (offline zero-cost, lock enforcement, reversibility, determinism), glass-box reversibility, the MCP protocol surface (incl. a real leptin serve subprocess), the dashboard HTTP layer, hosted integration + retry/degradation paths, schema migrations, concurrent writers, recall latency at scale, and the reproducible benchmark. CI runs the suite, the benchmark, a clean wheel install, and the TS build on Python 3.10–3.13.
FAQ
Does it work with anything other than Claude Code? Yes — it's a standard MCP server (stdio), so any MCP client (Codex, Cursor, etc.) works. There's also a @leptin/client TypeScript SDK and a Python API (from leptin.api import Leptin).
Do I need an API key? No. The default mode is fully offline and deterministic. Hosted embeddings/LLM are an opt-in upgrade for semantic dedup.
Will it delete something I need? Not silently. Decay only prunes behind the recall guardrail, prunes are quarantined (not hard-deleted) and restorable, and you can add probes for anything you want protected.
Is the 66% number real? It's reproducible offline on a bundled synthetic corpus (leptin bench), and the harness runs on real LoCoMo data too (--dataset). See the benchmark note.
Where does my data live? In a local SQLite file (default ~/.leptin/memory.db). Zero infra. Adapters for Mem0/pgvector are on the roadmap.
Roadmap
Shipped in v1.0 — drop-in MCP server + 8 tools · SQLite backend · dedup/merge/supersede · decay · token-budgeted packed recall · savings ledger · recall guardrail + reversibility · 🧬 self-tuning · leptin doctor · schema migrations · reproducible leptin bench (+ real LoCoMo) · local dashboard · TS SDK · 112 tests + CI.
Forward roadmap — backend adapters (Mem0, pgvector) so Leptin diets a store you already run · sqlite-vec fast path · hosted prompt/intent optimization for self-tuning · shared/team memory.
Contributing
PRs welcome — especially backend adapters. See CONTRIBUTING.md and the Code of Conduct. Keep the core dependency-free, add a test, and don't weaken the guardrail.
License
MIT — see LICENSE.