winnow
Local-first context compression for AI agents. Keep the signal, winnow the chaff.
Agents burn tokens on fat tool outputs — JSON dumps, logs, file reads, RAG chunks, conversation history. winnow compresses that text before it reaches the model, cutting tokens by 40–95% while keeping what matters. It's content-aware, reversible (originals are recoverable on demand), and the core has zero runtime dependencies. Everything runs on your machine — no proxy, no API key, no egress.
your agent / app → winnow (local) → LLM provider
Why
Compression that silently drops the wrong line is worse than no compression. winnow is built around three ideas:
Content-aware, lossy-but-reversible. Different compressors for JSON, logs, code, and binary. Every original is stashed locally under a content id, so the model can retrieve the full text the moment it needs detail. Lossy inline, lossless on demand.
Delivery is backbone-gated. How a large result is delivered changes accuracy as much as how well it's compressed. Strong models get a short preview + a retrievable pointer; small/distilled models get a larger inline window and are never handed a pointer they won't follow.
Cache-aligned. A volatile segment (a timestamp, "current" state) early in your prompt invalidates the provider's KV cache every turn. winnow aligns a tiered prompt so the stable prefix leads and the cache survives.
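To make the cache-alignment idea concrete, here is a minimal sketch. It assumes a prompt is a list of segments flagged stable or volatile, and that the provider caches on an exact prompt prefix. `Segment`, `alignSketch`, and the FNV-1a cache key are hypothetical illustrations, not winnow's exported API or its actual keying scheme.

```typescript
// Hypothetical types and names for illustration; not winnow's exported API.
interface Segment {
  id: string;
  text: string;
  stable: boolean;
}

interface Aligned {
  prompt: string;
  cacheKey: string; // depends only on the stable prefix
}

// 32-bit FNV-1a hash as 8 hex chars (a stand-in for a real cache key).
function fnv1a(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h = Math.imul(h ^ s.charCodeAt(i), 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Move stable segments to the front (keeping their relative order), volatile ones after.
function alignSketch(segments: Segment[]): Aligned {
  const stable = segments.filter((s) => s.stable);
  const changing = segments.filter((s) => !s.stable);
  const prefix = stable.map((s) => s.text).join("\n");
  return {
    prompt: [...stable, ...changing].map((s) => s.text).join("\n"),
    cacheKey: fnv1a(prefix),
  };
}

// Two turns that differ only in a volatile clock segment listed first.
const turn = (clock: string): Aligned =>
  alignSketch([
    { id: "clock", text: clock, stable: false },
    { id: "system", text: "You are a helpful agent.", stable: true },
    { id: "tools", text: "tools: search, read", stable: true },
  ]);

const t1 = turn("12:00");
const t2 = turn("12:01");
console.log(t1.cacheKey === t2.cacheKey); // true: the clock no longer sits in the prefix
console.log(t1.prompt.endsWith("12:00")); // true
```

Without reordering, the clock would be the first bytes of every prompt, so no two turns would share a prefix.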
Install
npm install winnow
Node ≥ 18, ESM. The core has no runtime dependencies. Code (AST) compression uses an optional typescript peer dependency.
Quickstart
import { compress, retrieve, stats } from "winnow";
const huge = JSON.stringify(await fetchManyRows()); // fetchManyRows() is a placeholder for your own data source, e.g. 200 similar objects
const r = await compress(huge);
console.log(r.text); // head+tail sample, middle elided, + a retrieval footer
console.log(r.compressed); // true
console.log(stats(huge, r.text)); // { tokensBefore, tokensAfter, tokensSaved, ratio }
// later, if the model needs the full thing:
const original = await retrieve(r.originalId!);
Compress a whole chat array:
import { compressMessages } from "winnow";
const slim = await compressMessages(messages); // compresses each message's content
Benchmark: measured, not claimed
winnow bench runs a fidelity harness: for each case it records token savings and checks whether the "needle" (the fact a model would need) survives compression inline. Anything elided is still recoverable from the store, so recoverable fidelity is 100% by construction — this measures the harder number, what survives without a retrieval round-trip.
winnow fidelity — 6 cases
json-head json save 86% inline ✓
json-tail json save 86% inline ✓
json-middle json save 86% inline · (recoverable)
log-error logs save 99% inline ✓
log-dupes logs save 99% inline ✓
text-prose text save 0% inline ✓
avg savings: 76% inline needle survival: 83%
by position: head 100% · tail 100% · middle 0% · anywhere 100%
recoverable fidelity: 100% (every elided original is retrievable from the store)
The honest tradeoff is visible: a needle buried deep in the middle of a 200-row array is elided inline, but recoverable with one retrieve call. Logs and head/tail JSON keep their signal at a fraction of the tokens.
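The harness logic described above can be sketched in a few lines. This assumes tokens are estimated as ceil(length / 4) and that a case passes inline when the needle string appears verbatim in the compressed text. `FidelityCase`, `runCase`, and the toy `headTail` compressor are hypothetical stand-ins, not winnow's bench implementation or its real compressors.

```typescript
// Hypothetical harness shapes; not winnow's bench implementation.
interface FidelityCase {
  name: string;
  input: string;
  needle: string; // the fact a model would need
}

interface CaseResult {
  name: string;
  savedPct: number;
  inline: boolean; // needle survives without a retrieval round-trip
}

// Heuristic token estimate: about 4 characters per token.
const estTokens = (s: string): number => Math.ceil(s.length / 4);

function runCase(c: FidelityCase, compress: (s: string) => string): CaseResult {
  const out = compress(c.input);
  const before = estTokens(c.input);
  const after = estTokens(out);
  const savedPct = before === 0 ? 0 : Math.round((1 - after / before) * 100);
  return { name: c.name, savedPct, inline: out.includes(c.needle) };
}

// Toy compressor: keep the first 3 lines and the last line, elide the middle.
function headTail(s: string): string {
  const lines = s.split("\n");
  if (lines.length <= 4) return s;
  const marker = `[${lines.length - 4} lines elided]`;
  return [...lines.slice(0, 3), marker, lines[lines.length - 1]].join("\n");
}

const logLines = Array.from({ length: 200 }, (_, i) => `row ${i}: status=ok`).join("\n");
const cases: FidelityCase[] = [
  { name: "head", input: logLines, needle: "row 0:" },
  { name: "tail", input: logLines, needle: "row 199:" },
  { name: "middle", input: logLines, needle: "row 100:" },
];
const results = cases.map((c) => runCase(c, headTail));
const survival = results.filter((r) => r.inline).length / results.length;
console.log(results.map((r) => `${r.name}: save ${r.savedPct}% inline ${r.inline ? "✓" : "·"}`));
console.log(`inline needle survival: ${Math.round(survival * 100)}%`); // 67%
```

The middle needle fails inline for the same reason as json-middle above; in winnow the elided rows would still be recoverable from the store.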
API
| Export | What it does |
| --- | --- |
|  | Reversible compress of one block; returns |
|  | Compress each |
|  | Read a stored original back by id. |
|  | Token savings + ratio. |
|  | Pure router (no I/O, no stashing). |
|  | Individual compressors. |
|  | TOON: lossless object-array ↔ table (keeps every row). |
|  | Collapse repeated blocks/messages anywhere; reversible. |
|  | Anchored history compaction (injected summarizer, extractive fallback). |
|  | LLMLingua-style score-and-drop; inject your own scorer, heuristic fallback. |
|  | Token counting: exact with an injected encoder. |
|  | Pick compression options that maximize measured survival × savings. |
|  | Size-based offload with the backbone-gated delivery policy. |
|  | The delivery policy primitives. |
|  | Cache-align a tiered prompt; returns the prompt, stable-prefix |
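The backbone-gated delivery rows can be illustrated with a minimal policy sketch. It assumes a large result is delivered in one of three ways: inline in full, as a short preview plus a retrievable pointer (strong models), or as a larger inline window with no pointer (small/distilled models). The `Delivery` shape, `deliver`, and the 400/1,600-token budgets are invented for illustration and are not winnow's actual policy, thresholds, or export names.

```typescript
// Hypothetical policy sketch; the tiers and token budgets are assumptions, not winnow's values.
type Backbone = "strong" | "small";

type Delivery =
  | { kind: "inline"; text: string }
  | { kind: "preview+pointer"; text: string; pointer: string }
  | { kind: "window"; text: string };

const approxTokens = (s: string): number => Math.ceil(s.length / 4);

// Cut to roughly `tokens` tokens using the same 4-chars-per-token heuristic.
const cut = (s: string, tokens: number): string => s.slice(0, tokens * 4);

const PREVIEW_TOKENS = 400; // assumed short preview for strong models
const WINDOW_TOKENS = 1600; // assumed larger inline window for small models

function deliver(result: string, id: string, backbone: Backbone): Delivery {
  if (approxTokens(result) <= PREVIEW_TOKENS) return { kind: "inline", text: result };
  if (backbone === "strong") {
    // Strong models follow pointers: a short preview plus a retrievable id is enough.
    return { kind: "preview+pointer", text: cut(result, PREVIEW_TOKENS), pointer: id };
  }
  // Small/distilled models never get a pointer; they get more inline context instead.
  return { kind: "window", text: cut(result, WINDOW_TOKENS) };
}

const big = "x".repeat(20000); // about 5,000 estimated tokens
console.log(deliver(big, "abc123", "strong").kind); // preview+pointer
console.log(deliver(big, "abc123", "small").kind); // window
```

The design point is that the pointer is only emitted on the branch where the model is expected to follow it.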
CompressOptions: minTokens (default 400), headItems (3), tailItems (1), maxStringLength (200).
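A minimal sketch of head+tail sampling for a JSON array using these defaults. It assumes the option semantics implied by the names (skip inputs under `minTokens`, keep `headItems` leading and `tailItems` trailing elements, truncate strings past `maxStringLength`), and uses an in-memory `Map` in place of the on-disk store. `compressSketch`, `retrieveSketch`, `contentId`, and the footer wording are hypothetical, not winnow's implementation.

```typescript
// Hypothetical sketch: option names mirror CompressOptions; their semantics here are assumed.
interface Opts {
  minTokens: number;
  headItems: number;
  tailItems: number;
  maxStringLength: number;
}

const DEFAULTS: Opts = { minTokens: 400, headItems: 3, tailItems: 1, maxStringLength: 200 };

// In-memory stand-in for the on-disk store of originals, keyed by content id.
const store = new Map<string, string>();

// 32-bit FNV-1a hash as a content id (illustrative; any stable hash works).
function contentId(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) h = Math.imul(h ^ s.charCodeAt(i), 0x01000193) >>> 0;
  return h.toString(16).padStart(8, "0");
}

// Truncate long strings anywhere inside a parsed JSON value.
function clip(v: unknown, max: number): unknown {
  if (typeof v === "string") return v.length > max ? v.slice(0, max) + "…" : v;
  if (Array.isArray(v)) return v.map((x) => clip(x, max));
  if (v !== null && typeof v === "object") {
    const src = v as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(src)) out[k] = clip(src[k], max);
    return out;
  }
  return v;
}

interface Result {
  text: string;
  compressed: boolean;
  originalId?: string;
}

function compressSketch(text: string, o: Opts = DEFAULTS): Result {
  const passThrough: Result = { text, compressed: false };
  if (Math.ceil(text.length / 4) < o.minTokens) return passThrough;
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return passThrough; // this sketch only handles JSON
  }
  if (!Array.isArray(data) || data.length <= o.headItems + o.tailItems) return passThrough;
  const id = contentId(text);
  const sample = [...data.slice(0, o.headItems), ...data.slice(data.length - o.tailItems)];
  const elided = data.length - sample.length;
  const out = `${JSON.stringify(clip(sample, o.maxStringLength))}\n[${elided} items elided; retrieve("${id}") for the original]`;
  if (out.length >= text.length) return passThrough; // never keep a result that didn't shrink
  store.set(id, text);
  return { text: out, compressed: true, originalId: id };
}

const retrieveSketch = (id: string): string | undefined => store.get(id);

const users = Array.from({ length: 200 }, (_, i) => ({ id: i, name: `user-${i}`, status: "active" }));
const json = JSON.stringify(users);
const r = compressSketch(json);
console.log(r.compressed); // true
console.log(r.text);
console.log(retrieveSketch(r.originalId!) === json); // true
```

The original is stashed only when the compressed text is kept, so every footer id points at something retrievable.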
Cache alignment
import { alignSegments, cacheHolds } from "winnow";
const aligned = alignSegments([
{ id: "system", text: SYSTEM, stable: true },
{ id: "tools", text: TOOLS, stable: true },
{ id: "clock", text: now(), stable: false }, // moved after the stable prefix
]);
aligned.prompt; // stable segments first → cacheable prefix
aligned.cacheKey; // equal across turns ⇒ the KV cache can hit
cacheHolds(lastKey, aligned); // did the cached prefix survive this turn?
CLI
winnow bench # fidelity benchmark (savings + needle survival)
cat big.json | winnow compress # compress stdin → stdout (stats on stderr)
winnow retrieve <id> # print a stored original
winnow mcp # start the MCP server (stdio)
MCP server
Expose winnow to any MCP client (editors, agent runtimes) as three tools — winnow_compress, winnow_retrieve, winnow_stats:
winnow mcp
In your client's MCP config:
{ "mcpServers": { "winnow": { "command": "winnow", "args": ["mcp"] } } }
Design notes
Lossy inline, lossless on demand. Output never grows: the compressor discards any result that didn't actually shrink, and every original is one retrieve call away.
Read-fidelity is a contract. Precision matters most for code and exact reads: code compression keeps every signature, type, and import and only elides bodies (recoverable), so the model still sees the shape.
Local-first. Originals live in .winnow/ccr/ (override with WINNOW_DIR). Nothing leaves your machine.
Token counts default to a length/4 heuristic; swap in a real tokenizer where exact numbers matter.
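The heuristic-with-override pattern can be sketched as follows, assuming a token counter is just a function from text to a count. `TokenCounter`, `heuristicCount`, `savings`, and the whitespace `wordCount` stand-in are hypothetical. The result fields mirror the ones `stats` prints in the Quickstart, but defining `ratio` as tokensAfter / tokensBefore is an assumption.

```typescript
// Hypothetical sketch of a pluggable token counter; not winnow's exported API.
type TokenCounter = (text: string) => number;

// Default: roughly 4 characters per token, rounded up.
const heuristicCount: TokenCounter = (text) => Math.ceil(text.length / 4);

interface Savings {
  tokensBefore: number;
  tokensAfter: number;
  tokensSaved: number;
  ratio: number; // assumed here to mean tokensAfter / tokensBefore
}

function savings(before: string, after: string, count: TokenCounter = heuristicCount): Savings {
  const tokensBefore = count(before);
  const tokensAfter = count(after);
  return {
    tokensBefore,
    tokensAfter,
    tokensSaved: tokensBefore - tokensAfter,
    ratio: tokensBefore === 0 ? 1 : tokensAfter / tokensBefore,
  };
}

// Swap in an exact encoder where numbers matter; this whitespace splitter is only a stand-in.
const wordCount: TokenCounter = (text) => text.split(/\s+/).filter(Boolean).length;

const before = "a".repeat(4000);
const after = "a".repeat(400);
console.log(savings(before, after)); // { tokensBefore: 1000, tokensAfter: 100, tokensSaved: 900, ratio: 0.1 }
console.log(savings("one two three four", "one two", wordCount).tokensSaved); // 2
```

Because the counter is injected rather than hard-coded, the same savings math works with any tokenizer that exposes a text-to-count function.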
License
MIT © Jason Poindexter