sarup
Provides optional compression modes (semantic, abstractive, pipeline) using local LLM models hosted by Ollama for enhanced context reduction.
Sarup (สรุป)
Thai-first context compression for Claude Code. Shrink the text you feed an LLM by 50–88%, while the original stays 100% recoverable, byte-for-byte.
สรุป means "to summarize." Headroom routes Thai through noop (0% savings) because its whitespace tokenizer can't find Thai word boundaries. Sarup uses PyThaiNLP segmentation, so it compresses Thai as well as English, and caches every original so nothing is ever lost.
Highlights
Real Thai compression: PyThaiNLP newmm word segmentation, not whitespace.
Lossless by guarantee: every compress caches the original; verified: true proves a byte-for-byte round-trip.
Five modes: from offline 1 ms TF-IDF to an 88%-savings cascade.
Optional local LLM: embeddings + rewrite via Ollama, with automatic offline fallback.
Auto mode: a PostToolUse hook compresses large tool outputs with zero manual calls.
Honest metrics: token counts from a real tokenizer (tiktoken), not byte guesses.
Routing: JSON compaction, log dedup, and verbatim code-fence preservation built in.
Why it's safe: the two-tier guarantee
Tier | What | Guarantee |
Compressed view | the shrunk text the model works on | lossy · small · cheap |
Retrieval store | the original, keyed by a stable hash | lossless · recoverable |
Aggressive lossy compression is safe because the original is always one sarup_retrieve(hash) away. This is how "maximum savings" and "100% accuracy" coexist: they live in different tiers.
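The two tiers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sarup's actual code: the `TwoTierStore` class, the SHA-256 key, and the truncating `lossy_view` stand-in are all hypothetical.

```python
import hashlib


class TwoTierStore:
    """Hypothetical in-memory retrieval store: stable hash -> original text."""

    def __init__(self) -> None:
        self._originals: dict[str, str] = {}

    def put(self, original: str) -> str:
        # Assumption: the key is a SHA-256 digest of the UTF-8 bytes.
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()
        self._originals[key] = original
        return key

    def retrieve(self, key: str) -> str:
        return self._originals[key]


def lossy_view(text: str, keep: float = 0.5) -> str:
    # Stand-in for a real compressor: keep the first `keep` fraction of sentences.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = sentences[: max(1, int(len(sentences) * keep))]
    return ". ".join(kept) + "."


def compress(text: str, store: TwoTierStore) -> dict:
    key = store.put(text)      # lossless tier: cache the original first
    view = lossy_view(text)    # lossy tier: what the model reads
    # Round-trip check: the cached copy must match the input byte-for-byte.
    verified = store.retrieve(key).encode("utf-8") == text.encode("utf-8")
    return {"compressed": view, "hash": key, "verified": verified}


store = TwoTierStore()
doc = "Alpha is first. Beta is second. Gamma is third. Delta is fourth."
result = compress(doc, store)
print(result["compressed"], result["verified"])
```

However aggressive the lossy view is, `store.retrieve(result["hash"])` still returns the untouched input, which is the property the table describes.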
How it works
Two entry points feed one engine: a cheap compressed view the model reads, and a lossless retrieval store that can restore the original byte-for-byte.
flowchart TD
M["Manual<br/>sarup_compress()"]:::entry --> R
A["Automatic<br/>PostToolUse hook<br/>(Read · Bash · Grep)"]:::entry --> R
R{"Sarup compress<br/>extractive · semantic · abstractive · pipeline"}:::engine
R -- "compressed view<br/>50–88% fewer tokens" --> V["Model context"]:::lossy
R -. "cache original" .-> S[("Retrieval store<br/>hash → original")]:::lossless
V -. "need full detail?" .-> RET["sarup_retrieve(hash)"]:::lossless
RET --> S
S == "byte-for-byte" ==> V
classDef entry fill:#e0e7ff,stroke:#6366f1,color:#111
classDef engine fill:#fde68a,stroke:#d97706,color:#111
classDef lossy fill:#fef3c7,stroke:#f59e0b,color:#111
classDef lossless fill:#bbf7d0,stroke:#16a34a,color:#111
Manual: the model calls sarup_compress / sarup_retrieve itself.
Automatic: the hook intercepts large tool outputs, caches the original to SARUP_DB_PATH, and substitutes the compressed view + a retrieval hash. Source code is skipped; small outputs pass through untouched.
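The automatic path's pass-through rules could look roughly like the sketch below. The function name `should_compress`, the `MIN_CHARS` threshold, and the suffix list are assumptions made for illustration; the real rules are whatever hooks/sarup_hook.py implements.

```python
from pathlib import PurePath
from typing import Optional

# Assumptions for illustration only: the real threshold and skip list are
# defined by the hook itself and may differ.
MIN_CHARS = 2000
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp"}


def should_compress(tool_name: str, output: str, file_path: Optional[str] = None) -> bool:
    """Decide whether a tool output is a candidate for substitution."""
    if tool_name not in {"Read", "Bash", "Grep"}:
        return False  # the hook matcher only covers these tools
    if len(output) < MIN_CHARS:
        return False  # small outputs pass through untouched
    if tool_name == "Read" and file_path is not None:
        if PurePath(file_path).suffix.lower() in SOURCE_SUFFIXES:
            return False  # source-code reads are skipped
    return True


big = "log line\n" * 500
print(should_compress("Bash", big), should_compress("Read", big, "src/app.py"))
```

Only outputs that pass all three checks would be cached and replaced by the compressed view plus a retrieval hash.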
Tools
Tool | Purpose |
sarup_compress | Compress; returns compressed text, hash, token metrics, |
sarup_retrieve | Recover the original content byte-for-byte. |
| Cumulative session savings. |
sarup_compress arguments
Arg | Type | Default | Meaning |
| string | - | Text to compress (required). |
target_ratio | number | | Fraction of prose to keep (0.1–0.9). |
| boolean | | Only apply lossless transforms (whitespace / JSON compact). |
| string | | Relevance hint: sentences matching it are kept. |
| string | | See modes below. |
Compression modes
Mode | How | Needs Ollama | Savings¹ | Speed¹ | Output |
extractive | TF-IDF scoring + n-gram dedup | no | 50.8% | ~1 ms | verbatim subset |
semantic | Embedding centrality + cosine dedup | yes | 64.6% | ~1–2 s | verbatim subset |
abstractive | Local-LLM rewrite | yes | ~51% | ~8–20 s | paraphrased |
pipeline | Cascade: semantic → abstractive | yes | 88.1% | ~2 s | paraphrased |
auto | semantic if Ollama is up, else extractive | optional | 64.6% | ~90 ms | subset |
¹ Measured on a 10-sentence Thai paragraph (522 tokens). Every mode stays 100% recoverable via the store; Ollama modes degrade gracefully to extractive when the backend is down.
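To make the offline mode concrete, here is a minimal sketch of TF-IDF sentence selection: score each sentence, keep the top fraction, and emit the survivors verbatim in their original order. It is an assumption-laden illustration, not Sarup's implementation: the function name `extractive`, the smoothed IDF formula, and the regex tokenizer are all hypothetical, the n-gram dedup step is omitted, and Thai input would need real word segmentation (the project uses PyThaiNLP newmm) instead of the stand-in tokenizer.

```python
import math
import re
from collections import Counter


def extractive(text: str, target_ratio: float = 0.5) -> str:
    """Keep the highest-scoring sentences, verbatim and in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Stand-in tokenizer; Thai would need real word segmentation instead.
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]

    # Document frequency: in how many sentences does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(sentences)

    def score(tokens: list) -> float:
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        # Smoothed IDF (an assumption; many variants exist).
        total = sum(c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items())
        return total / len(tokens)  # normalise by sentence length

    keep = max(1, round(n * target_ratio))
    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(ranked))


text = (
    "The cache stores originals. The cache stores originals. "
    "Thai segmentation needs PyThaiNLP dictionaries. The cache is fast."
)
print(extractive(text, 0.5))
```

Sentences built from terms that recur across the text score lower, so the two repeated sentences are the ones dropped here. The output is always a verbatim subset, which matches the "Output" column for this mode.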
Measured results
$ .\.venv\Scripts\python.exe bench\benchmark.py
sample before after savings verify
Thai prose 522 257 50.8% OK
Thai prose (aggressive) 522 217 58.4% OK
English prose 105 54 48.6% OK
JSON (lossless) 67 44 34.3% OK
Logs 563 300 46.7% OK
TOTAL 1779 872 51.0% ALL OK - 100% recoverable
Mode comparison (Thai prose, 522 tok):
extractive 50.8% (1ms) · auto 64.6% (~90ms) · semantic 64.6% (2.1s)
abstractive 51.1% (8s) · pipeline 88.1% (2.3s) - all verified recoverable
Token counts via tiktoken cl100k_base: a real tokenizer, not a byte heuristic.
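The savings column is plain arithmetic on the before/after token counts. The sketch below recomputes it from the figures printed in the benchmark table; `savings_pct` is a hypothetical helper, not part of the benchmark script.

```python
def savings_pct(before: int, after: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    return round(100.0 * (1 - after / before), 1)


# (before, after) token counts copied from the benchmark table.
# In a live run each count would come from a tokenizer, e.g.
# len(tiktoken.get_encoding("cl100k_base").encode(text)).
rows = {
    "Thai prose": (522, 257),
    "Thai prose (aggressive)": (522, 217),
    "English prose": (105, 54),
    "JSON (lossless)": (67, 44),
    "Logs": (563, 300),
}

per_row = {name: savings_pct(b, a) for name, (b, a) in rows.items()}
total_before = sum(b for b, _ in rows.values())
total_after = sum(a for _, a in rows.values())
total = savings_pct(total_before, total_after)
print(per_row, total_before, total_after, total)
```

Note that the total is computed from the summed token counts, not by averaging the per-row percentages.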
Install
py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install -e ".[dev]"
Optional local-LLM modes (semantic / abstractive / pipeline) need Ollama:
ollama pull nomic-embed-text # embeddings → semantic mode
ollama pull gemma3:12b # rewrite → abstractive / pipeline (Thai-validated)
Register with Claude Code
One-command setup (recommended). Detects this machine's paths, probes Ollama
(picks the best mode + models), and merges into .mcp.json / .claude/settings.json
without clobbering anything already there (a .bak is written first):
.\.venv\Scripts\python.exe scripts\install.py --with-hook --pull
No Ollama? It configures offline extractive mode, which still fully works.
Ollama up? It auto-selects nomic-embed-text (semantic) + gemma3:12b (rewrite) and sets the hook to auto. --pull fetches any missing models.
Idempotent: safe to re-run; --global writes to ~/.claude instead.
Manual: or add it yourself to your MCP config (e.g. .mcp.json or ~/.claude.json):
{
"mcpServers": {
"sarup": {
"command": "d:\\WORK\\Sarup\\.venv\\Scripts\\python.exe",
"args": ["-m", "sarup.server"],
"env": { "SARUP_DB_PATH": "d:\\WORK\\Sarup\\.sarup-cache.db" }
}
}
}
Or run it directly over stdio:
.\.venv\Scripts\python.exe -m sarup.server
Auto-compression hook
Skip manual tool calls entirely: install the PostToolUse hook and large Read/Bash/Grep
outputs are compressed before they enter context, with the original cached for retrieval.
Source-code reads are skipped for safety. Full setup in hooks/README.md.
Experimental: the hook emits a valid updatedToolOutput, but current Claude Code builds (2.1.167 / 2.1.193) don't yet apply it, so it's a no-op in-session today. Use the manual sarup_compress tool meanwhile; the hook is ready for when Claude Code honors the field.
{
"hooks": {
"PostToolUse": [
{ "matcher": "Read|Bash|Grep",
"hooks": [{ "type": "command",
"command": "d:\\WORK\\Sarup\\.venv\\Scripts\\python.exe d:\\WORK\\Sarup\\hooks\\sarup_hook.py" }] }
]
},
"env": { "SARUP_DB_PATH": "d:\\WORK\\Sarup\\.sarup-cache.db" }
}
Privacy & data
To guarantee recovery, Sarup caches the original content in the store. Two things to know:
With SARUP_DB_PATH set, originals are written to that SQLite file in plaintext (no encryption). Treat it like a cache of whatever you compressed.
If you compress tool outputs that contain secrets (e.g. a .env dump or credentials in a log), those land in the cache too. The auto-hook skips source-code/config file reads, but Bash output is fair game, so review what you point it at.
*.db is git-ignored, so the cache never gets committed. For zero on-disk footprint, leave SARUP_DB_PATH unset (memory-only; the MCP server then loses the cache on restart, and the hook will not substitute; see the hook docs).
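A persistent hash → original store of this kind can be sketched with the standard-library sqlite3 module. Only the SARUP_DB_PATH variable name comes from this README; the `originals` table, its schema, the SHA-256 key, and the helper names are assumptions for illustration and may not match store.py.

```python
import hashlib
import os
import sqlite3
from typing import Optional


def open_store(path: Optional[str] = None) -> sqlite3.Connection:
    """Open the cache; with no path configured, fall back to memory-only."""
    db_path = path or os.environ.get("SARUP_DB_PATH") or ":memory:"
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: content is stored as plaintext bytes, unencrypted.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS originals "
        "(hash TEXT PRIMARY KEY, content BLOB NOT NULL)"
    )
    return conn


def put(conn: sqlite3.Connection, original: str) -> str:
    data = original.encode("utf-8")
    key = hashlib.sha256(data).hexdigest()
    # INSERT OR IGNORE keeps repeated puts of the same text idempotent.
    conn.execute(
        "INSERT OR IGNORE INTO originals (hash, content) VALUES (?, ?)", (key, data)
    )
    conn.commit()
    return key


def retrieve(conn: sqlite3.Connection, key: str) -> Optional[str]:
    row = conn.execute(
        "SELECT content FROM originals WHERE hash = ?", (key,)
    ).fetchone()
    return None if row is None else bytes(row[0]).decode("utf-8")
```

With a file path, a second process opening the same database can retrieve what the first one stored; with `:memory:`, the contents vanish when the connection closes, which is the trade-off described above.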
Configuration
Var | Default | Meaning |
SARUP_DB_PATH | (in-memory) | SQLite path for a persistent, cross-process store. Required for hook retrieval. |
| | Ollama endpoint. |
| | Model for abstractive / pipeline rewrite. |
| | Model for semantic embeddings. |
| | Hook compression mode. |
| | Hook only compresses outputs larger than this. |
Project structure
sarup/
├── src/sarup/
│   ├── server.py # MCP stdio server, 3 tools
│   ├── compressor.py # router + modes (extractive/semantic/abstractive/pipeline/auto)
│   ├── thai.py # PyThaiNLP tokenization, sentence split, TF-IDF
│   ├── semantic.py # embedding centrality + cosine dedup
│   ├── llm.py # optional Ollama backend (generate + embed)
│   ├── tokens.py # real token counting (tiktoken)
│   └── store.py # CCR store: hash → original (memory + SQLite)
├── hooks/
│   ├── sarup_hook.py # PostToolUse auto-compression hook
│   └── README.md # hook install guide
├── bench/benchmark.py # before/after measurement
├── tests/ # 50 tests (test_thai, test_mcp, test_hook)
├── README.md
└── STACK.md # full stack + techniques
Tech stack & techniques
Python 3.11 · MCP · PyThaiNLP newmm · tiktoken · Ollama (optional) · SQLite · hatchling · pytest.
The technique behind each mode (TF-IDF scoring, embedding centrality, cascade pipeline, content routing, and graceful degradation) is documented in STACK.md.
Testing
.\.venv\Scripts\python.exe -m pytest tests/ -q
50 tests cover Thai NLP, the MCP tool contracts, every mode (including Ollama-fallback paths), the roundtrip-verify guarantee, and the auto-compression hook (incl. cross-process retrieval).
Roadmap
Make auto the default mode for sarup_compress (currently extractive).
Optional Typhoon 2.1 abstractive (blocked on an Ollama template fix).
Per-content adaptive target_ratio.
Published PyPI package.
License
MIT