πŸ—œοΈ Sarup (ΰΈͺΰΈ£ΰΈΈΰΈ›)

Thai-first context compression for Claude Code. Shrink the text you feed an LLM by 50–88%, while the original stays 100% recoverable, byte-for-byte.

สรุป means "to summarize." Headroom routes Thai through noop (0% savings) because its whitespace tokenizer can't find Thai word boundaries. Sarup uses PyThaiNLP segmentation, so it compresses Thai as well as English, and it caches every original so nothing is ever lost.
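The whitespace problem is easy to reproduce with the standard library. The sketch below is illustrative only: the sample sentences and the helper name whitespace_tokens are made up for this example, and the PyThaiNLP part runs only if the package is installed.

```python
def whitespace_tokens(text: str) -> list:
    """Split on whitespace only, the way a naive tokenizer would."""
    return text.split()


english = "Thai has no spaces between words"
thai = "ภาษาไทยไม่มีช่องว่างระหว่างคำ"  # roughly the same sentence in Thai

# English yields one token per word; Thai collapses into a single "word".
print(len(whitespace_tokens(english)))
print(len(whitespace_tokens(thai)))

# Optional: dictionary-based segmentation with PyThaiNLP's newmm engine.
try:
    from pythainlp.tokenize import word_tokenize
except ImportError:
    word_tokenize = None

if word_tokenize is not None:
    print(word_tokenize(thai, engine="newmm"))
```

With only one token to score, a frequency-based compressor has nothing to rank, which is why segmentation has to come first.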

Highlights

  • 🇹🇭 Real Thai compression: PyThaiNLP newmm word segmentation, not whitespace.

  • ♻️ Lossless by guarantee: every compress caches the original; verified: true proves a byte-for-byte round-trip.

  • 🎚️ Five modes: from offline 1 ms TF-IDF to an 88%-savings cascade.

  • 🧠 Optional local LLM: embeddings + rewrite via Ollama, with automatic offline fallback.

  • ⚙️ Auto mode: a PostToolUse hook compresses large tool outputs with zero manual calls.

  • Honest metrics: token counts from a real tokenizer (tiktoken), not byte guesses.

  • 🔌 Routing: JSON compaction, log dedup, and verbatim code-fence preservation built in.
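Two of the routing transforms named in the last bullet are simple enough to sketch with the standard library. This is an illustration of the general idea under assumed names (compact_json, dedup_log_lines), not Sarup's actual router; the real rules live in compressor.py.

```python
import json
from itertools import groupby


def compact_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace; the parsed value is unchanged."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def dedup_log_lines(text: str) -> str:
    """Collapse runs of identical consecutive lines into one line plus a repeat count."""
    out = []
    for line, run in groupby(text.splitlines()):
        count = sum(1 for _ in run)
        out.append(line if count == 1 else f"{line} (x{count})")
    return "\n".join(out)


print(compact_json('{ "a": 1,\n  "b": [1, 2] }'))
print(dedup_log_lines("retrying...\nretrying...\nretrying...\nconnected"))
```

Compaction keeps every value but not the original bytes, so a byte-for-byte restore still depends on the cached original.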

Why it's safe: the two-tier guarantee

| Tier | What | Guarantee |
| --- | --- | --- |
| Compressed view | the shrunk text the model works on | lossy · small · cheap |
| Retrieval store | the original, keyed by a stable hash | lossless · recoverable |

Aggressive lossy compression is safe because the original is always one sarup_retrieve(hash) away. This is how "maximum savings" and "100% accuracy" coexist: they live in different tiers.
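The two tiers can be sketched in a few lines. Everything here is an assumption made for illustration: the class name RetrievalStore, the truncated SHA-256 key (the README only promises a "stable hash"), and the deliberately crude lossy_view stand-in for a real compressor.

```python
import hashlib


class RetrievalStore:
    """In-memory map from a content hash to the original bytes (hypothetical)."""

    def __init__(self) -> None:
        self._originals = {}

    def put(self, original: str) -> str:
        data = original.encode("utf-8")
        key = hashlib.sha256(data).hexdigest()[:16]  # assumed key format
        self._originals[key] = data
        return key

    def get(self, key: str) -> str:
        return self._originals[key].decode("utf-8")


def lossy_view(text: str, keep: int = 1) -> str:
    """Stand-in for a real compressor: keep only the first `keep` lines."""
    return "\n".join(text.splitlines()[:keep])


store = RetrievalStore()
original = "line one\nline two\nline three\n"
key = store.put(original)            # lossless tier
view = lossy_view(original)          # lossy tier, what the model would read
verified = store.get(key) == original
print(view, key, verified)
```

However much the view throws away, the verified flag depends only on the store round-trip, which is the property the table above describes.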

How it works

Two entry points feed one engine: a cheap compressed view the model reads, and a lossless retrieval store that can restore the original byte-for-byte.

flowchart TD
    M["🧑 Manual<br/>sarup_compress()"]:::entry --> R
    A["⚙️ Automatic<br/>PostToolUse hook<br/>(Read · Bash · Grep)"]:::entry --> R

    R{"Sarup compress<br/>extractive · semantic · abstractive · pipeline"}:::engine
    R -- "compressed view<br/>50–88% fewer tokens" --> V["📄 Model context"]:::lossy
    R -. "cache original" .-> S[("🗄️ Retrieval store<br/>hash → original")]:::lossless

    V -. "need full detail?" .-> RET["🔑 sarup_retrieve(hash)"]:::lossless
    RET --> S
    S == "byte-for-byte ✓" ==> V

    classDef entry fill:#e0e7ff,stroke:#6366f1,color:#111
    classDef engine fill:#fde68a,stroke:#d97706,color:#111
    classDef lossy fill:#fef3c7,stroke:#f59e0b,color:#111
    classDef lossless fill:#bbf7d0,stroke:#16a34a,color:#111

  • Manual: the model calls sarup_compress / sarup_retrieve itself.

  • Automatic: the hook intercepts large tool outputs, caches the original to SARUP_DB_PATH, and substitutes the compressed view + a retrieval hash. Source code is skipped; small outputs pass through untouched.
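The automatic path can be pictured as a small gate in front of the compressor. This is a minimal sketch, not the code in hooks/sarup_hook.py: the function name, the file_path parameter, and the suffix list used to spot source files are assumptions. Only the three tool names and the 4000-character default come from this README.

```python
# Assumed list of source-file suffixes; the real hook may detect code differently.
SOURCE_SUFFIXES = (".py", ".ts", ".js", ".go", ".rs")
HOOKED_TOOLS = {"Read", "Bash", "Grep"}


def should_compress(tool_name: str, output: str, file_path: str = "",
                    min_chars: int = 4000) -> bool:
    """Decide whether a tool output is worth compressing."""
    if tool_name not in HOOKED_TOOLS:
        return False                       # not a hooked tool
    if len(output) <= min_chars:
        return False                       # small outputs pass through untouched
    if tool_name == "Read" and file_path.endswith(SOURCE_SUFFIXES):
        return False                       # source code is skipped
    return True


print(should_compress("Bash", "x" * 5000))                 # large shell output
print(should_compress("Read", "x" * 5000, "src/app.py"))   # source file read
```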

Tools

| Tool | Purpose |
| --- | --- |
| sarup_compress(content, target_ratio?, lossless?, query?, mode?) | Compress; returns compressed text, hash, token metrics, verified, token_method. |
| sarup_retrieve(hash) | Recover the original content byte-for-byte. |
| sarup_stats() | Cumulative session savings. |

sarup_compress arguments

| Arg | Type | Default | Meaning |
| --- | --- | --- | --- |
| content | string | none | Text to compress (required). |
| target_ratio | number | 0.5 | Fraction of prose to keep (0.1–0.9). |
| lossless | boolean | false | Only apply lossless transforms (whitespace / JSON compact). |
| query | string | "" | Relevance hint; sentences matching it are kept. |
| mode | string | extractive | See modes below. |
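For orientation, this is roughly what a call to sarup_compress looks like on the wire. MCP clients send tool invocations as JSON-RPC tools/call requests; Claude Code builds these for you, so the helper below (a hypothetical name) and the sample argument values are shown only to make the argument table concrete.

```python
import json


def tools_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False)


request = tools_call_request(1, "sarup_compress", {
    "content": "long tool output goes here",  # placeholder text
    "target_ratio": 0.3,                       # keep about 30% of the prose
    "query": "error",                          # favor sentences about errors
    "mode": "extractive",
})
print(request)
```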

Compression modes

| Mode | How | Needs Ollama | Savings¹ | Speed¹ | Output |
| --- | --- | --- | --- | --- | --- |
| extractive (default) | TF-IDF scoring + n-gram dedup | no | 50.8% | ~1 ms | verbatim subset |
| semantic | Embedding centrality + cosine dedup | yes | 64.6% | ~1–2 s | verbatim subset |
| abstractive | Local-LLM rewrite | yes | ~51% | ~8–20 s | paraphrased |
| pipeline | Cascade: semantic → abstractive | yes | 88.1% | ~2 s | paraphrased |
| auto | semantic if Ollama is up, else extractive | optional | 64.6% | ~90 ms | subset |

¹ Measured on a 10-sentence Thai paragraph (522 tokens). Every mode stays 100% recoverable via the store; Ollama modes degrade gracefully to extractive when the backend is down.
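To give a feel for the default mode, here is a minimal TF-IDF extractive sketch. It is not Sarup's implementation: it treats each sentence as a document, splits on English punctuation and \w+ tokens (Thai would need PyThaiNLP segmentation instead), and leaves out the n-gram dedup step. The function name and scoring formula are assumptions.

```python
import math
import re
from collections import Counter


def extractive(text: str, target_ratio: float = 0.5) -> str:
    """Keep the highest-scoring sentences, emitted in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return text
    docs = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(docs)
    # Document frequency: in how many sentences each word appears.
    df = Counter(word for doc in docs for word in set(doc))

    def score(doc: list) -> float:
        if not doc:
            return 0.0
        tf = Counter(doc)
        # Smoothed TF-IDF, averaged over sentence length.
        return sum((c / len(doc)) * math.log((1 + n) / (1 + df[w]))
                   for w, c in tf.items())

    keep = max(1, round(n * target_ratio))
    top = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(top))


sample = "Alpha beta gamma. Alpha beta delta. Unique words appear here only. Alpha beta."
print(extractive(sample, 0.5))
```

Because the output is a subset of the input sentences, it stays verbatim, which matches the "verbatim subset" column above.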

Measured results

$ .\.venv\Scripts\python.exe bench\benchmark.py

sample                      before   after   savings   verify
Thai prose                     522     257    50.8%       OK
Thai prose (aggressive)        522     217    58.4%       OK
English prose                  105      54    48.6%       OK
JSON (lossless)                 67      44    34.3%       OK
Logs                           563     300    46.7%       OK
TOTAL                         1779     872    51.0%    ALL OK   → 100% recoverable

Mode comparison (Thai prose, 522 tok):
  extractive 50.8% (1ms) · auto 64.6% (~90ms) · semantic 64.6% (2.1s)
  abstractive 51.1% (8s) · pipeline 88.1% (2.3s)        ← all verified recoverable

Token counts via tiktoken cl100k_base: a real tokenizer, not a byte heuristic.
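The savings column follows directly from the before/after token counts: savings = 1 - after / before. The snippet below recomputes the table from the numbers printed above; the helper name savings_pct is just for this example.

```python
def savings_pct(before: int, after: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    return round((1 - after / before) * 100, 1)


# (before, after) token counts copied from the benchmark output above.
rows = {
    "Thai prose": (522, 257),
    "Thai prose (aggressive)": (522, 217),
    "English prose": (105, 54),
    "JSON (lossless)": (67, 44),
    "Logs": (563, 300),
}

for name, (before, after) in rows.items():
    print(f"{name:<26}{savings_pct(before, after):>6.1f}%")

total_before = sum(before for before, _ in rows.values())
total_after = sum(after for _, after in rows.values())
print(f"{'TOTAL':<26}{savings_pct(total_before, total_after):>6.1f}%")
```

Note that the total is computed from summed tokens, not by averaging the per-row percentages.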

Install

py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install -e ".[dev]"

Optional local-LLM modes (semantic / abstractive / pipeline) need Ollama:

ollama pull nomic-embed-text     # embeddings → semantic mode
ollama pull gemma3:12b           # rewrite → abstractive / pipeline (Thai-validated)

Register with Claude Code

One-command setup (recommended). Detects this machine's paths, probes Ollama (picks the best mode + models), and merges into .mcp.json / .claude/settings.json without clobbering anything already there (a .bak is written first):

.\.venv\Scripts\python.exe scripts\install.py --with-hook --pull

  • No Ollama? It configures offline extractive mode, which still works fully.

  • Ollama up? It auto-selects nomic-embed-text (semantic) + gemma3:12b (rewrite) and sets the hook to auto. --pull fetches any missing models.

  • Idempotent: safe to re-run. --global writes to ~/.claude instead.
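The probing step can be approximated as follows. This is a sketch of the idea, not the logic in scripts/install.py: it assumes Ollama's GET /api/tags endpoint for listing local models, and the function names and the returned (mode, missing models) shape are invented for illustration.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

WANTED_MODELS = ["nomic-embed-text", "gemma3:12b"]  # defaults named in this README


def list_ollama_models(host: str = "http://localhost:11434",
                       timeout: float = 2.0) -> Optional[List[str]]:
    """Return local model names, or None when Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):  # connection errors, timeouts, bad JSON
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


def plan(models: Optional[List[str]]) -> Tuple[str, List[str]]:
    """Pick a hook mode and list the models that would still need pulling."""
    if models is None:
        return "extractive", []          # offline fallback
    missing = [w for w in WANTED_MODELS
               if not any(m == w or m.startswith(w + ":") for m in models)]
    return "auto", missing


if __name__ == "__main__":
    print(plan(list_ollama_models()))
```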

Manual setup. Alternatively, add it yourself to your MCP config (e.g. .mcp.json or ~/.claude.json):

{
  "mcpServers": {
    "sarup": {
      "command": "d:\\WORK\\Sarup\\.venv\\Scripts\\python.exe",
      "args": ["-m", "sarup.server"],
      "env": { "SARUP_DB_PATH": "d:\\WORK\\Sarup\\.sarup-cache.db" }
    }
  }
}

Or run it directly over stdio:

.\.venv\Scripts\python.exe -m sarup.server

Auto-compression hook

Skip manual tool calls entirely: install the PostToolUse hook and large Read/Bash/Grep outputs are compressed before they enter context, with the original cached for retrieval. Source-code reads are skipped for safety. Full setup in hooks/README.md.

Experimental: the hook emits a valid updatedToolOutput, but current Claude Code builds (2.1.167 / 2.1.193) don't yet apply it, so it's a no-op in-session today. Use the manual sarup_compress tool in the meantime; the hook is ready for when Claude Code honors the field.

{
  "hooks": {
    "PostToolUse": [
      { "matcher": "Read|Bash|Grep",
        "hooks": [{ "type": "command",
          "command": "d:\\WORK\\Sarup\\.venv\\Scripts\\python.exe d:\\WORK\\Sarup\\hooks\\sarup_hook.py" }] }
    ]
  },
  "env": { "SARUP_DB_PATH": "d:\\WORK\\Sarup\\.sarup-cache.db" }
}

Privacy & data

To guarantee recovery, Sarup caches the original content in the store. Two things to know:

  • With SARUP_DB_PATH set, originals are written to that SQLite file in plaintext (no encryption). Treat it like a cache of whatever you compressed.

  • If you compress tool outputs that contain secrets (e.g. a .env dump or credentials in a log), those land in the cache too. The auto-hook skips source-code/config file reads, but Bash output is fair game, so review what you point it at.

*.db is git-ignored, so the cache never gets committed. For zero on-disk footprint, leave SARUP_DB_PATH unset (memory-only; the MCP server then loses the cache on restart, and the hook will not substitute β€” see the hook docs).
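To make the plaintext point concrete, the sketch below writes a fake secret to a throwaway SQLite file, reads it back through a second connection (standing in for a second process), and then finds the raw string in the file's bytes. The table name, column names, and key format are hypothetical and do not describe Sarup's actual schema in store.py.

```python
import hashlib
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS originals "
                 "(hash TEXT PRIMARY KEY, content TEXT NOT NULL)")
    return conn


def put(conn: sqlite3.Connection, content: str) -> str:
    key = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    conn.execute("INSERT OR REPLACE INTO originals VALUES (?, ?)", (key, content))
    conn.commit()
    return key


def get(conn: sqlite3.Connection, key: str):
    row = conn.execute("SELECT content FROM originals WHERE hash = ?", (key,)).fetchone()
    return row[0] if row else None


secret = "API_KEY=not-a-real-secret"
db_path = os.path.join(tempfile.mkdtemp(), "demo-cache.db")

writer = open_store(db_path)
key = put(writer, secret)
writer.close()

reader = open_store(db_path)       # a fresh connection sees the same data
restored = get(reader, key)
reader.close()

with open(db_path, "rb") as f:     # the secret is readable in the raw file
    leaked = secret.encode("utf-8") in f.read()

print(restored == secret, leaked)
```

Anyone who can read the cache file can read what was compressed, so file permissions on SARUP_DB_PATH matter as much as they would for the original data.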

Configuration

| Var | Default | Meaning |
| --- | --- | --- |
| SARUP_DB_PATH | (in-memory) | SQLite path for a persistent, cross-process store. Required for hook retrieval. |
| OLLAMA_HOST | http://localhost:11434 | Ollama endpoint. |
| SARUP_ABSTRACTIVE_MODEL | gemma3:12b | Model for abstractive / pipeline rewrite. |
| SARUP_EMBED_MODEL | nomic-embed-text | Model for semantic embeddings. |
| SARUP_HOOK_MODE | auto | Hook compression mode. |
| SARUP_HOOK_MIN_CHARS | 4000 | Hook only compresses outputs larger than this. |
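As a quick reference for how these variables and defaults fit together, here is one way to read them in Python. The SarupConfig dataclass and load_config function are hypothetical helpers for this example; the variable names and defaults are the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SarupConfig:
    db_path: Optional[str]      # None means the in-memory store
    ollama_host: str
    abstractive_model: str
    embed_model: str
    hook_mode: str
    hook_min_chars: int


def load_config(env: Optional[Mapping[str, str]] = None) -> SarupConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return SarupConfig(
        db_path=env.get("SARUP_DB_PATH") or None,
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
        abstractive_model=env.get("SARUP_ABSTRACTIVE_MODEL", "gemma3:12b"),
        embed_model=env.get("SARUP_EMBED_MODEL", "nomic-embed-text"),
        hook_mode=env.get("SARUP_HOOK_MODE", "auto"),
        hook_min_chars=int(env.get("SARUP_HOOK_MIN_CHARS", "4000")),
    )


print(load_config({}))
```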

Project structure

sarup/
├── src/sarup/
│   ├── server.py       # MCP stdio server, 3 tools
│   ├── compressor.py   # router + modes (extractive/semantic/abstractive/pipeline/auto)
│   ├── thai.py         # PyThaiNLP tokenization, sentence split, TF-IDF
│   ├── semantic.py     # embedding centrality + cosine dedup
│   ├── llm.py          # optional Ollama backend (generate + embed)
│   ├── tokens.py       # real token counting (tiktoken)
│   └── store.py        # CCR store: hash → original (memory + SQLite)
├── hooks/
│   ├── sarup_hook.py   # PostToolUse auto-compression hook
│   └── README.md       # hook install guide
├── bench/benchmark.py  # before/after measurement
├── tests/              # 50 tests (test_thai, test_mcp, test_hook)
├── README.md
└── STACK.md            # full stack + techniques

Tech stack & techniques

Python 3.11 · MCP · PyThaiNLP newmm · tiktoken · Ollama (optional) · SQLite · hatchling · pytest.

The technique behind each mode (TF-IDF scoring, embedding centrality, cascade pipeline, content routing, and graceful degradation) is documented in STACK.md.

Testing

.\.venv\Scripts\python.exe -m pytest tests/ -q

50 tests cover Thai NLP, the MCP tool contracts, every mode (including Ollama-fallback paths), the roundtrip-verify guarantee, and the auto-compression hook (incl. cross-process retrieval).

Roadmap

  • Make auto the default mode for sarup_compress (currently extractive).

  • Optional Typhoon 2.1 abstractive (blocked on an Ollama template fix).

  • Per-content adaptive target_ratio.

  • Published PyPI package.

License

MIT
