What can you do with this server?

Sarup is a Thai-first context compression MCP server that shrinks text (50–88% fewer tokens) while guaranteeing full recovery of the original content via a retrieval store.

* Compress text (sarup_compress): reduce the token size of Thai prose, English prose, JSON, and logs using one of five modes:
  * extractive (default): offline TF-IDF scoring, ~1 ms, no external dependencies
  * semantic: embedding-based centrality scoring (~64.6% savings), requires Ollama
  * abstractive: local-LLM rewrite via Ollama (~51% savings, slower)
  * pipeline: cascaded semantic → abstractive for maximum savings (~88.1%), requires Ollama
  * auto: uses semantic if Ollama is available, otherwise falls back to extractive
* Control compression behavior with optional parameters:
  * target_ratio: fraction of prose to retain (0.3–0.7)
  * lossless: apply only lossless transforms (whitespace normalization, JSON compaction)
  * query: prioritize sentences relevant to a given query
* Retrieve originals (sarup_retrieve): recover the exact, byte-for-byte original text using the hash returned by sarup_compress.
* Track session savings (sarup_stats): view cumulative token savings for the current session, measured via tiktoken cl100k_base.
* Automate compression: use an optional PostToolUse hook to compress large Read, Bash, or Grep outputs automatically; Ollama-dependent modes fall back to extractive if Ollama is unavailable.

Which integrations are available for this server?

Ollama: powers the optional semantic, abstractive, and pipeline compression modes with locally hosted models for greater context reduction.

1. Click "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it shows a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@sarup Compress this Thai paragraph to half its tokens."

That's it! The server will respond to your query, and you can continue using it as needed.

sarup

by PHUICMT


Python

Local

สรุป · Sarup

Thai-first context compression for Claude Code. An MCP server that actually shrinks Thai — 50–88% fewer tokens — and caches every original so nothing is ever lost.


สรุป means "to summarize." Whitespace-based compressors such as Headroom route Thai through noop (0% savings): Thai is written without spaces between words, so a whitespace tokenizer cannot find word boundaries. Sarup segments Thai with PyThaiNLP, so it compresses Thai as well as it compresses English.


Highlights

🇹🇭 Real Thai compression — PyThaiNLP newmm word segmentation, not whitespace.
♻️ Lossless by guarantee — every compress caches the original; verified: true proves a byte-for-byte round-trip.
🎚️ Five modes — from offline 1 ms TF-IDF to an 88%-savings cascade.
🧠 Optional local LLM — embeddings + rewrite via Ollama, with automatic offline fallback.
📏 Honest metrics — token counts from a real tokenizer (tiktoken), not byte guesses.
🔌 Content-aware — JSON compaction, log dedup, and verbatim code-fence preservation built in.
🛟 Can't break Claude — it's an MCP tool, not an API proxy; if the server is down the tools just go away and Claude keeps working.

Why it's safe — the two-tier guarantee

Tier	What	Guarantee
Compressed view	the shrunk text the model works on	lossy · small · cheap
Retrieval store	the original, keyed by a stable hash	lossless · recoverable

Aggressive lossy compression is safe because the original is always one sarup_retrieve(hash) away. This is how "maximum savings" and "100% accuracy" coexist — they live in different tiers.
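To make the two tiers concrete, here is a minimal sketch of a hash → original store using only the standard library. It is not Sarup's store.py: the class name `RetrievalStore` is hypothetical, and keying on the first 24 hex characters of SHA-256 is an assumption chosen only to match the length of the hashes shown in the Example section.

```python
import hashlib
import sqlite3
from typing import Optional


class RetrievalStore:
    """Hypothetical hash -> original store (a sketch, not Sarup's store.py)."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" mimics the unset-SARUP_DB_PATH case; a file path persists across processes.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS originals (hash TEXT PRIMARY KEY, content BLOB NOT NULL)"
        )
        self._db.commit()

    @staticmethod
    def key(content: str) -> str:
        # Assumption: stable key = first 24 hex chars of SHA-256 over the UTF-8 bytes.
        return hashlib.sha256(content.encode("utf-8")).hexdigest()[:24]

    def put(self, content: str) -> str:
        h = self.key(content)
        # Store raw bytes so nothing (whitespace, line endings, Thai text) gets normalized.
        self._db.execute(
            "INSERT OR IGNORE INTO originals (hash, content) VALUES (?, ?)",
            (h, content.encode("utf-8")),
        )
        self._db.commit()
        return h

    def get(self, h: str) -> Optional[str]:
        row = self._db.execute("SELECT content FROM originals WHERE hash = ?", (h,)).fetchone()
        return None if row is None else bytes(row[0]).decode("utf-8")

    def verify(self, h: str, original: str) -> bool:
        # Round-trip check: the stored bytes must equal the original bytes exactly.
        stored = self.get(h)
        return stored is not None and stored.encode("utf-8") == original.encode("utf-8")
```

In this sketch, a `verified: true` result corresponds to `verify()` succeeding right after `put()`. Pointing the server and the hook at the same file path is what would let one process retrieve originals cached by the other.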

How it works

Two entry points feed one engine: a cheap compressed view the model reads, and a lossless retrieval store that can restore the original byte-for-byte.

flowchart TD
    M["🧑 Manual<br/>sarup_compress()"]:::entry --> R
    A["⚙️ Automatic<br/>PostToolUse hook<br/>(Read · Bash · Grep)"]:::entry --> R

    R{"Sarup compress<br/>extractive · semantic · abstractive · pipeline"}:::engine
    R -- "compressed view<br/>50–88% fewer tokens" --> V["📄 Model context"]:::lossy
    R -. "cache original" .-> S[("🗄️ Retrieval store<br/>hash → original")]:::lossless

    V -. "need full detail?" .-> RET["🔑 sarup_retrieve(hash)"]:::lossless
    RET --> S
    S == "byte-for-byte ✓" ==> V

    classDef entry fill:#e0e7ff,stroke:#6366f1,color:#111
    classDef engine fill:#fde68a,stroke:#d97706,color:#111
    classDef lossy fill:#fef3c7,stroke:#f59e0b,color:#111
    classDef lossless fill:#bbf7d0,stroke:#16a34a,color:#111

Manual — the model calls sarup_compress / sarup_retrieve itself.
Automatic — the hook intercepts large tool outputs, caches the original to SARUP_DB_PATH, and substitutes the compressed view + a retrieval hash. Source code is skipped; small outputs pass through untouched.
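The hook's pass-through rules can be sketched as a single predicate. This is an illustration under assumptions: `should_compress` and the skip lists are hypothetical, while the README itself only states that source-code/config reads are skipped and that outputs below `SARUP_HOOK_MIN_TOKENS` (default 400) pass through untouched.

```python
from pathlib import PurePath
from typing import Optional

HOOKED_TOOLS = {"Read", "Bash", "Grep"}
# Hypothetical skip lists; the real hook's rules may differ.
SKIP_SUFFIXES = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp", ".h",
                 ".toml", ".yaml", ".yml", ".ini"}
SKIP_NAMES = {".env"}


def should_compress(tool_name: str, file_path: Optional[str], token_count: int,
                    min_tokens: int = 400) -> bool:
    """Return True if the hook should replace this tool output with a compressed view."""
    if tool_name not in HOOKED_TOOLS:
        return False
    if tool_name == "Read" and file_path:
        p = PurePath(file_path)
        if p.suffix.lower() in SKIP_SUFFIXES or p.name in SKIP_NAMES:
            return False  # never rewrite code or config the model may need verbatim
    # Small outputs pass through untouched.
    return token_count >= min_tokens
```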

Tools

Tool	Purpose
`sarup_compress(content, target_ratio?, lossless?, query?, mode?)`	Compress; returns compressed text, hash, token metrics, `verified`, `token_method`.
`sarup_retrieve(hash)`	Recover the original content byte-for-byte.
`sarup_stats()`	Cumulative session savings.
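Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below only builds the message line for the stdio transport (one JSON object per line, no embedded newlines); a real client must first complete the MCP `initialize` handshake, and `tools_call` is a hypothetical helper, not part of any SDK.

```python
import json


def tools_call(request_id: int, name: str, arguments: dict) -> str:
    """Build one newline-delimited JSON-RPC 'tools/call' message for an MCP stdio server."""
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the message stays on one line;
    # ensure_ascii=False keeps Thai text readable on the wire.
    return json.dumps(msg, ensure_ascii=False) + "\n"
```

For example, `tools_call(2, "sarup_retrieve", {"hash": "caa568140bec0ff734937cf5"})` would be written to the server's stdin.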

sarup_compress arguments

Arg	Type	Default	Meaning
`content`	string	—	Text to compress (required).
`target_ratio`	number	`0.5`	Fraction of prose to keep (0.1–0.9).
`lossless`	boolean	`false`	Only apply lossless transforms (whitespace / JSON compact).
`query`	string	`""`	Relevance hint — sentences matching it are kept.
`mode`	string	`extractive`	See modes below.
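A client can check arguments against this table before calling the tool. A minimal sketch, assuming out-of-range values should be rejected rather than clamped (the README does not say which the server does); `CompressArgs` is a hypothetical name.

```python
from dataclasses import asdict, dataclass

MODES = {"extractive", "semantic", "abstractive", "pipeline", "auto"}


@dataclass(frozen=True)
class CompressArgs:
    """Hypothetical client-side mirror of the sarup_compress arguments table."""
    content: str
    target_ratio: float = 0.5
    lossless: bool = False
    query: str = ""
    mode: str = "extractive"

    def __post_init__(self) -> None:
        if not isinstance(self.content, str):
            raise ValueError("content must be a string")
        if not 0.1 <= self.target_ratio <= 0.9:
            raise ValueError("target_ratio must be between 0.1 and 0.9")
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")

    def to_arguments(self) -> dict:
        # Plain dict suitable for the "arguments" field of a tools/call request.
        return asdict(self)
```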

Compression modes

Mode	How	Needs Ollama	Savings¹	Speed¹	Output
`extractive` (default)	TF-IDF scoring + n-gram dedup	no	50.8%	~1 ms	verbatim subset
`semantic`	Embedding centrality + cosine dedup	yes	64.6%	~1–2 s	verbatim subset
`abstractive`	Local-LLM rewrite	yes	~51%	~8–20 s	paraphrased
`pipeline`	Cascade: semantic → abstractive	yes	88.1%	~2 s	paraphrased
`auto`	semantic if Ollama is up, else extractive	optional	64.6%	~90 ms	subset

¹ Measured on a 10-sentence Thai paragraph (522 tokens). Every mode stays 100% recoverable via the store; Ollama modes degrade gracefully to extractive when the backend is down.
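The auto rule and the graceful fallback can be written as a small resolver. This is a sketch: it assumes liveness is checked with an HTTP GET to Ollama's `/api/tags` endpoint at `OLLAMA_HOST`, which may not be how Sarup probes, and `ollama_up` / `resolve_mode` are hypothetical names.

```python
import os
import urllib.error
import urllib.request
from typing import Callable, Optional


def ollama_up(host: Optional[str] = None, timeout: float = 0.5) -> bool:
    """Best-effort check that an Ollama server answers at OLLAMA_HOST."""
    host = host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")
    try:
        with urllib.request.urlopen(f"{host.rstrip('/')}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a malformed host all count as "down".
        return False


def resolve_mode(mode: str, is_up: Callable[[], bool] = ollama_up) -> str:
    """Map a requested mode to the mode that will actually run."""
    if mode == "auto":
        return "semantic" if is_up() else "extractive"
    if mode in {"semantic", "abstractive", "pipeline"} and not is_up():
        return "extractive"  # degrade gracefully when the backend is down
    return mode
```

Passing the probe in as `is_up` keeps the rule testable without a running Ollama.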

Measured results

$ .\.venv\Scripts\python.exe bench\benchmark.py

sample                      before   after   savings   verify
Thai prose                     522     257    50.8%       OK
Thai prose (aggressive)        522     217    58.4%       OK
English prose                  105      54    48.6%       OK
JSON (lossless)                 67      44    34.3%       OK
Logs                           563     300    46.7%       OK
TOTAL                         1779     872    51.0%    ALL OK   → 100% recoverable

Mode comparison (Thai prose, 522 tok):
  extractive 50.8% (1ms) · auto 64.6% (~90ms) · semantic 64.6% (2.1s)
  abstractive 51.1% (8s) · pipeline 88.1% (2.3s)        ← all verified recoverable

Token counts via tiktoken cl100k_base — a real tokenizer, not a byte heuristic.
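Given two token counts, the savings columns above follow from simple arithmetic. In practice each count would come from tiktoken, e.g. `len(tiktoken.get_encoding("cl100k_base").encode(text))`; the stdlib-only sketch below (the helper name `savings` is hypothetical) reproduces the reported figures from the counts alone.

```python
from typing import Tuple


def savings(before: int, after: int) -> Tuple[int, float]:
    """Return (tokens_saved, savings_percent rounded to one decimal)."""
    saved = before - after
    pct = round(100 * saved / before, 1) if before > 0 else 0.0
    return saved, pct
```

For instance, `savings(522, 257)` gives `(265, 50.8)` for the Thai prose row, and `savings(518, 154)` gives `(364, 70.3)`, matching the Example response.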

Example

A real sarup_compress call on a Thai paragraph (mode="auto", Ollama up → semantic):

// → sarup_compress(content="…518-token Thai paragraph…", mode="auto")
{
  "compressed": "จุดเด่นที่สำคัญที่สุดคือมันไม่มีทางทำให้ Claude พัง…",
  "hash": "caa568140bec0ff734937cf5",
  "original_tokens": 518,
  "compressed_tokens": 154,
  "tokens_saved": 364,
  "savings_percent": 70.3,
  "transforms": ["semantic_extractive", "embeddings", "thai"],
  "lossy": true,
  "verified": true,                    // round-trip proven byte-for-byte
  "token_method": "tiktoken:cl100k_base"
}

The model keeps working on the 154-token view; the full 518-token original is one call away:

// → sarup_retrieve(hash="caa568140bec0ff734937cf5")
{ "content": "…the exact original text, restored byte-for-byte…" }

Install

One command (creates the venv, installs everything, registers the MCP server for all projects — idempotent):

.\scripts\setup.ps1 -All      # Windows  (-All also adds the hook, the /sarup-setup skill, pulls Ollama models)
./scripts/setup.sh --all      # Linux / WSL / macOS

Tip: -All/--all installs a global /sarup-setup skill, so on any other machine you can just type /sarup-setup in Claude Code and it walks through the install. (Or run scripts/install-skill.ps1 / install-skill.sh on its own.)

Uninstall just as cleanly (only removes what Sarup added; -Purge/--purge also deletes the venv + cache):

.\scripts\uninstall.ps1       # Windows
./scripts/uninstall.sh        # Linux / WSL / macOS

Or set up the environment by hand (Windows shown):

py -3.11 -m venv .venv
.\.venv\Scripts\python.exe -m pip install -e ".[dev]"

Optional local-LLM modes (semantic / abstractive / pipeline) need Ollama:

ollama pull nomic-embed-text     # embeddings → semantic mode
ollama pull gemma3:12b           # rewrite → abstractive / pipeline (Thai-validated)
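Semantic mode is described as embedding centrality plus cosine dedup. A minimal sketch of that ranking, assuming sentence vectors have already been fetched (in Sarup they would come from nomic-embed-text via Ollama); centrality here is mean cosine similarity to all other sentences, and both `select_central` and the 0.95 dedup threshold are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_central(vectors: List[Sequence[float]], keep: int,
                   dup_threshold: float = 0.95) -> List[int]:
    """Indices of the most central, mutually non-duplicate sentences, in original order."""
    n = len(vectors)
    scores = []
    for i in range(n):
        sims = [cosine(vectors[i], vectors[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    chosen: List[int] = []
    for i in sorted(range(n), key=lambda k: scores[k], reverse=True):
        if len(chosen) >= keep:
            break
        if any(cosine(vectors[i], vectors[j]) >= dup_threshold for j in chosen):
            continue  # near-duplicate of a sentence already kept
        chosen.append(i)
    return sorted(chosen)
```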

Register with Claude Code

One-command setup (recommended). Detects this machine's paths, probes Ollama (picks the best mode + models), and merges into .mcp.json / .claude/settings.json without clobbering anything already there (a .bak is written first):

.\.venv\Scripts\python.exe scripts\install.py --with-hook --pull

No Ollama? It configures offline extractive mode — still fully works.
Ollama up? It auto-selects nomic-embed-text (semantic) + gemma3:12b (rewrite) and sets the hook to auto. --pull fetches any missing models.
Idempotent — safe to re-run; --global writes to ~/.claude instead.
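The "merge without clobbering, write a .bak first, safe to re-run" behavior can be sketched as follows. This is not install.py: `merge_mcp_server` is hypothetical, and whether the real installer overwrites a differing existing `sarup` entry is an assumption.

```python
import json
import shutil
from pathlib import Path


def merge_mcp_server(config_path: Path, name: str, entry: dict) -> bool:
    """Add mcpServers[name] without touching other keys; return True if the file changed."""
    data: dict = {}
    if config_path.exists():
        data = json.loads(config_path.read_text(encoding="utf-8"))
    servers = data.setdefault("mcpServers", {})
    if servers.get(name) == entry:
        return False  # already configured: re-running is a no-op
    if config_path.exists():
        # Back up the current file before modifying it.
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
    servers[name] = entry
    config_path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return True
```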

Manual — or add it yourself to your MCP config (e.g. .mcp.json or ~/.claude.json). Replace <SARUP_DIR> with the absolute path where you cloned this repo (the installer above fills these in for you):

{
  "mcpServers": {
    "sarup": {
      "command": "<SARUP_DIR>/.venv/Scripts/python.exe",
      "args": ["-m", "sarup.server"],
      "env": { "SARUP_DB_PATH": "<SARUP_DIR>/.sarup-cache.db" }
    }
  }
}

On Linux/macOS the interpreter is <SARUP_DIR>/.venv/bin/python.
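If you generate the entry yourself, the only platform difference is the interpreter path. A minimal sketch (the helper name `sarup_server_entry` is hypothetical) that fills in `<SARUP_DIR>` the same way as the JSON above, using forward slashes on both platforms:

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath
from typing import Optional


def sarup_server_entry(sarup_dir: str, windows: Optional[bool] = None) -> dict:
    """Build the mcpServers["sarup"] entry for a clone at sarup_dir."""
    if windows is None:
        windows = sys.platform == "win32"
    if windows:
        root = PureWindowsPath(sarup_dir)
        python = root / ".venv" / "Scripts" / "python.exe"
    else:
        root = PurePosixPath(sarup_dir)
        python = root / ".venv" / "bin" / "python"
    return {
        "command": python.as_posix(),
        "args": ["-m", "sarup.server"],
        "env": {"SARUP_DB_PATH": (root / ".sarup-cache.db").as_posix()},
    }
```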

Or run it directly over stdio:

.\.venv\Scripts\python.exe -m sarup.server

Auto-compression hook

Skip manual tool calls entirely: install the PostToolUse hook and large Read/Bash/Grep outputs are compressed before they enter context, with the original cached for retrieval. Source-code reads are skipped for safety. Full setup in hooks/README.md.

Experimental: verify on your build. The hook fires and emits a valid updatedToolOutput, but whether Claude Code applies it depends on the surface. As of testing, the VS Code extension (2.1.193) does NOT apply it: the model still receives the full output, so the hook is a no-op there. Use the manual sarup_compress tool instead (it works everywhere); the hook may apply on other builds, such as the CLI.

Replace <SARUP_DIR> with your clone path, or run install.py --with-hook to fill it in:

{
  "hooks": {
    "PostToolUse": [
      { "matcher": "Read|Bash|Grep",
        "hooks": [{ "type": "command",
          "command": "<SARUP_DIR>/.venv/Scripts/python.exe <SARUP_DIR>/hooks/sarup_hook.py" }] }
    ]
  },
  "env": { "SARUP_DB_PATH": "<SARUP_DIR>/.sarup-cache.db" }
}

Privacy & data

To guarantee recovery, Sarup caches the original content in the store. Two things to know:

With SARUP_DB_PATH set, originals are written to that SQLite file in plaintext (no encryption). Treat it like a cache of whatever you compressed.
If you compress tool outputs that contain secrets (e.g. a .env dump or credentials in a log), those land in the cache too. The auto-hook skips source-code/config file reads, but Bash output is fair game — review what you point it at.

*.db is git-ignored, so the cache never gets committed. For zero on-disk footprint, leave SARUP_DB_PATH unset (memory-only; the MCP server then loses the cache on restart, and the hook will not substitute — see the hook docs).

Configuration

Var	Default	Meaning
`SARUP_DB_PATH`	(in-memory)	SQLite path for a persistent, cross-process store. Required for hook retrieval.
`OLLAMA_HOST`	`http://localhost:11434`	Ollama endpoint.
`SARUP_ABSTRACTIVE_MODEL`	`gemma3:12b`	Model for abstractive / pipeline rewrite.
`SARUP_EMBED_MODEL`	`nomic-embed-text`	Model for semantic embeddings.
`SARUP_HOOK_MODE`	`auto`	Hook compression mode.
`SARUP_HOOK_MIN_TOKENS`	`400`	Hook only compresses outputs with at least this many tokens (token-based, fair across languages).
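The table maps directly onto environment lookups with defaults. A sketch, assuming an empty `SARUP_DB_PATH` is treated the same as unset (memory-only); `SarupConfig` and `load_config` are hypothetical names, not Sarup's internals.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SarupConfig:
    db_path: Optional[str]
    ollama_host: str
    abstractive_model: str
    embed_model: str
    hook_mode: str
    hook_min_tokens: int


def load_config(env: Optional[Mapping[str, str]] = None) -> SarupConfig:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return SarupConfig(
        db_path=env.get("SARUP_DB_PATH") or None,  # None means an in-memory store
        ollama_host=env.get("OLLAMA_HOST", "http://localhost:11434"),
        abstractive_model=env.get("SARUP_ABSTRACTIVE_MODEL", "gemma3:12b"),
        embed_model=env.get("SARUP_EMBED_MODEL", "nomic-embed-text"),
        hook_mode=env.get("SARUP_HOOK_MODE", "auto"),
        hook_min_tokens=int(env.get("SARUP_HOOK_MIN_TOKENS", "400")),
    )
```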

Project structure

sarup/
├── src/sarup/
│   ├── server.py       # MCP stdio server — 3 tools
│   ├── compressor.py   # router + modes (extractive/semantic/abstractive/pipeline/auto)
│   ├── thai.py         # PyThaiNLP tokenization, sentence split, TF-IDF
│   ├── semantic.py     # embedding centrality + cosine dedup
│   ├── llm.py          # optional Ollama backend (generate + embed)
│   ├── tokens.py       # real token counting (tiktoken)
│   └── store.py        # CCR store: hash → original (memory + SQLite)
├── hooks/
│   ├── sarup_hook.py   # PostToolUse auto-compression hook
│   └── README.md       # hook install guide
├── bench/benchmark.py  # before/after measurement
├── tests/              # test_thai, test_mcp, test_hook, ...
├── README.md
└── STACK.md            # full stack + techniques

Tech stack & techniques

Python 3.11 · MCP · PyThaiNLP newmm · tiktoken · Ollama (optional) · SQLite · hatchling · pytest.

The technique behind each mode — TF-IDF scoring, embedding centrality, cascade pipeline, content routing, and graceful degradation — is documented in STACK.md.
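For a feel of the extractive idea, here is a sketch of sentence-level TF-IDF selection that keeps the top `target_ratio` of sentences in their original order. It is an illustration, not Sarup's thai.py: the regex tokenizer only works for space-delimited text (for Thai you would swap in a segmenter such as PyThaiNLP's `word_tokenize(text, engine="newmm")`), the flat query bonus is hypothetical, and n-gram dedup is omitted.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(sentence: str) -> List[str]:
    # Stand-in tokenizer for space-delimited text only.
    return re.findall(r"\w+", sentence.lower())


def extractive(sentences: List[str], target_ratio: float = 0.5, query: str = "") -> List[str]:
    """Keep the top-scoring fraction of sentences by TF-IDF, in their original order."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    if n == 0:
        return []
    df = Counter(term for doc in docs for term in set(doc))
    q_terms = set(tokenize(query))

    scores = []
    for doc in docs:
        if not doc:
            scores.append(0.0)
            continue
        tf = Counter(doc)
        # Length-normalized TF times smoothed IDF, summed over the sentence's terms.
        score = sum((c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items())
        if q_terms & set(doc):
            score += 1.0  # hypothetical flat bonus for matching the query
        scores.append(score)

    keep = max(1, round(n * target_ratio))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]
```

Sentences made of rarer terms score higher, which is why generic filler tends to drop out first.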

Testing

.\.venv\Scripts\python.exe -m pytest tests/ -q

The suite covers Thai NLP, the MCP tool contracts, every mode (including Ollama-fallback paths), the roundtrip-verify guarantee, and the auto-compression hook (incl. cross-process retrieval).

Roadmap

Make auto the default mode for sarup_compress (currently extractive).
Optional Typhoon 2.1 abstractive (blocked on an Ollama template fix).
Per-content adaptive target_ratio.
Published PyPI package.

License

MIT
