TokenSaver MCP
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@TokenSaver MCPcompress my 10-turn conversation history"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
TokenSaver MCP
Cut your AI API costs by up to 97% — without changing a single prompt.
An MCP (Model Context Protocol) server that gives AI agents ten tools to measure, compress, cache, and prune token usage — so developers on limited plans can do more with less.
Why TokenSaver?
Every API call sends more tokens than necessary. Conversation history accumulates. Web pages arrive as raw HTML. Tool results get re-fetched on every turn. System prompts bloat over iterations.
TokenSaver intercepts each of these patterns and fixes them at the agent level — no model changes, no prompt engineering, no plan upgrades.
Scenario | Before | After | Saved |
10-turn conversation history | 40,000 tokens | 8,000 tokens | 80% |
Webpage fetch (raw HTML) | 22,000 tokens | 1,200 tokens | 94% |
Bloated system prompt | 600 tokens | 220 tokens | 63% |
Repeated tool call (cached) | 1,500 tokens | 50 tokens | 97% |
Related MCP server: Manus Credit Optimizer
Tools
Tool | What it does |
| Measure token cost before sending — decide whether to compress first |
| Shrink long text or conversation history with offline LSA summarization |
| Persist tool results to disk with TTL — never run the same lookup twice |
| Fetch a URL and return only the readable content, not raw HTML |
| Get a structural + content summary of any file or directory |
| Remove filler turns and compress old messages in conversation history |
| Shorten verbose system prompts while preserving constraints |
| Diagnose token bloat and get targeted recommendations |
All tools work fully offline — no API key required for core features.
Installation
git clone https://github.com/pozii/tokensaver.git
cd tokensaver
pip install -e .Python 3.11+ required. On first use,
compress_contextwill auto-download the NLTKpunkt_tabtokenizer (~2 MB) if not already present.
How it connects to your AI client
TokenSaver has no URL and runs no background server by default. It uses stdio transport: the AI client reads your config, spawns python -m tokensaver as a child process, and talks to it through stdin/stdout. You never open a port or start anything manually — the client does it for you when it launches.
Your AI client ──spawn──▶ python -m tokensaver ──stdio──▶ tools availableThe alternative is SSE transport, where you start the server yourself on a local port and the client connects over HTTP. This is useful for multi-agent setups or when multiple clients share the same server instance.
Setup
Claude Desktop
Config file location:
macOS:
~/Library/Application Support/Claude/claude_desktop_config.jsonWindows:
%APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"tokensaver": {
"command": "python",
"args": ["-m", "tokensaver"]
}
}
}Save the file and restart Claude Desktop. The tokensaver tools will appear in the tool list.
Claude Code
claude mcp add tokensaver -- python -m tokensaverOr add manually to ~/.claude/settings.json:
{
"mcpServers": {
"tokensaver": {
"command": "python",
"args": ["-m", "tokensaver"]
}
}
}OpenCode
Config file: ~/.config/opencode/config.json
{
"mcp": {
"servers": {
"tokensaver": {
"type": "local",
"command": ["python", "-m", "tokensaver"]
}
}
}
}Any MCP-compatible client (SSE mode)
Start the server once:
python -m tokensaver --transport sse --port 8765Then point your client at:
http://localhost:8765/sseFixing Python path issues
If your system has multiple Python versions and python resolves to the wrong one, use the full path:
# Find the right Python
which python3 # macOS / Linux
where python # WindowsThen use the full path in your config:
{
"mcpServers": {
"tokensaver": {
"command": "/usr/local/bin/python3",
"args": ["-m", "tokensaver"]
}
}
}{
"mcpServers": {
"tokensaver": {
"command": "C:\\Python314\\python.exe",
"args": ["-m", "tokensaver"]
}
}
}Usage
Recommended workflow
Each turn:
1. count_tokens → How large is my current context?
2. advise_context_window → Am I approaching the model's limit?
Before expensive tool calls:
3. cache_get → Did I already run this?
When fetching web content:
4. extract_webpage → Clean text, not raw HTML
When history grows long:
5. prune_conversation → Drop filler turns, compress old ones
6. compress_context → Shrink large injected context blocks
When writing system prompts:
7. optimize_prompt → Remove redundant phrasingTool reference
{
"content": "Some long text or list of messages...",
"model": "claude-sonnet-4",
"include_message_overhead": true
}Returns token_count, encoding_used, model. Accepts a plain string or an OpenAI-format message list.
{
"text": "3,000-token context block...",
"target_tokens": 600,
"mode": "extractive"
}extractive (default) uses LSA sentence ranking — free, offline, no API call.abstractive uses claude-haiku for higher quality — requires ANTHROPIC_API_KEY.
Returns compressed, original_tokens, compressed_tokens, reduction_pct.
# Standard pattern: check before running
key = cache_key("extract_webpage", {"url": "https://example.com"})
hit = cache_get(key=key)
if not hit["hit"]:
result = extract_webpage(url="https://example.com")
cache_store(key=key, value=str(result), ttl_seconds=3600)Cache is stored on disk at ~/.tokensaver/cache/ and survives server restarts.
{
"url": "https://example.com/article",
"max_tokens": 2000,
"include_links": false,
"include_metadata": true
}Uses trafilatura with BeautifulSoup as fallback. Returns content, title, token_count, truncated.
{
"path": "/home/user/myproject",
"mode": "both",
"max_tokens": 500,
"file_extensions": [".py", ".md"],
"max_depth": 3
}mode options: "structure" (tree only), "content" (summarized text), "both".
{
"messages": [...],
"max_output_tokens": 2000,
"keep_last_n": 4,
"prune_strategy": "hybrid"
}"remove" drops filler turns ("Sure!", "Got it.")."compress" summarizes older turns in place."hybrid" does both — recommended for most cases.
Returns the pruned messages list, original_tokens, pruned_tokens, counts of removed/compressed turns.
{
"prompt": "Please make sure to always answer questions...",
"optimization_level": "medium",
"preserve_constraints": true,
"output_format": "prose"
}"light" removes filler phrases. "medium" deduplicates sentences. "aggressive" restructures.preserve_constraints: true always keeps sentences containing never, must, always, do not.
{
"model": "gpt-4o",
"current_tokens": 110000,
"messages": [...],
"target_utilization": 0.75
}Returns status ("ok" / "warning" / "critical"), headroom_tokens, prioritized recommendations, and a per-turn breakdown sorted by token cost.
Supports: GPT-4o, GPT-4o-mini, Claude 3–4 series, Gemini 1.5/2.0/2.5, O1/O3, Llama 3, Mistral.
Optional: LLM-backed summarization
For higher-quality abstractive compression on very large texts (>5,000 tokens):
pip install "tokensaver-mcp[llm]"Set ANTHROPIC_API_KEY in your environment or a .env file, then use mode: "abstractive" in compress_context.
Running Tests
pip install -e ".[dev]"
python -m pytest tests/ -v38 tests — all offline, no API key or network required.
Project Structure
src/tokensaver/
server.py # FastMCP app, tool registration
models.py # Context window table, shared types
tools/
counter.py # count_tokens
compress.py # compress_context
cache.py # cache_store / cache_get / cache_invalidate
extractor.py # extract_webpage
summarizer.py # summarize_file
pruner.py # prune_conversation
optimizer.py # optimize_prompt
advisor.py # advise_context_window
utils/
token_utils.py # tiktoken wrapper
text_utils.py # sentence splitting, deduplicationMaintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/pozii/tokensaver'
If you have feedback or need assistance with the MCP directory API, please join our Discord server