
TokenSaver MCP

by pozii


Python

Local


Cut your AI API costs by up to 97% (the figure for repeated, cached tool calls) without changing a single prompt.


An MCP (Model Context Protocol) server that gives AI agents ten tools to measure, compress, cache, and prune token usage, so developers on limited plans can do more with less.

Why TokenSaver?

Every API call sends more tokens than necessary. Conversation history accumulates. Web pages arrive as raw HTML. Tool results get re-fetched on every turn. System prompts bloat over iterations.

TokenSaver addresses each of these patterns with a tool the agent calls itself, so the fixes happen at the agent level: no model changes, no prompt engineering, no plan upgrades.

Scenario	Before	After	Saved
10-turn conversation history	40,000 tokens	8,000 tokens	80%
Webpage fetch (raw HTML)	22,000 tokens	1,200 tokens	94%
Bloated system prompt	600 tokens	220 tokens	63%
Repeated tool call (cached)	1,500 tokens	50 tokens	97%
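The Saved column is just (before - after) / before. A minimal sketch for recomputing it on your own measurements; the row values are copied from the table above, and the table appears to round to whole percentages.

```python
def saved_pct(before: int, after: int) -> float:
    """Percentage of tokens removed, as in the 'Saved' column."""
    return 100 * (before - after) / before


# (tokens before, tokens after) for each scenario in the table
rows = {
    "10-turn conversation history": (40_000, 8_000),
    "Webpage fetch (raw HTML)": (22_000, 1_200),
    "Bloated system prompt": (600, 220),
    "Repeated tool call (cached)": (1_500, 50),
}

for name, (before, after) in rows.items():
    print(f"{name}: {saved_pct(before, after):.1f}% saved")
```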


Tools

Tool	What it does
`count_tokens`	Measure token cost before sending — decide whether to compress first
`compress_context`	Shrink long text or conversation history with offline LSA summarization
`cache_store` / `cache_get` / `cache_invalidate`	Persist tool results to disk with TTL — never run the same lookup twice
`extract_webpage`	Fetch a URL and return only the readable content, not raw HTML
`summarize_file`	Get a structural + content summary of any file or directory
`prune_conversation`	Remove filler turns and compress old messages in conversation history
`optimize_prompt`	Shorten verbose system prompts while preserving constraints
`advise_context_window`	Diagnose token bloat and get targeted recommendations

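No API key is required for core features. Apart from extract_webpage, which has to fetch the URL you give it, and the one-time NLTK tokenizer download mentioned below, the tools work fully offline.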

Installation

git clone https://github.com/pozii/tokensaver.git
cd tokensaver
pip install -e .

Python 3.11+ required. On first use, compress_context will auto-download the NLTK punkt_tab tokenizer (~2 MB) if not already present.

How it connects to your AI client

TokenSaver has no URL and runs no background server by default. It uses stdio transport: the AI client reads your config, spawns python -m tokensaver as a child process, and talks to it through stdin/stdout. You never open a port or start anything manually — the client does it for you when it launches.

Your AI client  ──spawn──▶  python -m tokensaver  ──stdio──▶  tools available
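To make the stdio flow concrete, here is a minimal standard-library sketch of what a client does: spawn the server, then exchange newline-delimited JSON-RPC 2.0 messages over stdin/stdout (initialize, the initialized notification, then tools/list). The protocol version string, the client name, and the assumption that each response arrives as the next stdout line are illustrative choices, not taken from TokenSaver; real clients use an MCP SDK.

```python
import json
import subprocess


def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line.

    json.dumps without indent escapes newlines inside strings, so the
    output is always exactly one line, as stdio transport requires.
    """
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(request_id: int = 1) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumption: a version the server accepts
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
        },
    }


def list_tools(command: list[str]) -> dict:
    """Spawn the server as a child process and ask for its tool list over stdio."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        proc.stdin.write(frame(initialize_request(1)))
        proc.stdin.flush()
        proc.stdout.readline()  # initialize result (assumed to be the next line)
        proc.stdin.write(frame({"jsonrpc": "2.0", "method": "notifications/initialized"}))
        proc.stdin.write(frame({"jsonrpc": "2.0", "id": 2, "method": "tools/list"}))
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())
    finally:
        proc.stdin.close()
        proc.terminate()
        proc.wait()
```

With TokenSaver installed, `list_tools(["python", "-m", "tokensaver"])["result"]["tools"]` should list the ten tools. No port is opened at any point, which is why no URL is involved.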

The alternative is SSE transport, where you start the server yourself on a local port and the client connects over HTTP. This is useful for multi-agent setups or when multiple clients share the same server instance.

Setup

Claude Desktop

Config file location:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "tokensaver": {
      "command": "python",
      "args": ["-m", "tokensaver"]
    }
  }
}

Save the file and restart Claude Desktop. The tokensaver tools will appear in the tool list.

Claude Code

claude mcp add tokensaver -- python -m tokensaver

Or add manually to ~/.claude/settings.json:

{
  "mcpServers": {
    "tokensaver": {
      "command": "python",
      "args": ["-m", "tokensaver"]
    }
  }
}

OpenCode

Config file: ~/.config/opencode/config.json

{
  "mcp": {
    "servers": {
      "tokensaver": {
        "type": "local",
        "command": ["python", "-m", "tokensaver"]
      }
    }
  }
}

Any MCP-compatible client (SSE mode)

Start the server once:

python -m tokensaver --transport sse --port 8765

Then point your client at:

http://localhost:8765/sse
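Before pointing a client at the SSE URL, it can help to confirm that something is actually listening on the port. A minimal sketch using only the standard library; `sse_url` and `is_listening` are hypothetical helpers, and a successful TCP connection does not prove the listener is TokenSaver.

```python
import socket


def sse_url(host: str = "localhost", port: int = 8765) -> str:
    """Build the endpoint URL used in the SSE example above."""
    return f"http://{host}:{port}/sse"


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_listening("localhost", 8765):
        print("Server reachable, use:", sse_url())
    else:
        print("Nothing on port 8765; start: python -m tokensaver --transport sse --port 8765")
```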

Fixing Python path issues

If your system has multiple Python versions and python resolves to the wrong one, use the full path:

# Find the right Python
which python3        # macOS / Linux
where python         # Windows

Then use that full path as the "command" in your config. macOS / Linux:

{
  "mcpServers": {
    "tokensaver": {
      "command": "/usr/local/bin/python3",
      "args": ["-m", "tokensaver"]
    }
  }
}

Windows (backslashes must be doubled inside JSON strings):

{
  "mcpServers": {
    "tokensaver": {
      "command": "C:\\Python314\\python.exe",
      "args": ["-m", "tokensaver"]
    }
  }
}
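Another way to get the path right is to let the interpreter that has TokenSaver installed print its own config entry. A minimal sketch, assuming you run it with the same Python you used for `pip install -e .`; the helper names are hypothetical. `sys.executable` is that interpreter's absolute path, and `json.dumps` doubles Windows backslashes automatically.

```python
import json
import subprocess
import sys


def tokensaver_config(python_path: str = sys.executable) -> dict:
    """Build the mcpServers entry with an absolute interpreter path."""
    return {
        "mcpServers": {
            "tokensaver": {"command": python_path, "args": ["-m", "tokensaver"]}
        }
    }


def can_import_tokensaver(python_path: str = sys.executable) -> bool:
    """Check that this interpreter can actually import the package."""
    result = subprocess.run([python_path, "-c", "import tokensaver"], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    if not can_import_tokensaver():
        print("tokensaver is not installed for", sys.executable, file=sys.stderr)
    # Paste this output into your client's config file.
    print(json.dumps(tokensaver_config(), indent=2))
```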

Usage

Recommended workflow

Each turn:
  1. count_tokens          → How large is my current context?
  2. advise_context_window → Am I approaching the model's limit?

Before expensive tool calls:
  3. cache_get             → Did I already run this?

When fetching web content:
  4. extract_webpage       → Clean text, not raw HTML

When history grows long:
  5. prune_conversation    → Drop filler turns, compress old ones
  6. compress_context      → Shrink large injected context blocks

When writing system prompts:
  7. optimize_prompt       → Remove redundant phrasing

Tool reference

count_tokens

{
  "content": "Some long text or list of messages...",
  "model": "claude-sonnet-4",
  "include_message_overhead": true
}

Returns token_count, encoding_used, model. Accepts a plain string or an OpenAI-format message list.
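What include_message_overhead adds is not spelled out here. A common heuristic, from OpenAI's cookbook for GPT-3.5/GPT-4-era chat models, charges a few wrapper tokens per message plus a few for priming the reply. The sketch below applies that heuristic with an injected counting function; the constants are the cookbook's, not necessarily TokenSaver's.

```python
from typing import Callable


def count_chat_tokens(
    messages: list[dict],
    count: Callable[[str], int],
    tokens_per_message: int = 3,  # wrapper tokens around each message (heuristic)
    tokens_per_name: int = 1,  # extra cost when a "name" field is present (heuristic)
    reply_priming: int = 3,  # tokens that prime the assistant's reply (heuristic)
) -> int:
    total = 0
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():
            total += count(value)  # role, content, and name text are all counted
            if key == "name":
                total += tokens_per_name
    return total + reply_priming
```

With tiktoken, `enc = tiktoken.get_encoding("cl100k_base")` and `count=lambda s: len(enc.encode(s))` give real token counts. The injected function keeps the overhead logic separate from the choice of encoding.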

compress_context

{
  "text": "3,000-token context block...",
  "target_tokens": 600,
  "mode": "extractive"
}

extractive (default) uses LSA sentence ranking: free, offline, no API call.
abstractive uses claude-haiku for higher quality; it requires the optional [llm] extra and ANTHROPIC_API_KEY (see Optional: LLM-backed summarization).

Returns compressed, original_tokens, compressed_tokens, reduction_pct.
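To show the shape of extractive compression and its return fields, here is a minimal sketch. It swaps LSA for a simpler technique, frequency-based sentence scoring, and uses whitespace-separated words as a stand-in for tokens; it is not TokenSaver's implementation. Sentences are ranked, kept greedily while they fit the budget, and emitted in their original order.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on", "the", "to"}


def extractive_compress(text: str, target_words: int) -> dict:
    """Greedy frequency-based extractive summary; words stand in for tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    content = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    freq = Counter(w for words in content for w in words)
    # Score each sentence by the mean document frequency of its content words.
    scores = [sum(freq[w] for w in words) / len(words) if words else 0.0 for words in content]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= target_words:
            chosen.append(i)
            used += n
    chosen.sort()  # keep surviving sentences in their original order

    original = len(text.split())
    return {
        "compressed": " ".join(sentences[i] for i in chosen),
        "original_tokens": original,
        "compressed_tokens": used,
        "reduction_pct": round(100 * (1 - used / original), 1) if original else 0.0,
    }
```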

cache_store / cache_get / cache_invalidate

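# Standard pattern: check the cache before running an expensive tool.
# cache_get, extract_webpage and cache_store are TokenSaver tool calls;
# cache_key builds a deterministic key from the tool name and its arguments.
key = cache_key("extract_webpage", {"url": "https://example.com"})
hit = cache_get(key=key)

if not hit["hit"]:
    # Cache miss: run the tool once, then keep the result for an hour.
    result = extract_webpage(url="https://example.com")
    cache_store(key=key, value=str(result), ttl_seconds=3600)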

Cache is stored on disk at ~/.tokensaver/cache/ and survives server restarts.

extract_webpage

{
  "url": "https://example.com/article",
  "max_tokens": 2000,
  "include_links": false,
  "include_metadata": true
}

Uses trafilatura with BeautifulSoup as fallback. Returns content, title, token_count, truncated.
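The core idea, keeping readable text and discarding markup, scripts, and styles, can be illustrated with the standard library's html.parser. This is a deliberately crude swapped-in technique for illustration only; trafilatura also removes boilerplate such as navigation and footers, which this sketch does not attempt.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping everything inside SKIP tags."""

    SKIP = {"script", "style", "noscript", "head"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```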

summarize_file

{
  "path": "/home/user/myproject",
  "mode": "both",
  "max_tokens": 500,
  "file_extensions": [".py", ".md"],
  "max_depth": 3
}

mode options: "structure" (tree only), "content" (summarized text), "both".

prune_conversation

{
  "messages": [...],
  "max_output_tokens": 2000,
  "keep_last_n": 4,
  "prune_strategy": "hybrid"
}

"remove" drops filler turns ("Sure!", "Got it.").
"compress" summarizes older turns in place.
"hybrid" does both — recommended for most cases.

Returns the pruned messages list, original_tokens, pruned_tokens, counts of removed/compressed turns.
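The "remove" strategy combined with keep_last_n can be sketched as follows. The filler pattern is a hypothetical example list, and the sketch assumes each message's content is a plain string; it does not model the "compress" half of "hybrid".

```python
import re

# Hypothetical filler pattern; TokenSaver's actual list is not documented here.
FILLER = re.compile(
    r"^(sure|ok(ay)?|got it|thanks|thank you|sounds good)[.!]*$", re.IGNORECASE
)


def remove_filler(messages: list[dict], keep_last_n: int = 4) -> tuple[list[dict], int]:
    """Drop filler turns from older history; never touch the last keep_last_n messages."""
    if keep_last_n > 0:
        older, recent = messages[:-keep_last_n], messages[-keep_last_n:]
    else:
        older, recent = messages, []
    kept = [m for m in older if not FILLER.match(m.get("content", "").strip())]
    return kept + recent, len(older) - len(kept)
```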

optimize_prompt

{
  "prompt": "Please make sure to always answer questions...",
  "optimization_level": "medium",
  "preserve_constraints": true,
  "output_format": "prose"
}

"light" removes filler phrases. "medium" deduplicates sentences. "aggressive" restructures.
preserve_constraints: true always keeps sentences containing never, must, always, do not.
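The "light" and "medium" levels, together with preserve_constraints, can be sketched as follows. The filler phrases are hypothetical, "aggressive" restructuring is not modeled, and medium's deduplication keeps one copy of a repeated constraint sentence rather than every copy.

```python
import re

# Hypothetical filler list; TokenSaver's real list is not documented here.
FILLER_PHRASES = [
    r"\bplease\s+make\s+sure\s+to\s+",
    r"\bit\s+is\s+important\s+that\s+you\s+",
    r"\bkindly\s+",
]
CONSTRAINT = re.compile(r"\b(never|must|always|do not)\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def optimize(prompt: str, level: str = "medium", preserve_constraints: bool = True) -> str:
    if level not in ("light", "medium"):
        raise ValueError("this sketch models only 'light' and 'medium'")
    out, seen = [], set()
    for sentence in split_sentences(prompt):
        protected = preserve_constraints and CONSTRAINT.search(sentence) is not None
        if not protected:
            # light: strip filler phrases from non-constraint sentences
            for pattern in FILLER_PHRASES:
                sentence = re.sub(pattern, "", sentence, flags=re.IGNORECASE)
            sentence = sentence[:1].upper() + sentence[1:]
        key = sentence.lower()
        if level == "medium" and key in seen:
            continue  # medium: also drop case-insensitive duplicate sentences
        seen.add(key)
        out.append(sentence)
    return " ".join(out)
```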

advise_context_window

{
  "model": "gpt-4o",
  "current_tokens": 110000,
  "messages": [...],
  "target_utilization": 0.75
}

Returns status ("ok" / "warning" / "critical"), headroom_tokens, prioritized recommendations, and a per-turn breakdown sorted by token cost.

Supports: GPT-4o, GPT-4o-mini, Claude 3–4 series, Gemini 1.5/2.0/2.5, o1/o3, Llama 3, Mistral.
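The arithmetic behind a status check like this is simple. The sketch below uses the example above (gpt-4o, whose context window is 128,000 tokens, at 110,000 tokens with target_utilization 0.75). The 0.90 "critical" threshold and the tokens_over_target field are hypothetical; TokenSaver's actual thresholds are not documented here.

```python
def advise(
    current_tokens: int,
    context_limit: int,
    target_utilization: float = 0.75,
    critical_at: float = 0.90,  # hypothetical threshold
) -> dict:
    utilization = current_tokens / context_limit
    if utilization <= target_utilization:
        status = "ok"
    elif utilization <= critical_at:
        status = "warning"
    else:
        status = "critical"
    target_tokens = int(target_utilization * context_limit)
    return {
        "status": status,
        "headroom_tokens": context_limit - current_tokens,
        "tokens_over_target": max(0, current_tokens - target_tokens),
    }


# gpt-4o example from above: 110,000 of 128,000 tokens used
print(advise(110_000, 128_000, target_utilization=0.75))
```

Here utilization is about 0.86, above the 0.75 target but below the hypothetical critical line, so the sketch reports a warning with 18,000 tokens of headroom and 14,000 tokens over target.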

Optional: LLM-backed summarization

For higher-quality abstractive compression on very large texts (>5,000 tokens):

pip install "tokensaver-mcp[llm]"   # installed as a package
pip install -e ".[llm]"             # installed from a clone, as in Installation

Set ANTHROPIC_API_KEY in your environment or a .env file, then use mode: "abstractive" in compress_context.

Running Tests

pip install -e ".[dev]"
python -m pytest tests/ -v

38 tests — all offline, no API key or network required.

Project Structure

src/tokensaver/
  server.py          # FastMCP app, tool registration
  models.py          # Context window table, shared types
  tools/
    counter.py       # count_tokens
    compress.py      # compress_context
    cache.py         # cache_store / cache_get / cache_invalidate
    extractor.py     # extract_webpage
    summarizer.py    # summarize_file
    pruner.py        # prune_conversation
    optimizer.py     # optimize_prompt
    advisor.py       # advise_context_window
  utils/
    token_utils.py   # tiktoken wrapper
    text_utils.py    # sentence splitting, deduplication



