mcp-llm-offload
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@mcp-llm-offloadask: what is the capital of France?"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
mcp-llm-offload
An MCP server that offloads light LLM work from Claude (or any MCP client) to a model you control β a local LLM (LM Studio, Ollama, llama.cpp) or any OpenAI-compatible provider (OpenRouter, xAI Grok, OpenAI, Groq, Togetherβ¦). Save frontier-model quota on the cheap, non-critical stuff.
Why
Frontier models are great, but a lot of day-to-day agent work is light: summarize this log, classify this ticket, pull fields out of this blob, rephrase this sentence. Paying frontier-model rates (and quota) for that is wasteful.
mcp-llm-offload exposes a handful of MCP tools that forward those tasks to a backend of your choosing. Because LM Studio, Ollama, llama.cpp, OpenRouter, Grok, OpenAI, Groq and Together all speak the same /v1/chat/completions API, one tiny server talks to all of them β and you can switch backends with an env var or override per call.
Related MCP server: Just Prompt
Features
π Provider-agnostic β one server, any OpenAI-compatible endpoint. Presets for the common ones; bring-your-own for the rest.
π Local-first β defaults to a local LM Studio; no API key required for local backends.
π― Purpose-built tools β
ask,summarize,classify,extract,healthβ each shaped for a light task, not just a raw chat passthrough.π§ Per-call routing β every tool takes optional
providerandmodelargs, so the cheap stuff goes local and the slightly harder stuff can go to Grok/OpenRouter without reconfiguring.π©Ί Actionable errors β connection, timeout, auth, 404-model, and rate-limit failures come back as plain, fix-this-next strings instead of stack traces.
π¦ Single file, zero install β PEP 723 inline deps mean
uv run llm_offload_mcp.pyjust works.π€ Claude Code subagent included β an optional
llm-offloaderagent that auto-routes light work for you.
Supported providers
Provider | Default endpoint | API key env | Example model |
|
| β (none) | your loaded model |
|
| β (none) |
|
|
| β (none) | loaded model |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| see DeepInfra |
|
|
|
|
anything else | set |
| β any OpenAI-compatible service |
Use any name you like for a custom provider: set
FOO_BASE_URL(andFOO_API_KEYif needed), then call a tool withprovider="foo".
How it works
Claude Code ββstdioβββΆ mcp-llm-offload ββHTTP /v1/chat/completionsβββΆ your backend
(frontier) (this server) (local / Grok / OpenRouter β¦)The server is a thin, well-behaved MCP front-end. It resolves which backend and model to use (per call β env β preset), folds any system instruction into the user turn for maximum template compatibility, calls the endpoint, and returns clean text (or an Error: β¦ string).
Quick start
1. Prerequisites
uv(recommended) β or Python 3.10+ withpip.A backend: a running local server (e.g. LM Studio β Developer βΈ Start Server) or an API key for a hosted provider.
2. Get it
git clone https://github.com/jonpol01/mcp-llm-offload.git
cd mcp-llm-offloadRun it standalone to confirm it starts (it serves MCP over stdio, so it will wait for a client β Ctrl-C to exit):
uv run llm_offload_mcp.pyNo
uv?pip install mcp httpxthenpython llm_offload_mcp.py.
3. Register with Claude Code
The MCP server name you choose here becomes the tool prefix (mcp__<name>__ask, β¦). The bundled subagent expects the name offload, so use that unless you also edit the agent.
Local LM Studio (point it at a LAN host if LM Studio runs on another machine):
claude mcp add offload \
-e LLM_PROVIDER=lmstudio \
-e LMSTUDIO_BASE_URL=http://localhost:1234/v1 \
-e LLM_MODEL=your-local-model-id \
-- uv run /absolute/path/to/llm_offload_mcp.pyOpenRouter:
claude mcp add offload \
-e LLM_PROVIDER=openrouter \
-e OPENROUTER_API_KEY=sk-or-... \
-e LLM_MODEL=meta-llama/llama-3.3-70b-instruct \
-- uv run /absolute/path/to/llm_offload_mcp.pyxAI Grok:
claude mcp add offload \
-e LLM_PROVIDER=grok \
-e XAI_API_KEY=xai-... \
-e LLM_MODEL=grok-2-latest \
-- uv run /absolute/path/to/llm_offload_mcp.pyOr, equivalently, in a JSON MCP config (.mcp.json, Claude Desktop, etc.):
{
"mcpServers": {
"offload": {
"command": "uv",
"args": ["run", "/absolute/path/to/llm_offload_mcp.py"],
"env": {
"LLM_PROVIDER": "lmstudio",
"LMSTUDIO_BASE_URL": "http://localhost:1234/v1",
"LLM_MODEL": "your-local-model-id"
}
}
}
}4. Verify
In Claude Code, run the health tool (or ask Claude to). You should see the resolved provider, base URL, and the list of models the backend reports.
Tools
Tool | Signature | Purpose |
|
| Free-form light generation (Q&A, rephrase, draft). |
|
| Faithful summary, length- and style-bounded. |
|
| Single-label classification; returns one of |
|
| Structured extraction β clean JSON string. |
|
| Reachability check + lists the backend's models. |
Every generation tool accepts provider and model to override the configured default for that single call.
Configuration
All configuration is via environment variables β none are required if the defaults (a local LM Studio) suit you and you pass model per call.
Variable | Description | Default |
| Default provider name (see table). |
|
| Default model id (as the provider names it). | (unset) |
| Request timeout, seconds. |
|
| Override a provider's endpoint, e.g. | preset |
| A provider's API key, e.g. | conventional env / |
| Default model for a specific provider. |
|
| Generic fallbacks for the default provider. | β |
| Optional OpenRouter ranking headers. | β |
See .env.example for a copy-paste starting point.
The Claude Code subagent (optional)
agents/llm-offloader.md is a ready-made subagent that proactively routes light work to this server and hands anything heavy or correctness-critical back to the main agent. It runs on a cheap dispatch model (haiku) so the routing costs almost nothing and the work lands on your backend.
Install it by copying into your agents directory:
# user-wide
cp agents/llm-offloader.md ~/.claude/agents/
# or per-project
mkdir -p .claude/agents && cp agents/llm-offloader.md .claude/agents/Its
tools:list referencesmcp__offload__*, so it requires the server to be registered under the nameoffload.
Examples
Ask Claude things like:
"Summarize this 4k-word changelog into 5 bullets β offload it."
"Classify each of these 30 support messages as bug / feature / question using the local model."
"Extract name, date, and total from this receipt text as JSON via the offloader."
"Use the offloader on OpenRouter to rephrase this paragraph." (per-call
provider="openrouter")
Troubleshooting
Symptom | Likely fix |
| Backend isn't running / wrong URL. For LM Studio, Start Server and bind to |
| Missing/invalid API key β set the provider's |
| Model id is wrong or not loaded. Run |
| Back off, or pass |
| Large input or a slow/loading model β raise |
Subagent has no tools | Server isn't registered under the name |
Development
uvx ruff check . # lint
uv run --with mcp --with httpx python -c \
"import importlib.util as u; s=u.spec_from_file_location('m','llm_offload_mcp.py'); m=u.module_from_spec(s); s.loader.exec_module(m); print('ok', m.mcp.name)"CI (GitHub Actions) runs the same lint + import smoke test on every push and PR.
Contributing
Issues and PRs welcome. Keep the server single-file and provider-neutral; new providers are usually just one row in the PROVIDERS registry.
License
MIT Β© John Paul Soliva
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonpol01/mcp-llm-offload'
If you have feedback or need assistance with the MCP directory API, please join our Discord server