ollama-handoff
Enables AI agents to offload routine tasks to local Ollama models, reducing cloud costs and frontier model context usage.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@ollama-handoffsummarize the errors in build.log"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
ollama-handoff
An MCP server that offloads cheap work from your cloud LLM agent to a local Ollama model.
Your frontier model (Claude, GPT, etc.) is brilliant and metered. A lot of the work it gets handed â summarizing a log, drafting a commit message, pulling every URL out of a file, a quick first-pass code review â doesn't need frontier reasoning at all. ollama-handoff exposes your local Ollama instance as a handful of purpose-built MCP tools, so your agent can route that work to a model on your own GPU â at zero cloud cost â and spend its (paid) reasoning budget on the things that actually need it.
This isn't a generic "wrap the Ollama API" server. Each tool ships with a baked-in system prompt and a description written for the calling agent, so the agent knows when to hand off and gets a tuned result back without re-stating instructions every call.
Why you'd want this
ðļ Spend less. Routine offloads run locally and bill nothing.
⥠Keep the big model focused. Summaries, extractions, and drafts don't eat its context or your budget.
ð§ Tuned, not raw.
summarize_local,code_review_local,draft_commit_message_local, andextract_localcome with reviewer/summarizer/extractor system prompts already dialed in.ð Drop-in. One MCP registration; works with Claude Code, Claude Desktop, Cursor, and any MCP client.
ðŠķ Tiny & auditable. Two dependencies (
mcp,httpx), fully typed, unit-tested, no telemetry.
Related MCP server: Ollama MCP Server
Requirements
Ollama running locally (
ollama serve) with at least one model pulled, e.g.ollama pull qwen2.5-coder:14b.Python 3.11+ (or just
uvx, which manages it for you).
Install
The fastest path is uv â no manual venv needed. Run straight from the repo:
uvx --from git+https://github.com/Michael-WhiteCapData/ollama-handoff ollama-handoffðĶ A PyPI release is on the way; once published,
uvx ollama-handoffandpip install ollama-handoffwill work directly.
Claude Code
claude mcp add ollama-handoff -- uvx --from git+https://github.com/Michael-WhiteCapData/ollama-handoff ollama-handoffClaude Desktop / Cursor (mcp config block)
{
"mcpServers": {
"ollama-handoff": {
"command": "uvx",
"args": [
"--from",
"git+https://github.com/Michael-WhiteCapData/ollama-handoff",
"ollama-handoff"
],
"env": {
"OLLAMA_DEFAULT_MODEL": "qwen2.5-coder:14b"
}
}
}
}Tools
Tool | What it does | When the agent should reach for it |
| One-shot prompt to the local model | Any handoff that doesn't need frontier reasoning |
| Multi-turn local chat | Handoffs needing more than one turn of context |
| Structured summary (headline + bullets) | Long files, logs, transcripts, docs |
| Quick first-pass review of a diff/code | Cheap pre-filter before a deep review |
| Conventional commit message from a diff | Routine commits |
| Pull structured items from unstructured text | URLs, function names, error codes, TODOs |
| List locally available Ollama models | Discovery / choosing a model |
| Report the effective configuration | Debugging setup |
Configuration
All configuration is via environment variables set in your MCP registration:
Variable | Default | Description |
|
| Base URL of the Ollama server |
|
| Default model for handoffs |
|
| Context window in tokens |
|
| How long to keep the model resident in VRAM |
|
| Per-request timeout, seconds |
Example
Once registered, you don't call the tools yourself â your agent does. A typical exchange:
You: Summarize the errors in
build.logand draft a commit for the staged fix.Agent: (calls
summarize_local(build.log, focus="errors and stack traces")anddraft_commit_message_local(git diff --staged)â both run on your GPU, nothing billed) â returns the summary + commit message.
Development
git clone https://github.com/Michael-WhiteCapData/ollama-handoff
cd ollama-handoff
uv pip install -e ".[dev]"
ruff check .
pytest # tests use httpx.MockTransport â no running Ollama requiredSee CONTRIBUTING.md. Contributions welcome â especially new specialized handoff tools.
License
MIT ÂĐ Michael Tierney
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/Michael-WhiteCapData/ollama-handoff'
If you have feedback or need assistance with the MCP directory API, please join our Discord server