mcp-ollama
MCP server wrapping local Ollama models for offload from API-priced orchestrators.
Exposes nine tools that pass work to a local model (text generation, summarisation, code tasks, mechanical transforms, commit/PR/changelog drafting). The orchestrator decides what to route locally; this server does the routing.
Transport: stdio
Runtime: Node 18+
Default model: hermes3:8b (override via OLLAMA_MODEL)
Ollama host: http://localhost:11434 (override via OLLAMA_HOST)
Ships no model weights, no cloud call-outs, no telemetry. Every request stays on the host where Ollama is running.
License: Apache-2.0
Why
Orchestrators priced by the token (Claude Code, Cursor, the Anthropic API, Cline, Aider) pay for every classification, every docstring, every commit message. Most of that work doesn't need a frontier model. Routed to Ollama on the same machine, the same work is free and faster. mcp-ollama is the routing surface.
The orchestrating model decides what to route where. This server is plumbing — it does not try to be clever about task classification. Pick the right tool, pass the text, get a result back.
Install
From source
```shell
git clone https://github.com/true-alter/mcp-ollama.git
cd mcp-ollama
npm install
npm run build
```
You also need a running Ollama instance with at least one model pulled:
```shell
# Default — 8B, fast, good for classifications and short generations
ollama pull hermes3:8b

# Optional — code-specialised, heavier, better for local_code tasks
ollama pull qwen2.5-coder:32b
```
Docker
```shell
docker build -t mcp-ollama .
docker run -i --rm \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  -e OLLAMA_MODEL=hermes3:8b \
  mcp-ollama
```
The supplied Dockerfile points at host.docker.internal:11434 so the container reaches Ollama on the host.
Run (stdio)
```shell
node dist/index.js
```
Stdio servers are launched by the MCP client (Claude Code, Cursor, etc.) — running the server directly is only useful for debugging.
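For a quick manual check, note that the MCP stdio transport carries newline-delimited JSON-RPC. A client's opening message is an initialize request shaped roughly like the one below (the clientInfo values are arbitrary placeholders); piping it to the server on stdin should yield a JSON response on stdout.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": { "name": "manual-test", "version": "0.0.0" }
  }
}
```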
Configure Claude Code
```shell
claude mcp add --transport stdio ollama -- node /absolute/path/to/mcp-ollama/dist/index.js
```
Or in ~/.claude/settings.json:
```json
{
  "mcpServers": {
    "ollama": {
      "transport": "stdio",
      "command": "node",
      "args": ["/absolute/path/to/mcp-ollama/dist/index.js"],
      "env": {
        "OLLAMA_HOST": "http://localhost:11434",
        "OLLAMA_MODEL": "hermes3:8b"
      }
    }
  }
}
```
Tools
| Tool | Purpose |
| --- | --- |
| | General-purpose generation with system + user prompt |
| local_summarize | Summarise a blob of text |
| | Analyse text against a specific question |
| local_draft | Draft content in a given style |
| local_code | Code tasks: docstring / test / explain / review / types / refactor-suggest |
| | Diff-driven tasks: commit-message / pr-description / changelog / summary / impact |
| | Mechanical code transformations |
| local_models | List models available on the local Ollama host |
| local_pull | Pull a model onto the local Ollama host |
Full tool schemas are exposed over MCP introspection — any MCP-aware client will enumerate them automatically.
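As a sketch, invoking the summarisation tool from a raw MCP client is a standard tools/call request. The argument name text below is an assumption; the schema returned by introspection is authoritative.

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "local_summarize",
    "arguments": {
      "text": "…the blob of text to summarise…"
    }
  }
}
```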
Environment variables
| Variable | Default | Purpose |
| --- | --- | --- |
| OLLAMA_HOST | http://localhost:11434 | Ollama HTTP endpoint |
| OLLAMA_MODEL | hermes3:8b | Default model when a tool call omits model |
Any tool call may override model explicitly — the env default only applies when unset. local_code tends to work better with a code-specialised model passed per-call, while local_summarize and local_draft are fine on the default.
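A sketch of such a per-call override, assuming the tool accepts a model argument alongside its inputs (the task and code argument names here are also assumptions; check the introspected schema):

```json
{
  "name": "local_code",
  "arguments": {
    "task": "review",
    "code": "…the code under review…",
    "model": "qwen2.5-coder:32b"
  }
}
```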
Model selection guidance
| Workload | Recommended model | Rationale |
| --- | --- | --- |
| Classification, one-liners, tags | hermes3:8b | Fastest round-trip, cheap to run |
| Commit messages, changelogs, summaries | | Higher quality, still comfortable on a 16GB GPU |
| Code review, docstrings, tests | qwen2.5-coder:32b | Code-specialised |
| Fallback / unknown model | whatever local_models reports | Inspect first, then route |
Use local_models at session start if you're unsure what's available on a host.
Troubleshooting
Ollama error 404 when calling a tool. The model isn't pulled. Run ollama pull <name> or call local_pull from the client.
fetch failed / connection refused. Ollama isn't running, or OLLAMA_HOST points somewhere wrong. Verify with curl $OLLAMA_HOST/api/tags. Inside a container, localhost is the container itself — use host.docker.internal on macOS/Windows or a bridge IP on Linux.
Tool calls feel slow. First call to a cold model incurs a load. Subsequent calls within the same Ollama process are much faster. If the model is larger than available VRAM, Ollama falls back to CPU — watch ollama ps to confirm.
Empty or truncated output. max_tokens defaults to 2048 per tool. For long generations, pass max_tokens explicitly in the tool call.
Security posture
mcp-ollama makes no network call of its own beyond the configured OLLAMA_HOST. It ships no telemetry, no analytics, no auto-update pinger. Tool inputs are forwarded to Ollama's HTTP API verbatim and the response is relayed back; the server itself is stateless between calls.
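As a sketch of what that relay looks like on the wire, a generation-style tool call becomes a single POST to OLLAMA_HOST/api/generate with a body roughly like the following (the exact field mapping is an assumption; num_predict corresponds to the 2048-token default noted under Troubleshooting):

```json
{
  "model": "hermes3:8b",
  "system": "…system prompt from the tool call…",
  "prompt": "…user text from the tool call…",
  "stream": false,
  "options": { "num_predict": 2048 }
}
```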
If you run Ollama on localhost (the default) the entire loop stays on the host. If you point OLLAMA_HOST at a remote endpoint, treat that endpoint's security posture as authoritative — a typo sending prompts to a third-party host is trivially possible.
To report a security issue, see SECURITY.md.
Contributing
Bug reports and small patches welcome — see CONTRIBUTING.md. Larger design changes: please open an issue first so we can talk about scope before you invest time.
Part of ALTER
mcp-ollama is maintained by ALTER as part of the identity infrastructure for the AI economy. The ALTER identity MCP server is hosted at mcp.truealter.com — see @truealter/sdk for the TypeScript client.
License
Apache License 2.0. See LICENSE for the full text. Copyright 2026 Alter Meridian Pty Ltd (ABN 54 696 662 049).