claude-ollama-mcp
Provides tools to query and manage a local Ollama server, including listing installed models, running text completions (generate/chat), pulling models from the registry, deleting models, and checking server status.
Claude Ollama
Lets Claude Desktop query and manage a local Ollama server. List installed models, inspect them, run one-shot generate/chat completions against any local model, or pull/delete models from the registry — all without opening a terminal.
Typical use: comparing Claude's answer to a local model on the same prompt, running cheap bulk completions against a quantized model, or checking custom training-checkpoint models you've imported into Ollama.
Requirements
A running Ollama server (ollama serve or the Ollama app).
Default endpoint is http://localhost:11434. Override via the ollama_url user config in Claude Desktop's extension settings if you run Ollama on a different host or port.
No npm dependencies: pure Node over the HTTP API.
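As a rough illustration of the "default endpoint with optional override" behavior described above, the sketch below resolves a base URL and queries Ollama's GET /api/version endpoint using only Node built-ins. The function names and the way the ollama_url value reaches the script are assumptions made for this example, not taken from the extension's source; the global fetch assumes Node 18 or newer.

```javascript
// Sketch only: names below are illustrative, not taken from the extension's source.
const DEFAULT_OLLAMA_URL = "http://localhost:11434";

// Return the base URL to use, falling back to the default when the
// configured value is blank or missing. Trailing slashes are trimmed
// so that API paths can be appended safely.
function resolveBaseUrl(configuredUrl) {
  const trimmed = (configuredUrl || "").trim();
  const base = trimmed === "" ? DEFAULT_OLLAMA_URL : trimmed;
  return base.replace(/\/+$/, "");
}

// Ask the server for its version over plain HTTP (GET /api/version).
// Uses the global fetch available in Node 18+, so no npm packages are needed.
async function getServerVersion(baseUrl) {
  const res = await fetch(`${baseUrl}/api/version`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const data = await res.json();
  return data.version;
}
```

Calling getServerVersion(resolveBaseUrl("")) would target the default endpoint; passing a value such as http://192.168.1.42:11434 would target that host instead.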
Install (Claude Desktop)
Download the latest
Ollama.mcpbfrom the Releases page.In Claude Desktop: Settings → Extensions → Extension Developer → Install Extension → pick the
.mcpb.(Optional) In the extension's settings, set
Ollama server URLif you run Ollama on a non-default host/port. Leave blank forhttp://localhost:11434.
Tools
| Tool | Annotation | Purpose |
| | read-only | Health check + server version |
| list_models | read-only | Local models with size, digest, family, parameter size, quantization |
| list_running | read-only | Models currently loaded in VRAM |
| show_model | read-only | Model details: modelfile, parameters, template, capabilities |
| generate | open-world | One-shot text completion (non-streaming) |
| chat | open-world | Chat completion with message history (non-streaming) |
| pull_model | open-world | Download a model from the registry |
| delete_model | destructive | Remove a locally-installed model |
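To make the relationship between these tools and Ollama's HTTP API more concrete, here is a minimal sketch of how tool calls could be translated into REST requests. The endpoint paths come from Ollama's public API; the buildRequest helper, the argument names (model, prompt, messages), and the routing itself are hypothetical, and the extension's actual implementation may differ. Tool names are the ones mentioned elsewhere in this README.

```javascript
// Hypothetical tool-to-endpoint mapping; the extension's real routing may differ.
// Returns a plain description of the HTTP request without sending it.
function buildRequest(tool, args = {}) {
  switch (tool) {
    case "list_models":
      return { method: "GET", path: "/api/tags" };
    case "list_running":
      return { method: "GET", path: "/api/ps" };
    case "show_model":
      return { method: "POST", path: "/api/show", body: { model: args.model } };
    case "generate":
      // stream: false asks Ollama for one complete response object.
      return {
        method: "POST",
        path: "/api/generate",
        body: { model: args.model, prompt: args.prompt, stream: false },
      };
    case "chat":
      return {
        method: "POST",
        path: "/api/chat",
        body: { model: args.model, messages: args.messages, stream: false },
      };
    case "pull_model":
      return { method: "POST", path: "/api/pull", body: { model: args.model, stream: false } };
    case "delete_model":
      return { method: "DELETE", path: "/api/delete", body: { model: args.model } };
    default:
      throw new Error(`Unknown tool: ${tool}`);
  }
}
```

Keeping request construction separate from the network call makes the mapping easy to inspect and to unit-test without a running server.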
Example prompts
"Which local models do I have installed, and which one is currently loaded in VRAM?"
"Run forge:b6c1 on this prompt: ''. Compare that output to your own answer."
"Show me the modelfile for forge:b7c1; I want to check the temperature setting."
"Pull llama3.1:70b." (expect a long wait for large models)
"Delete the forge:b5c3 model; I don't need that checkpoint anymore."
Privacy policy
This extension runs entirely on your local machine and sends HTTP requests only to your Ollama server (default http://localhost:11434). No data leaves your machine unless you explicitly configure ollama_url to point at a remote Ollama instance, in which case the prompts and responses travel to that server.
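If you want to double-check whether a configured endpoint keeps traffic on your machine, a small helper along these lines could classify the URL. This is an illustrative sketch, not part of the extension: isLocalEndpoint is a hypothetical name, and it only recognizes the common loopback forms (localhost, 127.x.x.x, and ::1).

```javascript
// Hypothetical helper: returns true when the URL points at a loopback
// address, meaning prompts and responses stay on this machine.
function isLocalEndpoint(url) {
  const host = new URL(url).hostname.toLowerCase();
  return (
    host === "localhost" ||
    host === "[::1]" || // IPv6 loopback as reported by the URL parser
    /^127\.\d+\.\d+\.\d+$/.test(host) // IPv4 loopback range
  );
}
```

A LAN address such as http://192.168.1.42:11434 is reported as non-local, which matches the case above where prompts and responses travel to another server.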
The information visible to Claude includes:
All prompts and chat messages you pass to generate and chat (these go to the Ollama server, which may log them depending on its configuration).
Full text of completions returned by Ollama.
Metadata for every installed model (names, digests, sizes, quantization, modelfile contents).
Which models are currently loaded in VRAM and their size footprint.
If you have installed models containing proprietary fine-tunes or modelfiles with sensitive metadata, note that Claude will see that information when you call show_model or list_models.
delete_model is destructive and cannot be undone from this extension — the model must be re-pulled from the registry (or re-imported from source blobs) if deleted by mistake.
Troubleshooting
"cannot reach Ollama at http://localhost:11434 — is the server running?" — Start Ollama with ollama serve or launch the Ollama app. Verify with curl http://localhost:11434/ (should return "Ollama is running").
pull_model hangs for a long time — Ollama's pull API with stream: false blocks until the full download completes, which for multi-GB models can take many minutes. If you're pulling a huge model, run ollama pull <name> in a terminal instead — you'll see streaming progress there, and subsequent MCP calls will find the model already installed.
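For context on what the terminal shows that a blocking call does not: when streaming is enabled, Ollama's pull API emits newline-delimited JSON status objects, typically with status and, while downloading, total and completed byte counts. The sketch below formats one such line; formatPullProgress is a hypothetical helper for illustration and is not something the extension is known to include.

```javascript
// Hypothetical helper: turn one NDJSON line from a streaming pull
// into a short human-readable progress string.
function formatPullProgress(line) {
  const event = JSON.parse(line);
  const hasCounts =
    typeof event.total === "number" &&
    event.total > 0 &&
    typeof event.completed === "number";
  if (hasCounts) {
    // Whole-number percentage of bytes downloaded for the current layer.
    const pct = Math.floor((event.completed / event.total) * 100);
    return `${event.status}: ${pct}%`;
  }
  // Lines without byte counts (e.g. manifest or verification steps)
  // are reported by status alone.
  return event.status;
}
```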
Custom/remote Ollama endpoint — Set ollama_url in the extension's settings (e.g. http://192.168.1.42:11434). Requires restart of the extension.
list_running shows a model after you stopped using it — Ollama keeps models hot in VRAM for a configurable TTL (default 5 minutes). The expires_at timestamp tells you when it'll unload. This is Ollama's behavior, not the extension's.
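As a small worked example of reading expires_at, the following sketch computes how many seconds remain before a loaded model is due to unload. secondsUntilUnload is a hypothetical helper, and the timestamps in the usage note are made up for illustration.

```javascript
// Hypothetical helper: seconds remaining until the expires_at timestamp,
// clamped at zero once the time has passed.
function secondsUntilUnload(expiresAt, now = new Date()) {
  const remainingMs = new Date(expiresAt).getTime() - now.getTime();
  return Math.max(0, Math.round(remainingMs / 1000));
}
```

For instance, with expires_at set to 2024-06-04T14:43:00Z and a current time of 2024-06-04T14:38:00Z, the helper returns 300, which corresponds to the default 5-minute TTL.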
Development
Single ~400-line Node.js script, zero npm dependencies. Rebuild the .mcpb:
cd bundle-source
zip -j ../Ollama.mcpb manifest.json package.json server.js README.md LICENSE icon.png glama.json
License
MIT. See LICENSE.
Related
claude-terminal-mcp — shell, filesystem, and background jobs.
claude-rocm-mcp — AMD GPU monitoring; pairs well for checking whether Ollama's loaded model is saturating VRAM.
claude-sessions-mcp — tmux session management for long-running jobs.
claude-linux-mcp — X11 desktop control.