Ollama MCP Domain Expert Delegation
Provides tools for delegating domain-specific tasks to local Ollama models, enabling AI agents to generate configurations, validate designs, query domain experts, build GraphQL queries, compress context, and more using locally running LLMs.
Ollama MCP — Local LLM Domain Expert Delegation
An MCP server that lets your orchestrating AI model (Claude, GPT, Qwen, DeepSeek, etc.) delegate domain-specific tasks to local Ollama models running on your own GPU. Instead of one model doing everything, purpose-built specialists handle structured tasks while the orchestrator focuses on planning and user interaction.
Why
Running AI agents with dozens of tools and complex multi-step workflows burns through cloud LLM tokens fast. Many of those tokens go to structured tasks that don't need frontier-level reasoning:
Generating config from structured data (template-filling with rules)
Parsing show command output (pattern matching)
Building API queries (schema mapping)
Validating configs against a source of truth (checklist evaluation)
These tasks are ideal for small local models (7B) with baked-in system prompts. The expertise lives in the prompt, not the model weights.
Architecture
┌─────────────────────────────────────────┐
│ Orchestrating Model (Claude, etc.)      │
│ Plans, decides, interacts with user     │
└──────────┬──────────────────────────────┘
           │ MCP tool calls
           ▼
┌─────────────────────────────────────────┐
│ ollama-mcp (this server)                │
│ Routes by domain → local expert model   │
└──────────┬──────────────────────────────┘
           │ HTTP API
           ▼
┌─────────────────────────────────────────┐
│ Ollama Instance (local GPU)             │
│ ┌─────────┐  ┌─────────┐  ┌─────────┐   │
│ │ ospf:7b │  │ bgp:7b  │  │ frr:7b  │   │
│ └─────────┘  └─────────┘  └─────────┘   │
│ Same base weights, different prompts    │
└─────────────────────────────────────────┘
All expert "models" share the same base weights (e.g., qwen2.5-coder:7b); only their system prompts differ. Ollama deduplicates shared layers on disk.
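The hop from this server to Ollama is a plain HTTP request. A minimal sketch of that call follows, assuming Ollama's non-streaming /api/generate endpoint and using only the Python standard library (the project's ollama_client.py is described as an async HTTP client, so this is not its code). The function names are hypothetical. Because each expert's SYSTEM prompt and PARAMETER values are baked into its Modelfile, the request carries only the task input; temperature is sent only when a caller deliberately overrides the Modelfile default.

```python
import json
import urllib.request
from typing import Optional


def build_generate_payload(model: str, prompt: str, temperature: Optional[float] = None) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    payload = {
        "model": model,    # an expert created with `ollama create`, e.g. my-ospf-expert:7b
        "prompt": prompt,  # the task input; the SYSTEM prompt already lives in the Modelfile
        "stream": False,   # return one JSON object instead of a stream of chunks
    }
    if temperature is not None:
        # Only override the Modelfile's PARAMETER temperature when explicitly asked.
        payload["options"] = {"temperature": temperature}
    return payload


def parse_generate_response(body: dict) -> tuple:
    """Return (text, prompt_tokens, output_tokens) from a /api/generate reply."""
    return (
        body.get("response", ""),
        body.get("prompt_eval_count", 0),
        body.get("eval_count", 0),
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> tuple:
    """POST one prompt to a running Ollama instance and parse the reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=json.dumps(build_generate_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_generate_response(json.loads(response.read()))
```

Against a running instance, generate("http://localhost:11434", "my-ospf-expert:7b", "Generate OSPF config for segment 10.0.1.0/24") returns the expert's text plus Ollama's prompt_eval_count and eval_count, which are the numbers a token-savings tracker can accumulate.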
Tools Provided
Tool | Purpose |
| Delegate config generation to a domain expert |
| Validate a network design against RFCs |
| Ask a domain expert a technical question |
| Validate config matches source-of-truth intent |
| Build GraphQL queries from natural language |
| Compress show command output to JSON digest |
| Reduce large API responses to task-relevant JSON |
| List configured experts and availability |
| Check Ollama connectivity |
| Show token savings metrics |
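Several of these tools (show-output compression, API-response reduction, GraphQL building) return structured text produced by a 7B model, and small models sometimes wrap JSON in a Markdown code fence or put a sentence in front of it. A defensive parse before handing the result back to the orchestrator might look like the sketch below; extract_json is a hypothetical helper, not part of the server's documented tool set.

```python
import json
import re

# Matches a Markdown code fence (three backticks, optional "json" tag) and
# captures its body; written as `{3} so the pattern itself contains no fence.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(text: str):
    """Parse the first JSON object or array in a model reply.

    Raises ValueError (json.JSONDecodeError is a subclass) if nothing parses.
    """
    match = _FENCE.search(text)
    candidate = (match.group(1) if match else text).strip()
    # Skip any leading prose up to the first '{' or '['.
    starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
    if not starts:
        raise ValueError("no JSON object or array found in model output")
    # raw_decode parses one JSON value and ignores whatever trails it.
    value, _end = json.JSONDecoder().raw_decode(candidate[min(starts):])
    return value
```

For example, extract_json('Digest: {"ospf_neighbors": 2, "state": "FULL"}') returns {'ospf_neighbors': 2, 'state': 'FULL'}. When it raises, a caller can retry the expert or report the failure instead of passing malformed output upstream.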
Quick Start
1. Install Ollama and pull a base model
# On your GPU machine
ollama pull qwen2.5-coder:7b
2. Create domain expert models
cd mcp-servers/ollama-mcp/modelfiles/
# Use the examples as starting points
cp Modelfile.example-ospf Modelfile.my-ospf
# Edit the system prompt for your topology/rules
ollama create my-ospf-expert:7b -f Modelfile.my-ospf
3. Configure environment
export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_TIMEOUT=60
export OLLAMA_MODEL_OSPF=my-ospf-expert:7b
export OLLAMA_MODEL_BGP=my-bgp-expert:7b
export OLLAMA_MODEL_GENERAL=qwen2.5-coder:7b
export OLLAMA_MODEL_FALLBACK=qwen2.5-coder:7b
4. Run the MCP server
cd mcp-servers/ollama-mcp/
pip install -r requirements.txt
python server.py
5. Add to your OpenClaw/agent config
{
"ollama-mcp": {
"command": "python3",
"args": ["-u", "mcp-servers/ollama-mcp/server.py"],
"env": {
"OLLAMA_BASE_URL": "http://localhost:11434",
"OLLAMA_TIMEOUT": "60",
"OLLAMA_MODEL_OSPF": "my-ospf-expert:7b",
"OLLAMA_MODEL_GENERAL": "qwen2.5-coder:7b",
"OLLAMA_MODEL_FALLBACK": "qwen2.5-coder:7b"
}
}
}
Creating Custom Domain Experts
The "expert" is just a base model + system prompt. No training required.
Modelfile Structure
# Base model (any Ollama model)
FROM qwen2.5-coder:7b
# Low temperature = near-deterministic output
PARAMETER temperature 0.1
# Maximum number of output tokens
PARAMETER num_predict 4096
SYSTEM """
Your domain-specific rules, examples, and output format here.
"""
What Makes a Good Expert
Narrow scope — handle one specific task type well
Explicit rules — "NEVER do X", "ALWAYS do Y" with ❌ markers
Worked examples — complete input→output pairs
Output format — rigidly defined (JSON schema, config syntax)
Low temperature — 0.1 for structured output, 0.3 for explanations
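Putting the Modelfile structure and these guidelines together, a Modelfile can also be generated from a script, which helps when several experts share one base model and differ only in their prompts. The sketch below is hypothetical (render_modelfile is not part of this repo): it emits the same FROM / PARAMETER / SYSTEM layout with the low-temperature default recommended for structured output, and it rejects prompts containing triple quotes because those would end the SYSTEM block early.

```python
def render_modelfile(base: str, system_prompt: str,
                     temperature: float = 0.1, num_predict: int = 4096) -> str:
    """Render an Ollama Modelfile: base model, sampling parameters, system prompt."""
    if '"""' in system_prompt:
        # A literal triple quote would close the SYSTEM block prematurely.
        raise ValueError('system prompt must not contain """')
    return "\n".join([
        f"FROM {base}",
        f"PARAMETER temperature {temperature}",  # 0.1 for structured output
        f"PARAMETER num_predict {num_predict}",  # cap on output tokens
        'SYSTEM """',
        system_prompt.strip(),                   # rules, worked examples, output format
        '"""',
    ]) + "\n"
```

Writing the result to modelfiles/Modelfile.my-domain (for instance with pathlib.Path.write_text) and then running ollama create my-domain-expert:7b -f modelfiles/Modelfile.my-domain produces the expert exactly as a hand-written Modelfile would.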
Adding a New Domain
1. Create modelfiles/Modelfile.my-domain
2. Run ollama create my-domain-expert:7b -f Modelfile.my-domain
3. Set OLLAMA_MODEL_MY_DOMAIN=my-domain-expert:7b
4. The router picks it up automatically; no code changes are needed.
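The environment-variable convention behind these steps and the Quick Start exports can be sketched as follows. This is an assumption about how router.py behaves, not its actual code: every OLLAMA_MODEL_<DOMAIN> variable becomes a domain entry, OLLAMA_MODEL_FALLBACK covers unknown domains, and OLLAMA_BASE_URL and OLLAMA_TIMEOUT fall back to the Quick Start values when unset. Domain names are lower-cased with hyphens mapped to underscores, so a hypothetical domain "my-domain" resolves through OLLAMA_MODEL_MY_DOMAIN.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field

PREFIX = "OLLAMA_MODEL_"
FALLBACK_KEY = PREFIX + "FALLBACK"
DEFAULT_BASE_URL = "http://localhost:11434"  # assumed default (Quick Start value)
DEFAULT_TIMEOUT = 60.0                       # seconds; assumed default


@dataclass
class ExpertRouter:
    """Hypothetical domain -> model router driven entirely by environment variables."""
    models: dict[str, str] = field(default_factory=dict)
    fallback: str | None = None
    base_url: str = DEFAULT_BASE_URL
    timeout: float = DEFAULT_TIMEOUT

    @classmethod
    def from_env(cls, env: dict[str, str] | None = None) -> ExpertRouter:
        env = dict(os.environ) if env is None else env
        # OLLAMA_MODEL_BGP=my-bgp-expert:7b  ->  {"bgp": "my-bgp-expert:7b"}
        models = {
            key[len(PREFIX):].lower(): value
            for key, value in env.items()
            if key.startswith(PREFIX) and key != FALLBACK_KEY and value
        }
        return cls(
            models=models,
            fallback=env.get(FALLBACK_KEY) or None,
            base_url=env.get("OLLAMA_BASE_URL", DEFAULT_BASE_URL),
            timeout=float(env.get("OLLAMA_TIMEOUT", DEFAULT_TIMEOUT)),
        )

    def resolve(self, domain: str) -> str:
        """Return the expert model for a domain, or the fallback model."""
        key = domain.strip().lower().replace("-", "_")
        if key in self.models:
            return self.models[key]
        if self.fallback:
            return self.fallback
        raise LookupError(f"no model for domain {domain!r} and no fallback configured")
```

Calling ExpertRouter.from_env() with no argument reads the real process environment, so both the shell exports and the env block in the agent config feed it; adding a domain is then purely a matter of setting one more variable.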
Model Size Guidance
Size | Speed | When to Use |
3B | ~80 tok/s | Too small for most tasks |
7B | ~42 tok/s | Structured config generation, parsing (recommended) |
14B | ~21 tok/s | Domain questions, complex reasoning |
32B | ~10 tok/s | Only if 7B quality is insufficient |
For structured output with good system prompts, 7B matches 32B quality.
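The speeds in the table translate directly into wall-clock budgets. The arithmetic below reuses those rates, which the table gives without naming the GPU, so treat them as illustrative; it counts decode time only (prompt processing and model loading add more), and the helper names are hypothetical.

```python
# Decode rates copied from the table above; real rates depend on the GPU.
TOKENS_PER_SECOND = {"3B": 80, "7B": 42, "14B": 21, "32B": 10}


def generation_seconds(size: str, output_tokens: int) -> float:
    """Approximate decode time, ignoring prompt processing and model load."""
    return output_tokens / TOKENS_PER_SECOND[size]


def fits_timeout(size: str, output_tokens: int, timeout_s: float = 60.0) -> bool:
    """True if decoding alone would finish within the client timeout."""
    return generation_seconds(size, output_tokens) <= timeout_s
```

For example, 4,096 output tokens (the num_predict cap in the example Modelfile) at 7B speed come to about 98 s, more than the 60 s OLLAMA_TIMEOUT used in the Quick Start, while a 1,000-token config takes about 24 s at 7B versus 100 s at 32B. If long generations are expected, raising the timeout or lowering num_predict avoids cut-off replies, assuming the timeout covers the whole non-streaming request.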
Token Savings Strategy
The biggest wins come from these patterns:
Query building: the local expert builds API queries instead of the orchestrator guessing
Context compression: reduce API responses of about 2 KB to about 400 bytes before the orchestrator reasons about them
State summarization: pass/fail signals instead of raw output parsing
Config generation: the most token-intensive task, fully offloaded
Typical savings: 15-25K tokens per complex workflow run.
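A running tally of these savings is what a metrics tool reports. The sketch below is hypothetical (metrics.py may count differently): it treats tokens generated locally (Ollama's eval_count) as tokens the orchestrator did not have to produce, and estimates context savings from compression by byte count using a rough 4-bytes-per-token rule of thumb, which is an assumption rather than a tokenizer measurement.

```python
from dataclasses import dataclass


@dataclass
class SavingsTracker:
    """Hypothetical tally of work moved from the orchestrator to local experts."""
    calls: int = 0
    local_prompt_tokens: int = 0   # sum of Ollama prompt_eval_count
    local_output_tokens: int = 0   # sum of Ollama eval_count
    bytes_before: int = 0          # raw size of data before compression
    bytes_after: int = 0           # size of the digest the orchestrator sees

    def record_call(self, prompt_eval_count: int, eval_count: int) -> None:
        self.calls += 1
        self.local_prompt_tokens += prompt_eval_count
        self.local_output_tokens += eval_count

    def record_compression(self, original: str, compressed: str) -> None:
        self.bytes_before += len(original.encode("utf-8"))
        self.bytes_after += len(compressed.encode("utf-8"))

    def context_tokens_saved(self, bytes_per_token: float = 4.0) -> int:
        """Rough estimate; ~4 bytes/token is a heuristic, not a tokenizer count."""
        return int(max(self.bytes_before - self.bytes_after, 0) / bytes_per_token)

    def summary(self) -> dict:
        return {
            "calls": self.calls,
            "orchestrator_output_tokens_avoided": self.local_output_tokens,
            "estimated_context_tokens_saved": self.context_tokens_saved(),
        }
```

Under this estimate, compressing a 2,000-byte response to 400 bytes saves roughly (2000 - 400) / 4 = 400 tokens of orchestrator context each time that response would otherwise be read.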
File Layout
mcp-servers/ollama-mcp/
├── server.py # MCP server (10 tools, stdio transport)
├── router.py # Domain → model routing (env-var driven)
├── ollama_client.py # Async Ollama HTTP client
├── models.py # Pydantic request/response schemas
├── metrics.py # Token savings tracker
├── requirements.txt # Dependencies: mcp, httpx, pydantic
└── modelfiles/ # Example Ollama Modelfiles
├── Modelfile.example-ospf
├── Modelfile.example-state-summarizer
└── Modelfile.example-graphql-builder
Requirements
Python 3.10+
Ollama running somewhere accessible (local or remote)
A base model pulled (e.g., qwen2.5-coder:7b)
Dependencies: mcp, httpx, pydantic
License
BSL-1.1 (same as parent project)