Ollama MCP Domain Expert Delegation

by byrn-baker


Python

Local

Ollama MCP — Local LLM Domain Expert Delegation

An MCP server that lets your orchestrating AI model (Claude, GPT, Qwen, DeepSeek, etc.) delegate domain-specific tasks to local Ollama models running on your own GPU. Instead of one model doing everything, purpose-built specialists handle structured tasks while the orchestrator focuses on planning and user interaction.

Why

Running AI agents with dozens of tools and complex multi-step workflows burns through cloud LLM tokens fast. Many of those tokens go to structured tasks that don't need frontier-level reasoning:

Generating config from structured data (template-filling with rules)
Parsing show command output (pattern matching)
Building API queries (schema mapping)
Validating configs against a source of truth (checklist evaluation)

These tasks are ideal for small local models (7B) with baked-in system prompts. The expertise lives in the prompt, not the model weights.


Architecture

┌─────────────────────────────────────────┐
│ Orchestrating Model (Claude, etc.)      │
│ Plans, decides, interacts with user     │
└──────────┬──────────────────────────────┘
           │ MCP tool calls
           ▼
┌─────────────────────────────────────────┐
│ ollama-mcp (this server)                │
│ Routes by domain → local expert model   │
└──────────┬──────────────────────────────┘
           │ HTTP API
           ▼
┌─────────────────────────────────────────┐
│ Ollama Instance (local GPU)             │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐     │
│ │ ospf:7b │ │ bgp:7b  │ │ frr:7b  │     │
│ └─────────┘ └─────────┘ └─────────┘     │
│ Same base weights, different prompts    │
└─────────────────────────────────────────┘

All expert "models" share the same base weights (e.g., qwen2.5-coder:7b). Only their system prompts differ. Ollama deduplicates shared layers on disk.
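The "Routes by domain → local expert model" hop in the diagram ends in a plain HTTP call to Ollama. Below is a minimal sketch of that call using only the Python standard library (the project itself lists httpx as its HTTP client). It assumes a non-streaming `POST /api/generate`, and the helper names are hypothetical, not taken from `ollama_client.py`. Because each expert's system prompt is baked into its Modelfile, the request only carries the model tag and the task prompt.

```python
import json
import urllib.request
from typing import Any


def build_generate_payload(model: str, prompt: str) -> dict[str, Any]:
    """Build a non-streaming /api/generate request body (hypothetical helper).

    The expert's system prompt lives in its Modelfile, so only the model
    tag and the task prompt are sent.
    """
    return {"model": model, "prompt": prompt, "stream": False}


def parse_generate_response(body: dict[str, Any]) -> dict[str, Any]:
    """Pull the generated text and token counts out of an Ollama reply."""
    return {
        "text": body.get("response", ""),
        # Ollama reports input/output token counts in these fields.
        "prompt_tokens": body.get("prompt_eval_count", 0),
        "completion_tokens": body.get("eval_count", 0),
    }


def generate(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> dict[str, Any]:
    """POST one prompt to an Ollama instance and return the parsed result."""
    data = json.dumps(build_generate_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_generate_response(json.load(resp))
```

For example, `generate("http://localhost:11434", "my-ospf-expert:7b", "Generate OSPF config for 10.0.1.0/24")` needs a running Ollama instance with that expert model already created.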

Tools Provided

Tool	Purpose
`ollama_generate_config`	Delegate config generation to a domain expert
`ollama_validate_design`	Validate a network design against RFCs
`ollama_domain_query`	Ask a domain expert a technical question
`ollama_validate_config_against_sot`	Validate config matches source-of-truth intent
`ollama_build_graphql_query`	Build GraphQL queries from natural language
`ollama_summarize_state`	Compress show command output to JSON digest
`ollama_compress_context`	Reduce large API responses to task-relevant JSON
`ollama_list_experts`	List configured experts and availability
`ollama_health_check`	Check Ollama connectivity
`ollama_delegation_stats`	Show token savings metrics

Quick Start

1. Install Ollama and pull a base model

# On your GPU machine
ollama pull qwen2.5-coder:7b

2. Create domain expert models

cd mcp-servers/ollama-mcp/modelfiles/

# Use the examples as starting points
cp Modelfile.example-ospf Modelfile.my-ospf
# Edit the system prompt for your topology/rules
ollama create my-ospf-expert:7b -f Modelfile.my-ospf

3. Configure environment

export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_TIMEOUT=60
export OLLAMA_MODEL_OSPF=my-ospf-expert:7b
export OLLAMA_MODEL_BGP=my-bgp-expert:7b
export OLLAMA_MODEL_GENERAL=qwen2.5-coder:7b
export OLLAMA_MODEL_FALLBACK=qwen2.5-coder:7b

4. Run the MCP server

cd mcp-servers/ollama-mcp/
pip install -r requirements.txt
python server.py

5. Add to your OpenClaw/agent config

{
  "ollama-mcp": {
    "command": "python3",
    "args": ["-u", "mcp-servers/ollama-mcp/server.py"],
    "env": {
      "OLLAMA_BASE_URL": "http://localhost:11434",
      "OLLAMA_TIMEOUT": "60",
      "OLLAMA_MODEL_OSPF": "my-ospf-expert:7b",
      "OLLAMA_MODEL_GENERAL": "qwen2.5-coder:7b",
      "OLLAMA_MODEL_FALLBACK": "qwen2.5-coder:7b"
    }
  }
}
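On the server side, the variables from step 3 (or the JSON `env` block above) have to be collected at startup. A minimal sketch, assuming each `OLLAMA_MODEL_<DOMAIN>` variable maps a lowercase domain name to a model tag and that unset values default to `http://localhost:11434` and 60 seconds; `OllamaSettings` and `load_settings` are hypothetical names, not the project's code.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OllamaSettings:
    base_url: str = "http://localhost:11434"
    timeout: float = 60.0
    # Lowercase domain name (e.g. "ospf", "fallback") -> Ollama model tag.
    models: dict[str, str] = field(default_factory=dict)


def load_settings(env: Mapping[str, str] | None = None) -> OllamaSettings:
    """Collect the OLLAMA_* variables into one settings object.

    Passing a plain dict instead of os.environ keeps this easy to test.
    """
    env = os.environ if env is None else env
    prefix = "OLLAMA_MODEL_"
    models = {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix) and value
    }
    return OllamaSettings(
        base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        timeout=float(env.get("OLLAMA_TIMEOUT", "60")),
        models=models,
    )
```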

Creating Custom Domain Experts

The "expert" is just a base model + system prompt. No training required.

Modelfile Structure

# Base model (any Ollama model)
FROM qwen2.5-coder:7b
# Low temperature = more deterministic output
PARAMETER temperature 0.1
# Maximum number of output tokens
PARAMETER num_predict 4096
SYSTEM """
Your domain-specific rules, examples, and output format here.
"""

What Makes a Good Expert

Narrow scope — handle one specific task type well
Explicit rules — "NEVER do X", "ALWAYS do Y" with ❌ markers
Worked examples — complete input→output pairs
Output format — rigidly defined (JSON schema, config syntax)
Low temperature — 0.1 for structured output, 0.3 for explanations
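A rigidly defined output format also pays off on the receiving side: a reply that does not match it can be rejected before it reaches the orchestrator. A minimal sketch, assuming a hypothetical `{"status": "pass" | "fail", "issues": [...]}` contract for a state-summarizer expert; the field names are illustrative and not taken from `models.py` (which uses Pydantic).

```python
import json
from typing import Any

# Hypothetical output contract for a state-summarizer expert.
REQUIRED_FIELDS: dict[str, type] = {"status": str, "issues": list}
ALLOWED_STATUS = {"pass", "fail"}


def parse_expert_reply(raw: str) -> dict[str, Any]:
    """Parse an expert reply and check it against the rigid output format.

    Raises ValueError so the caller can retry or fall back instead of
    handing malformed output to the orchestrator.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    if data["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unexpected status {data['status']!r}")
    return data
```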

Adding a New Domain

1. Create modelfiles/Modelfile.my-domain
2. Run ollama create my-domain-expert:7b -f Modelfile.my-domain
3. Set OLLAMA_MODEL_MY_DOMAIN=my-domain-expert:7b
4. The router picks it up automatically; no code changes are needed
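The env-var lookup behind "no code changes needed" can be sketched as follows. This is a minimal sketch under two assumptions: domain names are upper-cased with hyphens turned into underscores, and `OLLAMA_MODEL_FALLBACK` is used when no domain-specific variable is set. `resolve_model` is a hypothetical name and not necessarily how `router.py` works.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def resolve_model(domain: str, env: Mapping[str, str] | None = None) -> str:
    """Map a domain name to an Ollama model tag via environment variables.

    "my-domain" -> OLLAMA_MODEL_MY_DOMAIN, falling back to
    OLLAMA_MODEL_FALLBACK when the domain has no dedicated model.
    """
    env = os.environ if env is None else env
    key = "OLLAMA_MODEL_" + domain.upper().replace("-", "_")
    model = env.get(key) or env.get("OLLAMA_MODEL_FALLBACK")
    if not model:
        raise LookupError(f"no model for domain {domain!r} and no fallback configured")
    return model
```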

Model Size Guidance

Size	Speed (approx., hardware-dependent)	When to Use
3B	~80 tok/s	Too small for most tasks
7B	~42 tok/s	Structured config generation, parsing (recommended)
14B	~21 tok/s	Domain questions, complex reasoning
32B	~10 tok/s	Only if 7B quality is insufficient

For structured output driven by a good system prompt, 7B typically matches 32B quality.
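The speed column translates directly into latency. A back-of-the-envelope sketch, assuming decode time dominates (prompt processing and model loading are ignored) and reusing the table's approximate speeds: a hypothetical 1,500-token config takes roughly 36 s at 7B but about 150 s at 32B, which would exceed the Quick Start's `OLLAMA_TIMEOUT=60` if that timeout covers the whole request.

```python
# Approximate decode speeds from the table above (tokens/second).
SPEEDS_TOK_PER_S = {"3B": 80, "7B": 42, "14B": 21, "32B": 10}


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate decode time only; prompt processing and model load are ignored."""
    return output_tokens / tokens_per_second


def fits_timeout(output_tokens: int, tokens_per_second: float, timeout_s: float = 60.0) -> bool:
    """True if the estimated decode time stays within the request timeout."""
    return generation_seconds(output_tokens, tokens_per_second) <= timeout_s


if __name__ == "__main__":
    for size, speed in SPEEDS_TOK_PER_S.items():
        secs = generation_seconds(1500, speed)
        print(f"{size}: ~{secs:.0f} s for 1500 tokens, within 60 s: {fits_timeout(1500, speed)}")
```

By the same arithmetic, a 7B reply that runs all the way to `num_predict 4096` would need about 98 s.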

Token Savings Strategy

The biggest wins come from these patterns:

Query building: the local expert builds API queries instead of the orchestrator guessing
Context compression: reduce ~2 KB API responses to ~400 bytes before the orchestrator reasons about them
State summarization: pass/fail signals instead of raw output parsing
Config generation: the most token-intensive task, fully offloaded

Typical savings: 15-25K tokens per complex workflow run.
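A minimal sketch of the kind of bookkeeping behind `ollama_delegation_stats`, assuming per-call token counts come from Ollama's `prompt_eval_count` and `eval_count` response fields. `DelegationStats` is a hypothetical name, and counting locally processed tokens is only a rough proxy for cloud tokens saved: the local and cloud models use different tokenizers, and the orchestrator still reads each expert's reply.

```python
from dataclasses import dataclass


@dataclass
class DelegationStats:
    """Running totals of work handled by local experts (hypothetical tracker)."""

    calls: int = 0
    local_prompt_tokens: int = 0
    local_completion_tokens: int = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one delegated call's counts (e.g. prompt_eval_count / eval_count)."""
        self.calls += 1
        self.local_prompt_tokens += prompt_tokens
        self.local_completion_tokens += completion_tokens

    @property
    def tokens_offloaded(self) -> int:
        # Rough proxy only: tokens the local model processed in total.
        return self.local_prompt_tokens + self.local_completion_tokens

    def summary(self) -> dict[str, float]:
        avg = self.tokens_offloaded / self.calls if self.calls else 0.0
        return {"calls": self.calls, "tokens_offloaded": self.tokens_offloaded, "avg_per_call": avg}
```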

File Layout

mcp-servers/ollama-mcp/
├── server.py              # MCP server (10 tools, stdio transport)
├── router.py              # Domain → model routing (env-var driven)
├── ollama_client.py       # Async Ollama HTTP client
├── models.py              # Pydantic request/response schemas
├── metrics.py             # Token savings tracker
├── requirements.txt       # Dependencies: mcp, httpx, pydantic
└── modelfiles/            # Example Ollama Modelfiles
    ├── Modelfile.example-ospf
    ├── Modelfile.example-state-summarizer
    └── Modelfile.example-graphql-builder

Requirements

Python 3.10+
Ollama running somewhere accessible (local or remote)
A base model pulled (e.g., qwen2.5-coder:7b)
Dependencies: mcp, httpx, pydantic

License

BSL-1.1 (same as parent project)
