Which integrations are available for this server?

Provides tools for delegating domain-specific tasks to local Ollama models, enabling AI agents to generate configurations, validate designs, query domain experts, build GraphQL queries, compress context, and more using locally-running LLMs.

Ollama MCP Domain Expert Delegation

by byrn-baker

Python

Local

Ollama MCP — Local LLM Domain Expert Delegation

An MCP server that lets your orchestrating AI model (Claude, GPT, Qwen, DeepSeek, etc.) delegate domain-specific tasks to local Ollama models running on your own GPU. Instead of one model doing everything, purpose-built specialists handle structured tasks while the orchestrator focuses on planning and user interaction.

Why

Running AI agents with dozens of tools and complex multi-step workflows burns through cloud LLM tokens fast. Many of those tokens go to structured tasks that don't need frontier-level reasoning:

Generating config from structured data (template-filling with rules)
Parsing show command output (pattern matching)
Building API queries (schema mapping)
Validating configs against a source of truth (checklist evaluation)

These tasks are ideal for small local models (7B) with baked-in system prompts. The expertise lives in the prompt, not the model weights.

Architecture

┌─────────────────────────────────────────┐
│ Orchestrating Model (Claude, etc.)      │
│ Plans, decides, interacts with user     │
└──────────┬──────────────────────────────┘
           │ MCP tool calls
           ▼
┌─────────────────────────────────────────┐
│ ollama-mcp (this server)                │
│ Domain Router + Health Checker          │
│ Routes by domain → provider + model     │
└──────────┬──────────┬──────────┬────────┘
           │          │          │
           ▼          ▼          ▼
┌──────────────┐ ┌─────────┐ ┌──────────────┐
│ Ollama Local │ │ Ollama  │ │ OpenAI-compat│
│ (your GPU)   │ │ Cloud   │ │ (Groq, vLLM) │
└──────────────┘ └─────────┘ └──────────────┘

The server supports multiple inference backends simultaneously. Domain routing, health checks, and automatic failover are all configured via environment variables. Legacy single-Ollama setups continue to work unchanged.

Tools Provided

Tool	Purpose
`ollama_generate_config`	Delegate config generation to a domain expert
`ollama_validate_design`	Validate a network design against RFCs
`ollama_domain_query`	Ask a domain expert a technical question
`ollama_validate_config_against_sot`	Validate config matches source-of-truth intent
`ollama_build_graphql_query`	Build GraphQL queries from natural language
`ollama_summarize_state`	Compress show command output to JSON digest
`ollama_compress_context`	Reduce large API responses to task-relevant JSON
`ollama_list_experts`	List configured experts and availability
`ollama_health_check`	Check Ollama connectivity
`ollama_delegation_stats`	Show token savings metrics
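Over the stdio transport, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request. Here is a minimal sketch of building such a request. The argument names `domain` and `task` are assumptions for illustration only, since the tool input schemas are not shown here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


request = build_tool_call(
    1,
    "ollama_generate_config",
    # Hypothetical argument names; check the server's actual tool schema.
    {"domain": "ospf", "task": "generate OSPF config for new segment 10.0.1.0/24"},
)
print(request)
```

In practice an MCP client library builds this message for you; the sketch only shows what travels over the wire.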

Quick Start

1. Install Ollama and pull a base model

# On your GPU machine
ollama pull qwen2.5-coder:7b

2. Create domain expert models

cd mcp-servers/ollama-mcp/modelfiles/

# Use the examples as starting points
cp Modelfile.example-ospf Modelfile.my-ospf
# Edit the system prompt for your topology/rules
ollama create my-ospf-expert:7b -f Modelfile.my-ospf

3. Configure environment

export OLLAMA_BASE_URL=http://localhost:11434
export OLLAMA_TIMEOUT=60
export OLLAMA_MODEL_OSPF=my-ospf-expert:7b
export OLLAMA_MODEL_BGP=my-bgp-expert:7b
export OLLAMA_MODEL_GENERAL=qwen2.5-coder:7b
export OLLAMA_MODEL_FALLBACK=qwen2.5-coder:7b

4. Run the MCP server

cd mcp-servers/ollama-mcp/
pip install -r requirements.txt
python server.py

5. Add to your OpenClaw/agent config

{
  "ollama-mcp": {
    "command": "python3",
    "args": ["-u", "mcp-servers/ollama-mcp/server.py"],
    "env": {
      "OLLAMA_BASE_URL": "http://localhost:11434",
      "OLLAMA_TIMEOUT": "60",
      "OLLAMA_MODEL_OSPF": "my-ospf-expert:7b",
      "OLLAMA_MODEL_GENERAL": "qwen2.5-coder:7b",
      "OLLAMA_MODEL_FALLBACK": "qwen2.5-coder:7b"
    }
  }
}

Multi-Provider Configuration (New)

The new multi-provider system lets you define multiple inference backends and route domains independently. Providers are defined separately from routing — this keeps configuration modular and easy to reason about.

Provider Setup

Providers are configured with PROVIDER_* environment variables. The server discovers them automatically at startup.

Ollama Local (your GPU box)

PROVIDER_OLLAMA_LOCAL_URL=http://192.168.1.50:11434

Ollama Cloud (authenticated Ollama API)

PROVIDER_OLLAMA_CLOUD_URL=https://cloud.ollama.com
PROVIDER_OLLAMA_CLOUD_API_KEY=sk-your-key-here

OpenAI-Compatible (vLLM, Together, Groq, OpenRouter)

Use the pattern PROVIDER_OPENAI_<NAME>_URL and PROVIDER_OPENAI_<NAME>_API_KEY where <NAME> is any identifier you choose:

# Groq
PROVIDER_OPENAI_GROQ_URL=https://api.groq.com/openai
PROVIDER_OPENAI_GROQ_API_KEY=gsk_your-key

# vLLM on your cluster
PROVIDER_OPENAI_VLLM_URL=http://10.0.0.5:8000
PROVIDER_OPENAI_VLLM_API_KEY=token-abc123

# Together AI
PROVIDER_OPENAI_TOGETHER_URL=https://api.together.xyz
PROVIDER_OPENAI_TOGETHER_API_KEY=tok_your-key

Each OpenAI-compatible provider gets a derived ID: openai-groq, openai-vllm, openai-together.
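To make the naming rule concrete, here is a rough sketch of how provider IDs could be derived from `PROVIDER_*` variables. The function `discover_providers` is hypothetical and is not taken from `registry.py`; it only illustrates the documented mapping (for example `PROVIDER_OPENAI_GROQ_URL` becoming `openai-groq`).

```python
import re

# Matches PROVIDER_<NAME>_URL and PROVIDER_<NAME>_API_KEY.
_PROVIDER_VAR = re.compile(r"^PROVIDER_(.+)_(URL|API_KEY)$")


def discover_providers(env: dict) -> dict:
    """Group PROVIDER_* variables into {provider_id: {"url": ..., "api_key": ...}}."""
    providers: dict = {}
    for key, value in env.items():
        match = _PROVIDER_VAR.match(key)
        if not match:
            continue
        name, field = match.groups()
        # OLLAMA_LOCAL -> ollama-local, OPENAI_GROQ -> openai-groq
        provider_id = name.lower().replace("_", "-")
        providers.setdefault(provider_id, {})[field.lower()] = value
    return providers


example_env = {
    "PROVIDER_OLLAMA_LOCAL_URL": "http://192.168.1.50:11434",
    "PROVIDER_OPENAI_GROQ_URL": "https://api.groq.com/openai",
    "PROVIDER_OPENAI_GROQ_API_KEY": "gsk_your-key",
    "HEALTH_CHECK_INTERVAL": "30",  # ignored: not a PROVIDER_* variable
}
providers = discover_providers(example_env)
print(sorted(providers))
```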

Domain Routing

Routes map domains to providers and models using ROUTE_* environment variables.

# Route OSPF tasks to local Ollama
ROUTE_OSPF_PROVIDER=ollama-local
ROUTE_OSPF_MODEL=my-ospf-expert:7b
ROUTE_OSPF_TEMPERATURE=0.1
ROUTE_OSPF_MAX_TOKENS=4096

# Route BGP tasks to Groq for speed
ROUTE_BGP_PROVIDER=openai-groq
ROUTE_BGP_MODEL=llama-3.3-70b-versatile
ROUTE_BGP_TEMPERATURE=0.1

# Route GraphQL tasks to local with a system prompt file
ROUTE_GRAPHQL_PROVIDER=ollama-local
ROUTE_GRAPHQL_MODEL=qwen2.5-coder:7b
ROUTE_GRAPHQL_SYSTEM_PROMPT_FILE=./prompts/graphql.txt

# Default provider for domains without explicit routes
ROUTE_DEFAULT_PROVIDER=ollama-local

Per-Domain Options

Variable	Purpose	Default
`ROUTE_<DOMAIN>_PROVIDER`	Provider ID to use	`ROUTE_DEFAULT_PROVIDER`
`ROUTE_<DOMAIN>_MODEL`	Model name	—
`ROUTE_<DOMAIN>_TEMPERATURE`	Generation temperature	`0.1`
`ROUTE_<DOMAIN>_TOP_P`	Top-p sampling	`0.9`
`ROUTE_<DOMAIN>_MAX_TOKENS`	Max output tokens	`4096`
`ROUTE_<DOMAIN>_SYSTEM_PROMPT`	Inline system prompt	—
`ROUTE_<DOMAIN>_SYSTEM_PROMPT_FILE`	File path for system prompt	—
`ROUTE_<DOMAIN>_FALLBACK`	Comma-separated fallback provider IDs	—
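As an illustration of how these options and their defaults fit together, the sketch below loads one domain's route from an environment mapping. `RouteConfig` and `load_route` are hypothetical names, not the server's actual API, and `SYSTEM_PROMPT_FILE` handling is left out because its precedence relative to the inline prompt is not documented here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RouteConfig:
    provider: str
    model: Optional[str] = None
    temperature: float = 0.1   # documented default
    top_p: float = 0.9         # documented default
    max_tokens: int = 4096     # documented default
    system_prompt: Optional[str] = None
    fallback: List[str] = field(default_factory=list)


def load_route(domain: str, env: dict) -> RouteConfig:
    """Build a RouteConfig for `domain` from ROUTE_<DOMAIN>_* variables."""
    prefix = f"ROUTE_{domain.upper()}_"
    provider = env.get(prefix + "PROVIDER") or env.get("ROUTE_DEFAULT_PROVIDER")
    if not provider:
        raise ValueError(f"no provider configured for domain {domain!r}")
    raw_fallback = env.get(prefix + "FALLBACK", "")
    return RouteConfig(
        provider=provider,
        model=env.get(prefix + "MODEL"),
        temperature=float(env.get(prefix + "TEMPERATURE", 0.1)),
        top_p=float(env.get(prefix + "TOP_P", 0.9)),
        max_tokens=int(env.get(prefix + "MAX_TOKENS", 4096)),
        system_prompt=env.get(prefix + "SYSTEM_PROMPT"),
        # Comma-separated list; surrounding whitespace is tolerated.
        fallback=[p.strip() for p in raw_fallback.split(",") if p.strip()],
    )


route_env = {
    "ROUTE_OSPF_PROVIDER": "ollama-local",
    "ROUTE_OSPF_MODEL": "my-ospf-expert:7b",
    "ROUTE_OSPF_FALLBACK": "ollama-cloud, openai-groq",
    "ROUTE_DEFAULT_PROVIDER": "ollama-local",
}
ospf = load_route("ospf", route_env)
bgp = load_route("bgp", route_env)  # no explicit route: uses the default provider
```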

Health Checks and Fallback

The server runs background health probes against all providers. When a primary provider goes down, requests automatically route to the next healthy provider in the fallback chain.

How it works:

Every 30 seconds (configurable), each provider's is_reachable endpoint is probed
After 2 consecutive failures (configurable), the provider is marked unhealthy
A single successful probe restores healthy status
When a domain's primary provider is unhealthy, the router walks its fallback chain

Configuration:

HEALTH_CHECK_INTERVAL=30           # Probe interval in seconds
HEALTH_FAILURE_THRESHOLD=2         # Consecutive failures before marking unhealthy
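The healthy/unhealthy rules above amount to a small per-provider counter. The following is a minimal sketch of that logic, not the code in `health.py`; the `HealthTracker` class is hypothetical, and it assumes a provider counts as healthy until it has been probed.

```python
class HealthTracker:
    """Track consecutive probe failures per provider."""

    def __init__(self, failure_threshold: int = 2) -> None:
        self.failure_threshold = failure_threshold
        self._failures: dict = {}

    def record(self, provider_id: str, ok: bool) -> None:
        if ok:
            # A single successful probe restores healthy status.
            self._failures[provider_id] = 0
        else:
            self._failures[provider_id] = self._failures.get(provider_id, 0) + 1

    def is_healthy(self, provider_id: str) -> bool:
        # Assumption: providers with no probe history are treated as healthy.
        return self._failures.get(provider_id, 0) < self.failure_threshold


tracker = HealthTracker(failure_threshold=2)
tracker.record("ollama-local", ok=False)
after_one_failure = tracker.is_healthy("ollama-local")
tracker.record("ollama-local", ok=False)
after_two_failures = tracker.is_healthy("ollama-local")
tracker.record("ollama-local", ok=True)
after_recovery = tracker.is_healthy("ollama-local")
```

A real implementation would call `record` from a background task that runs every `HEALTH_CHECK_INTERVAL` seconds.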

Fallback chain example:

# OSPF primary is local, falls back to cloud, then Groq
ROUTE_OSPF_PROVIDER=ollama-local
ROUTE_OSPF_FALLBACK=ollama-cloud,openai-groq

Resolution order:

Primary provider (if healthy)
Each provider in the fallback chain (first healthy one wins)
ROUTE_DEFAULT_PROVIDER (last resort)
Graceful degradation response (NO_PROVIDER_AVAILABLE) — the orchestrating agent handles the task directly
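The resolution order can be sketched as a single loop over candidates. This is an illustrative reading of the list above rather than the code in `routing.py`; `resolve_provider` and its parameters are hypothetical names.

```python
from typing import Callable, List, Optional


def resolve_provider(
    primary: str,
    fallback: List[str],
    default: Optional[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy provider, or None if nothing is available."""
    for candidate in [primary, *fallback, default]:
        if candidate and is_healthy(candidate):
            return candidate
    # The caller would report NO_PROVIDER_AVAILABLE so the orchestrating
    # agent can handle the task directly.
    return None


healthy_now = {"openai-groq"}
chosen = resolve_provider(
    primary="ollama-local",
    fallback=["ollama-cloud", "openai-groq"],
    default="ollama-local",
    is_healthy=lambda provider_id: provider_id in healthy_now,
)
print(chosen or "NO_PROVIDER_AVAILABLE")
```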

Migration from Legacy Config

Your existing OLLAMA_* environment variables continue to work. The server auto-detects legacy mode and synthesizes equivalent new-style configuration internally.

Legacy mode activates when: OLLAMA_MODEL_* vars are present AND no PROVIDER_* vars are set.

Mapping rules:

Legacy Variable	New-Style Equivalent
`OLLAMA_BASE_URL`	`PROVIDER_OLLAMA_LOCAL_URL`
`OLLAMA_MODEL_<DOMAIN>`	`ROUTE_<DOMAIN>_MODEL` + `ROUTE_<DOMAIN>_PROVIDER=ollama-local`
`OLLAMA_TEMP_<DOMAIN>`	`ROUTE_<DOMAIN>_TEMPERATURE`
`OLLAMA_MODEL_FALLBACK`	`ROUTE_DEFAULT_PROVIDER=ollama-local` + `ROUTE_DEFAULT_MODEL`

When both styles are present: New-style PROVIDER_*/ROUTE_* vars take precedence. Legacy vars are ignored and a deprecation warning is logged.
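For reference, the mapping rules can be sketched as a small translation function. This is not the code in `compat.py`; `synthesize_from_legacy` is a hypothetical name, and the `http://localhost:11434` default used when `OLLAMA_BASE_URL` is unset is an assumption based on Ollama's usual default port.

```python
def synthesize_from_legacy(env: dict) -> dict:
    """Translate legacy OLLAMA_* variables into PROVIDER_*/ROUTE_* equivalents."""
    has_new_style = any(key.startswith("PROVIDER_") for key in env)
    has_legacy = any(key.startswith("OLLAMA_MODEL_") for key in env)
    if has_new_style or not has_legacy:
        return {}  # legacy mode is inactive; nothing to synthesize

    out = {
        # Assumed default when OLLAMA_BASE_URL is not set.
        "PROVIDER_OLLAMA_LOCAL_URL": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
    }
    for key, value in env.items():
        if key == "OLLAMA_MODEL_FALLBACK":
            out["ROUTE_DEFAULT_PROVIDER"] = "ollama-local"
            out["ROUTE_DEFAULT_MODEL"] = value
        elif key.startswith("OLLAMA_MODEL_"):
            domain = key[len("OLLAMA_MODEL_"):]
            out[f"ROUTE_{domain}_PROVIDER"] = "ollama-local"
            out[f"ROUTE_{domain}_MODEL"] = value
        elif key.startswith("OLLAMA_TEMP_"):
            domain = key[len("OLLAMA_TEMP_"):]
            out[f"ROUTE_{domain}_TEMPERATURE"] = value
    return out


legacy_env = {
    "OLLAMA_BASE_URL": "http://localhost:11434",
    "OLLAMA_MODEL_OSPF": "my-ospf-expert:7b",
    "OLLAMA_MODEL_FALLBACK": "qwen2.5-coder:7b",
}
new_style = synthesize_from_legacy(legacy_env)
```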

Recommended migration steps:

Keep your existing setup running (legacy vars still work)
Add PROVIDER_OLLAMA_LOCAL_URL pointing to your Ollama instance
Convert each OLLAMA_MODEL_<DOMAIN> to ROUTE_<DOMAIN>_PROVIDER + ROUTE_<DOMAIN>_MODEL
Remove the old OLLAMA_* vars once everything checks out
Optionally add cloud or OpenAI-compatible providers for fallback

Example .env Configurations

Local-Only (single Ollama instance)

# Provider
PROVIDER_OLLAMA_LOCAL_URL=http://localhost:11434

# Routing
ROUTE_OSPF_PROVIDER=ollama-local
ROUTE_OSPF_MODEL=my-ospf-expert:7b
ROUTE_BGP_PROVIDER=ollama-local
ROUTE_BGP_MODEL=my-bgp-expert:7b
ROUTE_DEFAULT_PROVIDER=ollama-local

Local + Cloud Fallback

# Providers
PROVIDER_OLLAMA_LOCAL_URL=http://192.168.1.50:11434
PROVIDER_OLLAMA_CLOUD_URL=https://cloud.ollama.com
PROVIDER_OLLAMA_CLOUD_API_KEY=sk-your-key

# Routing with fallback
ROUTE_OSPF_PROVIDER=ollama-local
ROUTE_OSPF_MODEL=my-ospf-expert:7b
ROUTE_OSPF_FALLBACK=ollama-cloud

ROUTE_BGP_PROVIDER=ollama-local
ROUTE_BGP_MODEL=my-bgp-expert:7b
ROUTE_BGP_FALLBACK=ollama-cloud

ROUTE_DEFAULT_PROVIDER=ollama-local

Multi-Provider (local GPU + Groq + Together)

# Providers
PROVIDER_OLLAMA_LOCAL_URL=http://192.168.1.50:11434
PROVIDER_OPENAI_GROQ_URL=https://api.groq.com/openai
PROVIDER_OPENAI_GROQ_API_KEY=gsk_your-key
PROVIDER_OPENAI_TOGETHER_URL=https://api.together.xyz
PROVIDER_OPENAI_TOGETHER_API_KEY=tok_your-key

# Heavy structured tasks → local GPU (free)
ROUTE_OSPF_PROVIDER=ollama-local
ROUTE_OSPF_MODEL=my-ospf-expert:7b
ROUTE_OSPF_FALLBACK=openai-groq

# Fast turnaround tasks → Groq
ROUTE_BGP_PROVIDER=openai-groq
ROUTE_BGP_MODEL=llama-3.3-70b-versatile
ROUTE_BGP_FALLBACK=ollama-local,openai-together

# Complex reasoning → Together (larger models)
ROUTE_GENERAL_PROVIDER=openai-together
ROUTE_GENERAL_MODEL=meta-llama/Llama-3-70b-chat-hf
ROUTE_GENERAL_FALLBACK=openai-groq,ollama-local

ROUTE_DEFAULT_PROVIDER=ollama-local

# Health tuning
HEALTH_CHECK_INTERVAL=30
HEALTH_FAILURE_THRESHOLD=2

Creating Custom Domain Experts

The "expert" is just a base model + system prompt. No training required.

Modelfile Structure

# Base model (any Ollama model)
FROM qwen2.5-coder:7b
# Low temperature = more deterministic output
PARAMETER temperature 0.1
# Max output tokens
PARAMETER num_predict 4096
SYSTEM """
Your domain-specific rules, examples, and output format here.
"""

What Makes a Good Expert

Narrow scope — handle one specific task type well
Explicit rules — "NEVER do X", "ALWAYS do Y" with ❌ markers
Worked examples — complete input→output pairs
Output format — rigidly defined (JSON schema, config syntax)
Low temperature — 0.1 for structured output, 0.3 for explanations

Adding a New Domain

Create modelfiles/Modelfile.my-domain
Run ollama create my-domain-expert:7b -f Modelfile.my-domain
Set OLLAMA_MODEL_MY_DOMAIN=my-domain-expert:7b
The router picks it up automatically — no code changes needed

Model Size Guidance

Size	Speed	When to Use
3B	~80 tok/s	Too small for most tasks
7B	~42 tok/s	Structured config generation, parsing (recommended)
14B	~21 tok/s	Domain questions, complex reasoning
32B	~10 tok/s	Only if 7B quality is insufficient

For structured output with a well-written system prompt, a 7B model can match 32B quality.
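The throughput figures in the table translate directly into wall-clock latency. The sketch below is plain arithmetic on those figures and ignores prompt processing time; real numbers depend on your hardware and quantization. One thing it suggests: a full 4096-token response at ~42 tok/s takes roughly 98 seconds, which is longer than the `OLLAMA_TIMEOUT=60` from the Quick Start if that timeout applies to the whole request (an assumption worth checking against your setup).

```python
# Approximate generation speeds from the table above, in tokens per second.
THROUGHPUT_TOK_PER_S = {"3B": 80, "7B": 42, "14B": 21, "32B": 10}


def generation_seconds(size: str, output_tokens: int) -> float:
    """Estimate generation time, ignoring prompt evaluation and network overhead."""
    return output_tokens / THROUGHPUT_TOK_PER_S[size]


for size in THROUGHPUT_TOK_PER_S:
    seconds = generation_seconds(size, 4096)
    print(f"{size}: about {seconds:.0f} s for a 4096-token response")
```

If your experts rarely need long outputs, lowering `ROUTE_<DOMAIN>_MAX_TOKENS` is one way to keep worst-case latency within the timeout.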

Token Savings Strategy

The biggest wins come from these patterns:

Query building: the local expert builds API queries instead of the orchestrator guessing
Context compression: reduce 2 KB API responses to 400 bytes before the orchestrator reasons about them
State summarization: pass/fail signals instead of raw output parsing
Config generation: the most token-intensive task, fully offloaded

Typical savings: 15-25K tokens per complex workflow run.

File Layout

mcp-servers/ollama-mcp/
├── server.py              # MCP server (10 tools, stdio transport)
├── routing.py             # Domain → provider routing (health-aware fallback)
├── health.py              # Async background health probes
├── compat.py              # Legacy OLLAMA_* env var compatibility layer
├── metrics.py             # Token savings + per-provider metrics tracker
├── models.py              # Pydantic request/response schemas
├── providers/             # Provider abstraction layer
│   ├── __init__.py        # Exports ProviderClient, ProviderResponse, GenerationOptions
│   ├── base.py            # Abstract base class + dataclasses
│   ├── ollama_local.py    # Ollama Local provider (HTTP API)
│   ├── ollama_cloud.py    # Ollama Cloud provider (authenticated)
│   ├── openai_compat.py   # OpenAI-compatible provider (vLLM, Groq, etc.)
│   └── registry.py        # Provider discovery from PROVIDER_* env vars
├── router.py              # [Deprecated] Old domain router (redirects to routing.py)
├── ollama_client.py       # [Deprecated] Old Ollama HTTP client
├── requirements.txt       # Dependencies: mcp, httpx, pydantic, hypothesis
└── modelfiles/            # Example Ollama Modelfiles
    ├── Modelfile.example-ospf
    ├── Modelfile.example-state-summarizer
    └── Modelfile.example-graphql-builder

Requirements

Python 3.10+
Ollama running somewhere accessible (local or remote)
A base model pulled (e.g., qwen2.5-coder:7b)
Dependencies: mcp, httpx, pydantic (requirements.txt also lists hypothesis, a property-based testing library)

License

BSL-1.1 (same as parent project)
