llm-gateway-mcp
Routes prompts to Google Gemini's language models via API key, supporting models such as gemini-2.5-pro, gemini-2.5-flash, and gemini-2.5-flash-lite.
Routes prompts to OpenAI's language models, supporting models such as gpt-4o, gpt-4o-mini, and o3-mini.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@llm-gateway-mcporchestrate a multi-role analysis of this article"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
llm-gateway-mcp
A small, self-hostable MCP server that routes prompts to multiple LLM providers by a declarative policy, with multi-role orchestration inspired by two Sakana AI papers. Clone it, drop in your own API keys, and point any MCP client at it.
The caller never picks a model. It names a
task_class; the gateway imposes the right model, applies cost/reliability policies, and (optionally) runs a plan → execute → verify pipeline across several models.
4 pluggable providers — OpenAI, Anthropic, Google Gemini (API key, not Vertex), DeepSeek. Each reads its own key from the environment.
Declarative routing (
routing.yaml) — model × provider × max_tokens per task_class, with cost preflight, circuit breaker, retry/backoff, and per-task fallback chains.Sakana-inspired orchestration that actually runs — independence certification, Thinker/Worker/Verifier roles, and a compose pipeline with controlled per-step visibility and failure-gated re-planning.
No keys? Still testable.
smoke_test.pyand thetests/suite mock every provider call, so the whole thing runs green offline.
The orchestration features (the interesting part)
These implement, in a domain-agnostic way, ideas from:
"TRINITY: An Evolved LLM Coordinator" — Sakana AI, arXiv:2512.04695, ICLR 2026
"Learning to Orchestrate Agents in Natural Language with the Conductor" — Sakana AI, arXiv:2512.04388, ICLR 2026
A key adaptation: the papers optimize synergy (workers reading each other to converge). For cross-checking we often want the opposite — independence — so that agreement between models is evidence, not an echo. This gateway keeps the two planes separate on purpose.
# | Feature | Where | What it does |
1 | Declarative routing by |
| model × provider × |
2 | Independence certification (Conductor T-02 access_list / visibility) |
| For blind parallel panels, proves each member saw only the original prompt and stamps |
3 | Thinker / Worker / Verifier roles (Trinity T-03) |
| Per-role instruction templates + a configurable role → model table. |
4 | compose: plan → execute → verify |
| One model plans, another executes, a third verifies. The verifier is blind to the plan and judges the artifact against the original task. One failure-gated re-plan (cap 1). |
5 | Depth by difficulty |
|
|
Plus the generic reliability policies: cost preflight + caps, circuit breaker, retry with backoff, and per-task fallback.
Related MCP server: agentloop
Architecture
llm-gateway-mcp/
├── server.py # FastMCP entry: llm_route, llm_orchestrate, llm_routing_info
├── routing.yaml # declarative policy (task_classes, panels, visibility, roles, depth)
├── providers/ # one adapter per provider, shared 3-state response envelope
│ ├── __init__.py # registry + dispatch()
│ ├── openai.py # OPENAI_API_KEY (httpx, Chat Completions)
│ ├── anthropic.py # ANTHROPIC_API_KEY (official anthropic SDK, Messages API)
│ ├── gemini.py # GEMINI_API_KEY (google-genai, API key — NOT Vertex)
│ └── deepseek.py # DEEPSEEK_API_KEY (httpx, OpenAI-compatible)
├── policies/ # generic, provider-agnostic
│ ├── cost_estimator.py # preflight max-cost projection
│ ├── cost_ledger.py # optional SQLite spend ledger + caps + kill switches
│ ├── circuit_breaker.py # per (provider, model) breaker
│ ├── retry_backoff.py # transient-only retry
│ ├── error_taxonomy.py # normalize provider errors
│ └── fallback.py # per-task_class fallback chains
├── orchestration/ # the Sakana-inspired layer
│ ├── independence.py # access_list / visibility certification
│ ├── roles.py # Thinker/Worker/Verifier + depth
│ └── compose.py # plan → execute → verify pipeline
├── smoke_test.py # offline structural test (mocks every model call)
└── tests/ # pytest suite (offline)Response envelope (every provider, every tool):
{ "status": "success",
"data": { "text": "..." },
"meta": { "provider": "...", "model": "...", "latency_ms": 0,
"tokens": {"input": 0, "output": 0, "total": 0},
"cost_usd_approx": 0.0, "task_class": "..." } }Install & run
git clone <your-fork-url> llm-gateway-mcp
cd llm-gateway-mcp
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt # or: pip install -e ".[dev]"
cp .env.example .env # then put YOUR keys in .envVerify it works with no keys needed (everything is mocked):
python smoke_test.py # 9/9 PASS
pytest -q # 22 passedRun as an MCP server (stdio):
python server.pyRegister it with an MCP client (example mcp.json entry):
{
"mcpServers": {
"llm-gateway": {
"command": "python",
"args": ["/absolute/path/to/llm-gateway-mcp/server.py"]
}
}
}You only need keys for the providers your routing.yaml actually targets.
One key is enough to start — any single-model
task_classworks with just that provider's key. But the real value here is the cross-model orchestration (panels, blind triangulation, plan→execute→verify), which needs 2+ different providers to shine — agreement across independent models is only evidence if the models are actually different. So: 1 key works, 2 or more is recommended.
Tools
llm_route(task_class, prompt, system?, max_tokens?, override_model?)
Routes by task_class. Single-model classes return one answer; panel classes
(those with members: in routing.yaml, e.g. dual_opinion, triple_review)
run every member in parallel on the same original prompt and certify
independence in meta.
llm_route(task_class="general_reasoning", prompt="Plan a migration from X to Y.")
llm_route(task_class="triple_review", prompt="Is this argument sound? ...")
# -> data.members = [3 independent answers], meta.visibility.independence_certified = truellm_orchestrate(task, depth?)
Runs the plan → execute → verify pipeline. depth ∈ trivial | standard | complex.
llm_orchestrate(task="Draft a concise refund policy for a SaaS product.", depth="complex")
# -> data: { artifact, plan, verdict }, meta: { steps, rounds, visibility }llm_routing_info()
Returns the active policy: version, task_classes, providers, panels, visibility contracts, orchestration depths, and current circuit-breaker state.
Configuration
Everything routable lives in routing.yaml — edit it freely:
defaults.<task_class>→{provider, model, max_tokens}(ormembers:for a panel)cost_preflight→warn_usd/block_usdthresholdsvisibility.<task_class>→mode: blind,enforce: hard|softfallback.<task_class>→ ordered alternates (empty = no fallback)orchestration→roles, per-roleinstructions, anddepthtable
Optional spend controls (env, disabled by default — see .env.example):
LLM_GATEWAY_LEDGER, LLM_GATEWAY_CAP_TOTAL_MONTHLY,
LLM_GATEWAY_CAP_<PROVIDER>_MONTHLY, plus kill switches
LLM_GATEWAY_DISABLED and LLM_GATEWAY_EXPENSIVE_DISABLED.
⚠️ Pricing in each
providers/*.pyMODELStable is illustrative. Verify against each provider's live pricing before trusting cost preflight in production.
Providers at a glance
Provider | Env var | Transport | Example models |
OpenAI |
| httpx (Chat Completions) |
|
Anthropic |
|
|
|
Google Gemini |
|
|
|
DeepSeek |
| httpx (OpenAI-compatible) |
|
Adding a provider: drop a module in providers/ exposing MODELS and an
async def complete(messages, model, max_tokens, **kwargs) returning the shared
envelope, then register it in providers/__init__.py.
License
MIT © Felipe Márquez. See LICENSE.
Paper credits: TRINITY (arXiv:2512.04695) and Conductor (arXiv:2512.04388), Sakana AI, ICLR 2026. This project implements ideas from those papers in a generic form; it is not affiliated with or endorsed by Sakana AI.
This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/felmarv/llm-gateway-mcp-public'
If you have feedback or need assistance with the MCP directory API, please join our Discord server