HydraMCP is an MCP server that connects Claude Code to multiple AI models (GPT, Gemini, Claude, local Ollama models) for querying, comparing, and synthesizing responses.
- `list_models`: Discover all available models across providers (OpenAI, Google, Anthropic, Ollama, subscription CLIs).
- `ask_model`: Query any single model with configurable parameters (temperature, max tokens, system prompt, response format).
- `compare_models`: Send the same prompt to 2–5 models in parallel and get side-by-side comparisons with latency and token metrics.
- `consensus`: Poll 3–7 models and aggregate responses using voting strategies (majority/supermajority/unanimous), with an optional LLM judge.
- `synthesize`: Query 2–5 models and combine their best ideas into a single unified answer via an optional synthesizer model.
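The three voting strategies named above can be sketched as follows. This is a hypothetical illustration under assumed thresholds (strict majority, two-thirds supermajority, full unanimity); the function and type names are inventions, not HydraMCP's actual implementation, and a real system would normalize answers or use the LLM judge rather than compare raw strings:

```typescript
// Hypothetical sketch of consensus aggregation, not HydraMCP's real code.
type Strategy = "majority" | "supermajority" | "unanimous";

interface VoteResult {
  winner: string | null; // null when no answer meets the threshold
  agreement: number;     // fraction of models backing the top answer
}

function aggregate(answers: string[], strategy: Strategy): VoteResult {
  // Tally identical answers (a real system would judge semantic similarity).
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a, (counts.get(a) ?? 0) + 1);

  let top: string | null = null;
  let topCount = 0;
  counts.forEach((n, answer) => {
    if (n > topCount) { top = answer; topCount = n; }
  });

  const agreement = topCount / answers.length;
  // Assumed thresholds: >1/2, >=2/3, and 100% agreement respectively.
  const passes =
    strategy === "majority"      ? agreement > 0.5 :
    strategy === "supermajority" ? agreement >= 2 / 3 :
    agreement === 1;

  return { winner: passes ? top : null, agreement };
}
```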
Underlying features include circuit breakers for reliability, response caching, metrics tracking, and response distillation for efficiency.
Integrates Google's model ecosystem, allowing agents to query and compare various Google-hosted models alongside other providers.
Allows agents to query Gemini models via existing subscriptions for parallel comparisons and multi-model consensus workflows.
Leverages local models via Ollama for fast, quota-free operations like judging consensus between cloud providers and synthesizing multi-model outputs.
Connects to OpenAI models using existing subscriptions to enable parallel querying, model comparisons, and response synthesis.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type `@` followed by the MCP server name and your instructions, e.g., "@HydraMCP compare gpt-4 and claude on the best way to optimize this SQL query".
That's it! The server will respond to your query, and you can continue using it as needed.
An MCP server that lets Claude Code query any LLM — compare, vote, and synthesize across GPT, Gemini, Claude, and local models from one terminal.
Quick Start
That's it. The wizard walks you through everything — API keys, subscriptions, local models. At the end it gives you the one-liner to add to Claude Code.
Or if you already have API keys:
What It Looks Like
Four models, four ecosystems, one prompt. Real output from a live session:
All four independently found the same async bug. Then each one caught something different the others missed.
And this is consensus with a local judge:
Three cloud models polled, local model judging them. 686ms to evaluate agreement.
Tools
| Tool | What It Does |
| --- | --- |
| `list_models` | See what's available across all providers |
| `ask_model` | Query any model, optional response distillation |
| `compare_models` | Same prompt to 2–5 models in parallel |
| `consensus` | Poll 3–7 models, LLM-as-judge evaluates agreement |
| `synthesize` | Combine best ideas from multiple models into one answer |
| `analyze_file` | Offload file analysis to a worker model |
| `smart_read` | Extract specific code sections without reading the whole file |
| `session_recap` | Restore context from previous Claude Code sessions |
From inside Claude Code, just say things like:
"ask gpt-5 to review this function"
"compare gemini and claude on this approach"
"get consensus from 3 models on whether this is thread safe"
"synthesize responses from all models on how to design this API"
How It Works
Three Ways to Connect Models
API Keys (fastest setup)
Set environment variables. HydraMCP auto-detects them.
| Variable | Provider |
| --- | --- |
| `OPENAI_API_KEY` | OpenAI (GPT-4o, GPT-5, o3, etc.) |
| `GEMINI_API_KEY` | Google Gemini (2.5 Flash, Pro, etc.) |
| `ANTHROPIC_API_KEY` | Anthropic Claude (Opus, Sonnet, Haiku) |
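The auto-detection can be sketched like this. The environment variable names used here are the conventional ones for each provider and are an assumption; check the setup wizard for the exact names HydraMCP reads:

```typescript
// Hypothetical sketch of provider auto-detection from environment variables.
// The variable names are the conventional ones, not confirmed HydraMCP names.
function detectProviders(env: Record<string, string | undefined>): string[] {
  const providers: string[] = [];
  if (env.OPENAI_API_KEY) providers.push("openai");
  if (env.GEMINI_API_KEY) providers.push("google");
  if (env.ANTHROPIC_API_KEY) providers.push("anthropic");
  return providers; // providers with no key simply don't appear in list_models
}
```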
Subscriptions (use your monthly plan)
Already paying for ChatGPT Plus, Claude Pro, or Gemini Advanced? HydraMCP wraps the CLI tools those subscriptions include. No API billing.
The setup wizard detects which CLIs you have, installs missing ones, and walks you through authentication. Each CLI authenticates via browser once — then it's stored forever.
| Subscription | CLI Tool | What You Get |
| --- | --- | --- |
| Gemini Advanced | `gemini` | Gemini 2.5 Flash, Pro, etc. |
| Claude Pro/Max | `claude` | Claude Opus, Sonnet, Haiku |
| ChatGPT Plus/Pro | `codex` | GPT-5, o3, Codex models |
Local Models
Install Ollama, pull a model, done. Auto-detected.
Mix and Match
All three methods stack. Use API keys for some providers, subscriptions for others, and Ollama for local. They all show up in list_models together.
Route explicitly with prefixes:
- `openai/gpt-5` — force OpenAI API
- `google/gemini-2.5-flash` — force Google API
- `sub/gemini-2.5-flash` — force subscription CLI
- `ollama/qwen2.5-coder:14b` — force local
- `gpt-5` — auto-detect (tries each provider)
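A sketch of how that prefix routing could be parsed. This is assumed logic for illustration, not HydraMCP's actual router:

```typescript
// Hypothetical model-reference parser for the routing prefixes above.
interface Route {
  provider: string | null; // null means auto-detect across providers
  model: string;
}

const KNOWN_PREFIXES = ["openai", "google", "anthropic", "sub", "ollama"];

function parseModelRef(ref: string): Route {
  const slash = ref.indexOf("/");
  if (slash > 0) {
    const prefix = ref.slice(0, slash);
    if (KNOWN_PREFIXES.indexOf(prefix) !== -1) {
      // Explicit route: strip the prefix, keep the rest (which may
      // itself contain ":" for Ollama tags like qwen2.5-coder:14b).
      return { provider: prefix, model: ref.slice(slash + 1) };
    }
  }
  // Bare name: let the server try each configured provider in turn.
  return { provider: null, model: ref };
}
```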
Setup Details
Option A: npx (recommended)
Config is saved to ~/.hydramcp/.env and persists across npx runs.
Option B: Clone
Verify
Restart Claude Code and say "list models". You should see everything you configured.
Architecture
HydraMCP wraps all providers in a SmartProvider layer that adds:
- Circuit breaker — per-model failure tracking. After 3 failures, the model is disabled for 60s and auto-recovers.
- Response cache — SHA-256 keyed, 15-minute TTL. Identical queries are served instantly.
- Metrics — per-model query counts, latency, token usage, cache hit rates.
- Response distillation — set `max_response_tokens` on any query and a cheap model compresses the response while preserving code, errors, and specifics.
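As a rough sketch, the circuit-breaker behavior described above (3 failures, 60s cooldown, auto-recovery) could look like this. The class and method names are illustrative, not HydraMCP's actual code:

```typescript
// Hypothetical per-model circuit breaker using the numbers from the text.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxFailures = 3,      // failures before opening
    private readonly cooldownMs = 60_000,  // how long the model stays disabled
  ) {}

  // A model is usable unless the breaker is open and still cooling down.
  isAvailable(now = Date.now()): boolean {
    if (this.openedAt === null) return true;
    if (now - this.openedAt >= this.cooldownMs) {
      // Auto-recover: close the breaker once the cooldown elapses.
      this.openedAt = null;
      this.failures = 0;
      return true;
    }
    return false;
  }

  recordFailure(now = Date.now()): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = now;
  }

  recordSuccess(): void {
    this.failures = 0; // any success resets the failure count
  }
}
```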
Contributing
Want to add a provider? The interface is three methods:
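As an illustration only, a provider might look roughly like this. The method and type names below are assumptions, not the project's real contract; src/providers/ollama.ts is the authoritative example:

```typescript
// Hypothetical provider shape with three methods (names are assumptions).
interface ModelResponse {
  text: string;
  latencyMs: number;
  tokens?: number;
}

interface Provider {
  name: string;
  isAvailable(): Promise<boolean>;  // e.g. key present or daemon reachable
  listModels(): Promise<string[]>;  // model IDs this provider serves
  query(model: string, prompt: string): Promise<ModelResponse>;
}

// A stub provider illustrating the contract:
class EchoProvider implements Provider {
  name = "echo";
  async isAvailable() { return true; }
  async listModels() { return ["echo-1"]; }
  async query(model: string, prompt: string): Promise<ModelResponse> {
    return { text: `[${model}] ${prompt}`, latencyMs: 0 };
  }
}
```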
See src/providers/ollama.ts for a working example. Implement it, register in src/index.ts, done.
Providers we'd love to see: LM Studio, OpenRouter, Groq, Together AI, or anything that speaks HTTP.
License
MIT