# whichmodel-mcp
A model routing advisor for autonomous agents — get cost-optimised LLM recommendations via MCP.
whichmodel.dev tracks pricing and capabilities across 100+ LLM models, updated every 4 hours. This MCP server exposes that data so AI agents can pick the right model at the best price for every task.
## MCP Endpoint

```
https://whichmodel.dev/mcp
```

Transport: Streamable HTTP (MCP spec 2025-03-26)
## Quick Start
Add to your MCP client config:
```json
{
  "mcpServers": {
    "whichmodel": {
      "url": "https://whichmodel.dev/mcp"
    }
  }
}
```

No API key required. No installation needed.
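For clients without native MCP support, the endpoint can also be reached as plain JSON-RPC 2.0 over HTTP. A sketch of a raw request (the `tools/list` method and the dual `Accept` header come from the MCP Streamable HTTP transport spec; the rest is standard JSON-RPC):

```http
POST /mcp HTTP/1.1
Host: whichmodel.dev
Content-Type: application/json
Accept: application/json, text/event-stream

{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }
```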
### Stdio (local clients)
For MCP clients that use stdio transport (Claude Desktop, Cursor, etc.):
```json
{
  "mcpServers": {
    "whichmodel": {
      "command": "npx",
      "args": ["-y", "whichmodel-mcp"]
    }
  }
}
```

This runs a thin local proxy that forwards requests to the remote server.
## Tools
### recommend_model
Get a cost-optimised model recommendation for a specific task type, complexity, and budget.
| Parameter | Type | Description |
| --- | --- | --- |
| `task_type` | enum (required) | Task type, e.g. `chat`, `code_generation`, `summarisation`, `reasoning` |
| `complexity` | enum | Task complexity: `low`, `medium`, or `high` |
| `estimated_input_tokens` | number | Expected input size in tokens |
| `estimated_output_tokens` | number | Expected output size in tokens |
| `budget_per_call` | number | Maximum spend in USD per call |
| `requirements` | object | Capability requirements, e.g. `tool_calling` |
Returns: recommended model, alternative, budget option, cost estimate, and reasoning.
### compare_models
Head-to-head comparison of 2–5 models with optional volume cost projections.
| Parameter | Type | Description |
| --- | --- | --- |
| | string[] (required) | Model IDs to compare |
| | enum | Context for the comparison |
| | object | Volume settings for cost projections |
Returns: pricing, capabilities, quality tiers, and projected costs per model.
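The volume projection behind this is simple arithmetic. A minimal sketch (illustrative only, not the server's implementation; prices are USD per million tokens):

```python
# Sketch: projecting monthly cost for a model at a given call volume,
# mirroring the kind of projection compare_models returns.

def project_monthly_cost(calls_per_day: int, input_tokens: int, output_tokens: int,
                         input_price_per_m: float, output_price_per_m: float) -> float:
    per_call = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
    return per_call * calls_per_day * 30  # assume a 30-day month

# 10k calls/day, 2k input / 500 output tokens, at $0.15 / $0.60 per M tokens:
monthly = project_monthly_cost(10_000, 2_000, 500, 0.15, 0.60)
print(f"${monthly:,.2f}/month")  # roughly $180/month
```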
### get_pricing
Raw pricing data lookup with filters by model, provider, price ceiling, and capabilities.
| Parameter | Type | Description |
| --- | --- | --- |
| | string | Specific model ID |
| | string | Filter by provider |
| | number | Max input price per million tokens (USD) |
| | string[] | Required capabilities |
| | number | Minimum context window in tokens |
| | number | Max results (1–100, default 20) |
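Conceptually these filters compose as successive predicates over the catalog. A minimal sketch, assuming a hypothetical record shape (`input_price_per_m`, `capabilities`, `context_window` are illustrative field names, not the server's schema):

```python
# Sketch: get_pricing-style filtering over a local catalog (illustrative).

def filter_models(models, provider=None, max_input_price=None,
                  capabilities=None, min_context=None, limit=20):
    out = []
    for m in models:
        if provider and m["provider"] != provider:
            continue
        if max_input_price is not None and m["input_price_per_m"] > max_input_price:
            continue
        if capabilities and not set(capabilities) <= set(m["capabilities"]):
            continue
        if min_context and m["context_window"] < min_context:
            continue
        out.append(m)
    return out[:limit]  # cap results, as the real tool does

catalog = [
    {"id": "a/cheap", "provider": "a", "input_price_per_m": 0.1,
     "capabilities": ["tool_calling"], "context_window": 128_000},
    {"id": "b/pricey", "provider": "b", "input_price_per_m": 5.0,
     "capabilities": ["tool_calling", "vision"], "context_window": 200_000},
]
print([m["id"] for m in filter_models(catalog, max_input_price=1.0)])  # ['a/cheap']
```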
### check_price_changes
See what model pricing has changed since a given date.
| Parameter | Type | Description |
| --- | --- | --- |
| | string (required) | ISO date to compare against |
| | string | Filter to a specific model |
| | string | Filter to a specific provider |
Returns: price increases, decreases, new models, and deprecations.
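The classification is equivalent to diffing two pricing snapshots. A sketch under the assumption that a snapshot maps model IDs to input prices (the real tool compares full pricing records):

```python
# Sketch: classifying changes between two pricing snapshots into the four
# buckets check_price_changes returns. Snapshot shape is an assumption:
# {model_id: input_price_per_m}.

def diff_pricing(old: dict, new: dict) -> dict:
    shared = old.keys() & new.keys()
    return {
        "increases": sorted(m for m in shared if new[m] > old[m]),
        "decreases": sorted(m for m in shared if new[m] < old[m]),
        "new_models": sorted(new.keys() - old.keys()),
        "deprecations": sorted(old.keys() - new.keys()),
    }

before = {"a/one": 1.0, "a/two": 2.0, "b/gone": 0.5}
after = {"a/one": 1.5, "a/two": 1.0, "c/fresh": 0.2}
print(diff_pricing(before, after))
```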
## Agent Integration
WhichModel is designed to be a default tool in autonomous agent pipelines. Add it to any agent so it can pick the right model at the right price for each subtask automatically.
### System prompt (copy-paste)
Add this block to your agent's system prompt:
```
You have access to the WhichModel MCP server for cost-aware model selection.
Before calling any LLM for a subtask, use recommend_model to pick the right model:

recommend_model({
  task_type: "code_generation",    // chat | code_generation | summarisation | reasoning | ...
  complexity: "medium",            // low | medium | high
  estimated_input_tokens: 2000,    // optional
  estimated_output_tokens: 500,    // optional
  budget_per_call: 0.01,           // optional hard cap in USD
  requirements: {
    tool_calling: true,            // if the subtask needs tool use
  }
})

Use the returned recommendation.model_id. The response includes cost_estimate and
reasoning so you can log why each model was chosen.
```

### Prompt templates via MCP
The server exposes built-in prompt templates you can fetch via `prompts/get`:
| Prompt name | Use case |
| --- | --- |
| `cost-aware-agent` | Full system prompt block for cost-aware model selection |
| | Minimal snippet to add to an existing system prompt |
| | Hard cost cap per call |
Retrieve them programmatically:
```json
{ "method": "prompts/get", "params": { "name": "cost-aware-agent" } }
```

### Framework integrations
- LangChain: `langchain-whichmodel` provides a `WhichModelRouter` chain
- Haystack: `whichmodel-haystack` provides a `WhichModelRouter` component
## Data Freshness
Pricing data is refreshed every 4 hours from OpenRouter. Each response includes a `data_freshness` timestamp so you know how current the data is.
## Links

- Website: whichmodel.dev
- MCP endpoint: https://whichmodel.dev/mcp
- Discovery: https://whichmodel.dev/.well-known/mcp.json