WhichModel
Server Details
Cost-optimized LLM model routing recommendations for autonomous AI agents
- Status: Unhealthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average score 4.2/5, with 6 of 6 tools scored.
Each tool has a clearly distinct purpose with no overlap: check_price_changes monitors historical pricing, compare_models compares specific models, estimate_cost projects workload costs, find_cheapest_capable filters by capabilities, get_pricing provides raw filtered data, and recommend_model offers task-based recommendations. The descriptions explicitly differentiate use cases and direct users to the appropriate tool.
All tool names follow a consistent verb_noun pattern using snake_case (e.g., check_price_changes, compare_models, estimate_cost). The naming is predictable and readable throughout, with no deviations in style or convention.
With 6 tools, the server is well-scoped for LLM model selection and pricing analysis. Each tool serves a unique function in the domain, covering monitoring, comparison, estimation, filtering, data retrieval, and recommendation without being overly sparse or bloated.
The toolset provides complete coverage for LLM model selection and cost management: it includes monitoring (check_price_changes), comparison (compare_models), cost estimation (estimate_cost), capability-based filtering (find_cheapest_capable), data lookup (get_pricing), and recommendation (recommend_model). There are no obvious gaps, and the tools work together to support end-to-end workflows.
Available Tools
6 tools

check_price_changes (A) · Read-only · Idempotent
Returns all LLM pricing changes recorded since a given date, optionally filtered to a specific model or provider. Each change record includes the old price, new price, model ID, and change timestamp. The since parameter accepts ISO date format (YYYY-MM-DD or full ISO timestamp, e.g. "2026-04-01"). Returns an empty changes array when no changes are found in the period. Use to monitor cost drift, detect newly added or deprecated models, or build price-change alerts. Check total_changes in the response to distinguish empty results from errors.
| Name | Required | Description | Default |
|---|---|---|---|
| since | Yes | ISO date to check changes from, e.g. "2026-04-01" | |
| model_id | No | Filter to a specific model | |
| provider | No | Filter to a specific provider | |
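As a concrete illustration, a `tools/call` request for this tool can be built as below. The request envelope follows the MCP specification's JSON-RPC shape; the argument values are examples taken from the description above, not tested against this server.

```python
def make_tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 tools/call request, per the MCP specification."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# "since" is required; "model_id" and "provider" are optional filters.
req = make_tool_call("check_price_changes", {
    "since": "2026-04-01",    # ISO date (YYYY-MM-DD or full timestamp)
    "provider": "anthropic",  # optional: restrict results to one provider
})
```

On the response side, the description says to check `total_changes` to distinguish an empty `changes` array from an error.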
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds valuable behavioral context beyond what annotations provide: it specifies that results include an empty changes array when no changes are found, explains how to distinguish empty results from errors using 'total_changes', and describes the return format (old price, new price, model ID, change timestamp). While annotations cover safety (readOnly, idempotent, non-destructive), the description enhances understanding of the tool's behavior in edge cases.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with zero waste: the first sentence states the core functionality, subsequent sentences add crucial behavioral details (empty results handling, error distinction), and the final sentence provides usage context. Every sentence earns its place, and information is front-loaded appropriately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, no output schema), the description is quite complete: it explains purpose, parameters, return format, edge cases, and usage scenarios. The main gap is the lack of an output schema, but the description compensates by detailing the response structure. It could be slightly more complete by mentioning pagination or rate limits, but overall it's well-rounded.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents all three parameters thoroughly. The description adds minimal value beyond the schema by mentioning the ISO date format for the 'since' parameter and the optional filtering capability, but doesn't provide additional semantic context about parameter interactions or constraints. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific verb ('Returns') and resource ('all LLM pricing changes') with precise scope ('recorded since a given date, optionally filtered to a specific model or provider'). It distinguishes from siblings like 'get_pricing' (which likely returns current prices) and 'compare_models' (which likely compares models rather than tracking changes over time).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('monitor cost drift, detect newly added or deprecated models, or build price-change alerts'), giving practical applications. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools, which prevents a perfect score.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
compare_models (A) · Read-only · Idempotent
Side-by-side comparison of 2–5 specific LLMs by pricing, quality tier, capabilities, and projected costs. Supply model IDs (e.g. "anthropic/claude-sonnet-4", "openai/gpt-4.1"); add an optional volume object to see daily and monthly cost estimates based on expected call volume and token sizes. Returns models sorted by value score, with a plain-English recommendation highlighting best value, cheapest, and highest-quality options. Unknown model IDs are reported in not_found without raising an error. Use when you already have specific candidates and want a structured diff. Do not use for open-ended model discovery — use recommend_model instead.
| Name | Required | Description | Default |
|---|---|---|---|
| models | Yes | Model IDs to compare, e.g. ["anthropic/claude-sonnet-4", "openai/gpt-4.1"] | |
| volume | No | Expected usage volume for cost projections | |
| task_type | No | Task type for context-aware comparison | |
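A sketch of the arguments object for this tool follows. The listing does not document the exact shape of the `volume` object, so the field names inside it are assumptions, not confirmed API:

```python
# Field names inside "volume" are assumed; the listing only says it carries
# expected call volume and token sizes.
args = {
    "models": [                   # required: 2-5 model IDs
        "anthropic/claude-sonnet-4",
        "openai/gpt-4.1",
    ],
    "volume": {                   # optional: drives daily/monthly cost estimates
        "calls_per_day": 500,     # assumed field name
        "input_tokens": 2000,     # assumed field name
        "output_tokens": 400,     # assumed field name
    },
}

assert 2 <= len(args["models"]) <= 5, "tool compares 2-5 models"
```

Unknown model IDs land in `not_found` in the response rather than raising an error, so a caller should inspect that field after every call.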
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety and idempotency. The description adds valuable behavioral context beyond annotations: it explains that unknown model IDs are reported without raising errors ('not_found'), describes the return format ('sorted by value score, with a plain-English recommendation'), and mentions the tool's response structure. No contradictions with annotations exist.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in three sentences: the first explains the core functionality, the second details parameters and output, and the third provides usage guidelines. Every sentence adds essential information with zero wasted words, making it easy to parse and understand quickly.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, nested objects) and rich annotations (readOnly, idempotent, non-destructive), the description is complete. It covers purpose, parameters, output behavior, error handling, and usage guidelines. Although there's no output schema, the description adequately explains the return format and recommendations, making it sufficient for agent use.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters. The description adds some context by mentioning 'model IDs (e.g., "anthropic/claude-sonnet-4", "openai/gpt-4.1")' and 'optional volume object for cost estimates,' but this mostly reiterates what the schema already specifies. No additional parameter semantics beyond the schema are provided.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool performs a 'side-by-side comparison of 2–5 specific LLMs' across pricing, quality, capabilities, and projected costs. It specifies the exact verb ('compare') and resource ('models'), and explicitly distinguishes it from the sibling 'recommend_model' tool by stating 'Do not use for open-ended model discovery — use recommend_model instead.'
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('when you already have specific candidates and want a structured diff') and when not to use it ('Do not use for open-ended model discovery — use recommend_model instead'). It names the alternative sibling tool directly, giving clear context for selection.
estimate_cost (A) · Read-only · Idempotent
Estimate the cost of a specific workload for a given model. Returns cost per call, daily and monthly projections, and a comparison to the cheapest alternative with equivalent capabilities.
| Name | Required | Description | Default |
|---|---|---|---|
| model_id | Yes | Model ID to estimate cost for, e.g. "anthropic/claude-sonnet-4" | |
| input_tokens | Yes | Number of input tokens per call | |
| output_tokens | Yes | Number of output tokens per call | |
| calls_per_day | No | Expected number of calls per day (for daily/monthly projections) | |
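The projection math is presumably the standard per-million-token calculation. The sketch below is illustrative only: the prices are invented, and a 30-day month is assumed for the monthly figure, neither of which this listing confirms.

```python
MTOK = 1_000_000  # LLM prices are quoted per million tokens

def estimate(input_tokens, output_tokens, in_price, out_price, calls_per_day=None):
    """Illustrative cost math: per-call cost plus optional daily/monthly projections."""
    per_call = input_tokens / MTOK * in_price + output_tokens / MTOK * out_price
    daily = per_call * calls_per_day if calls_per_day is not None else None
    monthly = daily * 30 if daily is not None else None  # assumes a 30-day month
    return per_call, daily, monthly

# e.g. 2,000 input + 500 output tokens at $3/$15 per MTok, 1,000 calls/day
per_call, daily, monthly = estimate(2000, 500, 3.00, 15.00, calls_per_day=1000)
# per_call ≈ $0.0135, daily ≈ $13.50, monthly ≈ $405
```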
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, idempotent, and non-destructive behavior. The description adds valuable context beyond this: it specifies the return format (cost per call, daily/monthly projections, comparison to cheapest alternative), which helps the agent understand what to expect. No contradictions with annotations exist.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that front-loads the purpose and efficiently lists all return components. Every word contributes to understanding the tool's function without redundancy or fluff.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (cost estimation with projections), rich annotations (safety profile covered), and no output schema, the description is mostly complete. It clearly states what the tool returns, but could benefit from mentioning assumptions (e.g., based on current pricing) or limitations (e.g., does not account for discounts).
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing clear documentation for all parameters. The description does not add extra meaning beyond the schema, such as explaining token calculation methods or projection assumptions. Baseline score of 3 is appropriate as the schema carries the full burden.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Estimate the cost'), target resource ('a specific workload for a given model'), and output details ('cost per call, daily and monthly projections, and a comparison to the cheapest alternative'). It distinguishes from siblings like 'check_price_changes' (monitoring) or 'compare_models' (direct comparison) by focusing on estimation with projections.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for cost estimation of workloads, but does not explicitly state when to use this tool versus alternatives like 'find_cheapest_capable' or 'get_pricing'. No exclusions or prerequisites are mentioned, leaving the agent to infer context from sibling names alone.
find_cheapest_capable (A) · Read-only · Idempotent
Find the cheapest models that meet specific capability requirements. Useful when you have hard constraints (e.g. must support tool_calling + vision) and want the most cost-effective option.
| Name | Required | Description | Default |
|---|---|---|---|
| required_capabilities | Yes | Capabilities the model must support, e.g. ["tool_calling", "json_output", "vision"] | |
| quality_floor | No | Minimum quality tier: "low" (budget+), "medium" (standard+), "high" (premium+), "frontier" (frontier only) | |
| min_context_window | No | Minimum context window size in tokens, e.g. 128000 | |
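Mechanically, this tool's filtering presumably reduces to a subset check plus a minimum, as in the local sketch below. The model records are invented for illustration; the server's actual data and field names may differ.

```python
# Invented model records, for illustration only.
models = [
    {"id": "x/budget", "input_price": 0.50,
     "capabilities": {"tool_calling"}, "context_window": 64_000},
    {"id": "y/standard", "input_price": 3.00,
     "capabilities": {"tool_calling", "vision"}, "context_window": 200_000},
    {"id": "z/frontier", "input_price": 15.00,
     "capabilities": {"tool_calling", "vision"}, "context_window": 200_000},
]

required = {"tool_calling", "vision"}  # hard constraints
min_context = 128_000

# Keep models whose capabilities are a superset of the requirements,
# then take the cheapest by input price.
capable = [m for m in models
           if required <= m["capabilities"] and m["context_window"] >= min_context]
cheapest = min(capable, key=lambda m: m["input_price"])
# cheapest["id"] == "y/standard": the only cheaper model lacks vision
```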
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the agent knows this is a safe, repeatable query operation. The description adds useful context about the tool's purpose (finding cheapest models meeting constraints) which isn't captured in annotations. However, it doesn't disclose additional behavioral traits like potential rate limits, authentication needs, or what happens when no models meet requirements.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with just two sentences. The first sentence states the core purpose, and the second provides usage context. Every word earns its place with zero waste or redundancy. The structure is front-loaded with the most important information first.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a query tool with comprehensive annotations (readOnly, idempotent, non-destructive) and 100% schema coverage, the description provides adequate context. It explains the tool's purpose and when to use it. The main gap is the lack of output schema, so the description doesn't explain what the tool returns (e.g., list of models, pricing details, or just model names). However, given the tool's relative simplicity and good annotation coverage, this is mostly complete.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, all parameters are well-documented in the schema itself. The description mentions 'specific capability requirements' which aligns with the 'required_capabilities' parameter, and 'hard constraints' which relates to all parameters, but doesn't add significant semantic meaning beyond what the schema already provides. The baseline of 3 is appropriate when the schema does the heavy lifting.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Find the cheapest models that meet specific capability requirements.' It specifies the verb ('find'), resource ('cheapest models'), and constraint ('meet specific capability requirements'). However, it doesn't explicitly differentiate from sibling tools like 'compare_models' or 'recommend_model' beyond mentioning cost-effectiveness.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: 'Useful when you have hard constraints (e.g. must support tool_calling + vision) and want the most cost-effective option.' This gives practical guidance about the tool's intended use case. However, it doesn't explicitly mention when NOT to use it or name specific alternatives among the sibling tools.
get_pricing (A) · Read-only · Idempotent
Returns raw pricing and capability data for LLM models matching the supplied filters. Filters can be combined: specific model ID, provider name, maximum input price per million tokens, required capabilities (tool_calling, json_output, streaming, vision), and minimum context window. Results are ordered by value score; default limit is 20 (max 100). Each result includes input/output prices per MTok, context length, max output tokens, capabilities, quality tier, and value score. Use for programmatic price checks, budget validation, or building custom selection logic. Does not make recommendations — use recommend_model for that.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of results to return (1-100, default 20) | |
| model_id | No | Specific model ID, e.g. "anthropic/claude-sonnet-4" | |
| provider | No | Filter to models from this provider, e.g. "anthropic" | |
| capabilities | No | Required capabilities to filter by | |
| max_input_price | No | Maximum input price per million tokens in USD | |
| include_deprecated | No | Include deprecated and sunset models in results (default false) | |
| min_context_window | No | Minimum context window size in tokens | |
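A sketch of a combined-filter query follows. The description says filters can be combined (presumably with AND semantics, though the listing does not state this explicitly), and documents the limit behavior directly: default 20, maximum 100.

```python
# All filters are optional; the description says they can be combined.
args = {
    "provider": "anthropic",
    "max_input_price": 5.0,          # USD per million input tokens
    "capabilities": ["tool_calling", "json_output"],
    "min_context_window": 128_000,
}

# The documented limit behavior: default 20, clamped to the 1-100 range.
limit = max(1, min(args.get("limit", 20), 100))
# limit == 20 when the caller passes no "limit"
```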
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety and idempotency. The description adds valuable context beyond this: it specifies default and maximum limits ('default limit is 20 (max 100)'), ordering ('Results are ordered by value score'), and result content details ('Each result includes input/output prices per MTok, context length, max output tokens, capabilities, quality tier, and value score'). This enhances behavioral understanding without contradicting annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by details on filters, results, usage guidelines, and exclusions. Each sentence serves a distinct purpose—explaining functionality, constraints, and usage—with no wasted words. It efficiently conveys necessary information in a structured manner.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (7 parameters, no output schema), the description provides comprehensive context. It explains the tool's purpose, filter combinations, result ordering, default/max limits, result content, and specific use cases. With annotations covering safety and idempotency, and the description filling in behavioral and usage details, it is complete enough for an agent to understand and invoke the tool correctly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, meaning all parameters are well-documented in the input schema. The description adds some semantic context by mentioning that 'Filters can be combined' and listing examples like 'specific model ID, provider name, maximum input price per million tokens, required capabilities (tool_calling, json_output, streaming, vision), and minimum context window,' but this largely reiterates what the schema already provides. Given the high schema coverage, a baseline score of 3 is appropriate as the description adds minimal extra value.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'returns raw pricing and capability data for LLM models matching the supplied filters,' specifying both the verb ('returns') and resource ('pricing and capability data'). It explicitly distinguishes from the sibling 'recommend_model' by stating 'Does not make recommendations — use recommend_model for that,' showing clear differentiation.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: 'Use for programmatic price checks, budget validation, or building custom selection logic.' It also specifies when not to use it: 'Does not make recommendations — use recommend_model for that,' naming an alternative sibling tool. This covers both use cases and exclusions effectively.
recommend_model (A) · Read-only · Idempotent
Get the single best-fit LLM for a task, with pricing and reasoning. Provide a task type and complexity; optionally add token estimates, a per-call budget cap, and capability requirements (tool calling, JSON output, streaming, context window, provider). Returns the recommended model, its cost estimate, a reasoning summary, and ranked alternatives. Use this when you need to select a model without knowing which to pick. Do not use for raw price lookups (use get_pricing) or for comparing specific known models (use compare_models). Data is refreshed automatically from OpenRouter; check data_freshness in the response for the last-updated timestamp.
| Name | Required | Description | Default |
|---|---|---|---|
| task_type | Yes | The type of task you need a model for | |
| complexity | No | Task complexity: low, medium, or high | medium |
| requirements | No | Additional requirements for the model | |
| budget_per_call | No | Maximum spend in USD for this single call | |
| estimated_input_tokens | No | Estimated input size in tokens | |
| estimated_output_tokens | No | Estimated output size in tokens | |
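A sketch of the arguments object follows. The listing does not enumerate valid `task_type` values, so `"summarization"` is a guess; the remaining fields and their meanings come from the parameter table above.

```python
# "summarization" is a guessed task_type; valid values are not listed here.
args = {
    "task_type": "summarization",     # required
    "complexity": "high",             # low | medium | high (default: medium)
    "budget_per_call": 0.01,          # max USD spend for a single call
    "estimated_input_tokens": 8000,
    "estimated_output_tokens": 500,
}
```

Per the description, the response includes the recommended model, a cost estimate, reasoning, ranked alternatives, and a `data_freshness` timestamp worth checking before trusting the numbers.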
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety and idempotency. The description adds valuable context beyond annotations: it mentions the data source ('OpenRouter'), refresh behavior ('Data is refreshed automatically'), and how to check freshness ('check data_freshness in the response'). It doesn't contradict annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured: first sentence states purpose, second lists inputs, third describes outputs, fourth gives usage context, and fifth provides data source details. Every sentence adds value with zero waste. It's appropriately sized and front-loaded with core functionality.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (6 parameters, nested objects) and lack of output schema, the description does well by explaining what the tool returns ('Returns the recommended model, its cost estimate, a reasoning summary, and ranked alternatives') and data freshness. However, it doesn't detail error cases or response format specifics, leaving some gaps for a recommendation tool.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters. The description mentions parameters generally ('Provide a task type and complexity; optionally add token estimates, a per-call budget cap, and capability requirements') but doesn't add specific semantic details beyond what the schema provides. Baseline 3 is appropriate when schema does the heavy lifting.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get the single best-fit LLM for a task, with pricing and reasoning.' It specifies the verb ('Get'), resource ('best-fit LLM'), and key outputs ('pricing and reasoning'). It distinguishes from siblings by explicitly naming alternatives (get_pricing, compare_models).
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidance: 'Use this when you need to select a model without knowing which to pick.' It also gives clear exclusions: 'Do not use for raw price lookups (use get_pricing) or for comparing specific known models (use compare_models).' This directly addresses when to use this tool versus alternatives.
Claim this connector by publishing a `/.well-known/glama.json` file on your server's domain with the following structure:

```json
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
```

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.