test_model
Compare AI model performance by testing 1-5 models simultaneously with identical prompts. Get output text, latency, token usage, and cost estimates for informed model selection.
Instructions
Make live API calls to 1-5 models via OpenRouter. Returns output text, latency (ms), token usage, and cost estimates.
Requires OPENROUTER_API_KEY in MCP client configuration. Costs are billed to your OpenRouter account.
Parameters:
model_ids: 1-5 model IDs to test (all receive identical prompts)
test_type: 'quick' (math), 'code', 'reasoning', 'instruction', 'tool_calling'
prompt: Custom prompt (overrides test_type)
max_tokens: Response length limit, 1-8192 (default 1000)
temperature: Sampling temperature, 0-2 (default 0.7)
system_prompt: Optional system message to set model behavior
Use find_models or get_model first to identify model IDs.
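The fan-out described above can be pictured as one request body per model, all sharing the same messages. A minimal sketch, assuming the tool forwards to OpenRouter's chat completions endpoint; `build_requests` and the preset prompt texts are illustrative placeholders, not the tool's actual internals:

```python
# Illustrative sketch of how identical prompts fan out to 1-5 models.
# build_requests and the PRESET_PROMPTS wording are assumptions for
# illustration; only the parameter semantics come from the tool docs.

PRESET_PROMPTS = {
    "quick": "What is 17 * 24?",                      # placeholder wording
    "code": "Write a function that reverses a string.",
    "reasoning": "Solve this logic puzzle step by step: ...",
    "instruction": "Follow these directions exactly: ...",
    "tool_calling": "Call the provided weather tool for Paris.",
}

def build_requests(model_ids, test_type="quick", prompt=None,
                   max_tokens=1000, temperature=0.7, system_prompt=None):
    if not 1 <= len(model_ids) <= 5:
        raise ValueError("model_ids must contain 1-5 entries")
    user_message = prompt or PRESET_PROMPTS[test_type]  # prompt overrides test_type
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    # One identical request body per model, each POSTed to
    # https://openrouter.ai/api/v1/chat/completions with an
    # "Authorization: Bearer $OPENROUTER_API_KEY" header.
    return [
        {"model": m, "messages": messages,
         "max_tokens": max_tokens, "temperature": temperature}
        for m in model_ids
    ]
```

Because every model sees the same messages list, differences in output, latency, and cost can be attributed to the models themselves rather than to prompt variation.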
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model_ids | Yes | Array of 1-5 model IDs to test simultaneously in format 'provider/model-name' (e.g., ['openai/gpt-5.1-codex-max', 'anthropic/claude-haiku-4.5']). All models receive identical prompts. Use find_models or get_model first to identify model IDs. | |
| test_type | No | Preset test scenario. Ignored if custom 'prompt' provided. Options: 'quick' (simple math, fastest), 'code' (function generation), 'reasoning' (logic puzzle), 'instruction' (following complex directions), 'tool_calling' (function calling capability). Defaults to 'quick'. | quick |
| prompt | No | Custom user message to send to all models. Overrides 'test_type' when provided. Use for debugging specific issues, comparing exact outputs, or testing domain-specific prompts. Example: 'Write a Python function to validate email addresses'. | |
| max_tokens | No | Maximum tokens for model response (1-8192). Higher values allow longer outputs but increase cost and latency. Use 100-500 for quick tests, 1000-2000 for code/reasoning, 4000+ for long-form content. | 1000 |
| temperature | No | Sampling temperature (0-2). Lower values are more deterministic, higher values more creative. | 0.7 |
| system_prompt | No | System message to set model behavior. Example: 'You are a helpful coding assistant.' | |
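Since each test call is billed to your OpenRouter account, it can be worth checking arguments against the schema's range constraints before sending anything. A sketch of such a client-side check; `validate_args` is a hypothetical helper, not part of the tool:

```python
# Client-side validation mirroring the input schema's constraints.
# validate_args is an illustrative helper name, not part of the tool.

def validate_args(args):
    errors = []
    model_ids = args.get("model_ids") or []
    if not 1 <= len(model_ids) <= 5:
        errors.append("model_ids must list 1-5 models")
    if any("/" not in m for m in model_ids):
        errors.append("model IDs use the 'provider/model-name' format")
    test_type = args.get("test_type", "quick")
    if test_type not in {"quick", "code", "reasoning",
                         "instruction", "tool_calling"}:
        errors.append(f"unknown test_type: {test_type}")
    if not 1 <= args.get("max_tokens", 1000) <= 8192:
        errors.append("max_tokens must be 1-8192")
    if not 0 <= args.get("temperature", 0.7) <= 2:
        errors.append("temperature must be 0-2")
    return errors
```

An empty list means the arguments satisfy every documented constraint; otherwise each string names the field to fix before retrying.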