Create Model Experiment
metrx_create_model_experiment
Run A/B experiments that compare LLM models for an agent. Route traffic to a candidate model, and track cost, latency, error rate, and quality until statistical significance is reached.
Instructions
Start an A/B test that compares two LLM models for a specific agent. The experiment routes a percentage of traffic (traffic_pct) to the treatment model and tracks cost, latency, error rate, and quality metrics for both models. It runs until the result reaches statistical significance or max_duration_days elapses, whichever comes first. Do NOT use this tool for one-off model comparisons; use compare_models for static pricing data.
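This page does not describe how traffic is split or how significance is decided. The sketch below shows one common way to do both, so it is an illustration and not Metrx's implementation. It assumes deterministic hash-based bucketing by request ID, a pooled two-proportion z-test on error rate, and a 0.05 significance level. The function names (route_model, two_proportion_p_value, should_stop) and the control model "gpt-4o" are hypothetical.

```python
import hashlib
import math


def route_model(request_id: str, control_model: str, treatment_model: str,
                traffic_pct: float = 10.0) -> str:
    """Pick the model for one request (hypothetical sketch, not Metrx's code).

    Hashes the request ID into one of 10,000 buckets so a given ID always
    lands in the same arm; buckets below traffic_pct * 100 go to treatment.
    """
    if not 0 <= traffic_pct <= 100:
        raise ValueError("traffic_pct must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return treatment_model if bucket < traffic_pct * 100 else control_model


def two_proportion_p_value(errors_a: int, calls_a: int,
                           errors_b: int, calls_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test on error rates."""
    rate_a, rate_b = errors_a / calls_a, errors_b / calls_b
    pooled = (errors_a + errors_b) / (calls_a + calls_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / calls_a + 1 / calls_b))
    if se == 0:
        # Both arms have a 0% or a 100% error rate: no evidence of a difference.
        return 1.0
    z = (rate_a - rate_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def should_stop(p_value: float, elapsed_days: float,
                max_duration_days: float = 14, alpha: float = 0.05) -> bool:
    """Stop once the result is significant or the duration cap is reached."""
    return p_value < alpha or elapsed_days >= max_duration_days


if __name__ == "__main__":
    # Simulate 100,000 requests with the default 10% treatment share.
    ids = [f"req-{i}" for i in range(100_000)]
    hits = sum(route_model(i, "gpt-4o", "gpt-4o-mini") == "gpt-4o-mini" for i in ids)
    print(f"treatment share: {hits / len(ids):.3f}")
    print("p-value:", two_proportion_p_value(100, 1000, 50, 1000))
```

The simulated treatment share should land near 0.10. Hashing the request ID keeps each request in the same arm if it is retried. A real system might bucket by user or session instead, and might test the primary metric (by default cost_per_call, a continuous value) with something other than a proportion test.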
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | Agent to run the experiment on | |
| name | Yes | Human-readable experiment name | |
| treatment_model | Yes | The candidate model to test (e.g., "gpt-4o-mini", "claude-haiku-4-20250414") | |
| traffic_pct | No | Percentage of traffic to route to the treatment model | 10 |
| primary_metric | No | The primary metric to optimize for | cost_per_call |
| max_duration_days | No | Maximum experiment duration in days | 14 |
| auto_promote | No | Automatically apply the winning model when the experiment completes | |
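The next sketch builds an argument payload for this tool and applies the defaults listed in the table. The ModelExperimentArgs class is hypothetical, and so are its validation rules: traffic_pct must be in (0, 100] and max_duration_days must be positive. Neither rule is stated in this schema. The default for auto_promote is not documented, so the sketch leaves that field out of the payload unless the caller sets it. The agent ID "agent_123" is a placeholder.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ModelExperimentArgs:
    """Hypothetical client-side model of the metrx_create_model_experiment input."""
    agent_id: str
    name: str
    treatment_model: str
    traffic_pct: float = 10                  # documented default
    primary_metric: str = "cost_per_call"    # documented default
    max_duration_days: int = 14              # documented default
    auto_promote: Optional[bool] = None      # default not documented

    def to_arguments(self) -> dict:
        # Required fields must be non-empty.
        for key in ("agent_id", "name", "treatment_model"):
            if not getattr(self, key):
                raise ValueError(f"{key} is required")
        # Assumed ranges; the schema does not state them.
        if not 0 < self.traffic_pct <= 100:
            raise ValueError("traffic_pct must be in (0, 100]")
        if self.max_duration_days <= 0:
            raise ValueError("max_duration_days must be positive")
        args = asdict(self)
        if args["auto_promote"] is None:
            del args["auto_promote"]  # let the server apply its own default
        return args


if __name__ == "__main__":
    args = ModelExperimentArgs(
        agent_id="agent_123",
        name="Haiku cost trial",
        treatment_model="claude-haiku-4-20250414",
        traffic_pct=20,
    )
    print(json.dumps(args.to_arguments(), indent=2))
```

Leaving auto_promote out when it is unset means the sketch never invents a value that the schema does not define.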