Create Model Experiment
metrx_create_model_experiment

Initiate an A/B test comparing two LLM models for an agent, routing a share of traffic to the treatment model while tracking cost, latency, error rate, and quality until statistical significance is reached or the maximum duration elapses.
Instructions
Start an A/B test comparing two LLM models for a specific agent. The tool routes a percentage of traffic to the treatment model and tracks cost, latency, error rate, and quality metrics. The experiment runs until statistical significance is reached or the maximum duration expires. Do NOT use this for one-off model comparisons; use compare_models for static pricing data instead.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | Agent to run the experiment on | |
| name | Yes | Human-readable experiment name | |
| treatment_model | Yes | The candidate model to test (e.g., "gpt-4o-mini", "claude-haiku-4-20250414") | |
| traffic_pct | No | Percentage of traffic to route to the treatment model | 10 |
| primary_metric | No | The primary metric to optimize for | cost_per_call |
| max_duration_days | No | Maximum experiment duration in days | 14 |
| auto_promote | No | Automatically apply the winning model when the experiment concludes | |
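
For reference, a minimal sketch of an input payload. The agent_id and experiment name are hypothetical placeholders, and every optional field shown here simply restates its default from the table above:

```json
{
  "agent_id": "agent_4f2c",
  "name": "gpt-4o-mini cost trial",
  "treatment_model": "gpt-4o-mini",
  "traffic_pct": 10,
  "primary_metric": "cost_per_call",
  "max_duration_days": 14,
  "auto_promote": false
}
```

The low default traffic_pct keeps 90% of calls on the current model, limiting exposure to the treatment model while the experiment gathers enough data to reach significance.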