create_run
Launch an evaluation run by specifying an agent, persona, and test set. Optionally add tags to filter and track runs.
Instructions
Launch an evaluation by supplying agent_id, persona_id, and test_set_id. Optionally add tags so the run can be filtered later. After launching, poll get_run until status=COMPLETED, then read the metrics from the run.
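
The polling step might look roughly like the sketch below. It is illustrative only: it assumes `get_run` is available as a callable that takes a run ID and returns a dict with a `status` key, which this page does not confirm. The `wait_for_run` helper, the `run_id` parameter, and the timeout values are hypothetical, and any status other than `COMPLETED` is simply treated as "not finished yet".

```python
import time
from typing import Any, Callable, Dict


def wait_for_run(
    get_run: Callable[[str], Dict[str, Any]],  # assumed wrapper around the get_run tool
    run_id: str,                               # hypothetical name for the run identifier
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_run until the run reports status COMPLETED, then return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = get_run(run_id)
        # COMPLETED is the only status named in the instructions above.
        if run.get("status") == "COMPLETED":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not complete in {timeout_seconds}s")
        sleep(poll_seconds)
```

A timeout is included so that a run that never reaches `COMPLETED` does not block the caller forever. Whether the service reports failure through a separate status is not stated here, so the sketch does not check for one.
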
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | Yes | The unique ID of the agent to evaluate. Get this from list_agents. | |
| persona_id | Yes | The unique ID of the persona to use. Get this from list_personas. | |
| test_set_id | Yes | The unique ID of the test set to run against. Get this from list_test_sets. | |
| metric_ids | No | Optional list of metric IDs to evaluate. Uses agent defaults if omitted. | |
| options | No | Run configuration options | |
| tags | No | Tags for categorizing and filtering the run (max 20 tags, 200 chars each). Filter later with list_runs filter='tag="regression"'. | |
| metadata | No | Custom metadata for tracking purposes | |
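
As a rough illustration of the schema above, the following sketch assembles a create_run argument dict and checks the documented tag limits (at most 20 tags, 200 characters each) before anything is sent. The helper name `build_create_run_args` is hypothetical, the shapes of `options` and `metadata` are not specified on this page and are passed through untouched, and the server may enforce further rules not shown here.

```python
from typing import Any, Dict, List, Optional

MAX_TAGS = 20          # documented limit: max 20 tags
MAX_TAG_LENGTH = 200   # documented limit: 200 chars per tag


def build_create_run_args(
    agent_id: str,
    persona_id: str,
    test_set_id: str,
    metric_ids: Optional[List[str]] = None,
    options: Optional[Dict[str, Any]] = None,   # shape not documented; passed as-is
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,  # shape not documented; passed as-is
) -> Dict[str, Any]:
    """Build the arguments for create_run, omitting unset optional fields."""
    required = {
        "agent_id": agent_id,
        "persona_id": persona_id,
        "test_set_id": test_set_id,
    }
    for name, value in required.items():
        if not value:
            raise ValueError(f"{name} is required")

    if tags is not None:
        if len(tags) > MAX_TAGS:
            raise ValueError(f"at most {MAX_TAGS} tags are allowed, got {len(tags)}")
        for tag in tags:
            if len(tag) > MAX_TAG_LENGTH:
                raise ValueError(f"tag exceeds {MAX_TAG_LENGTH} characters: {tag[:20]}...")

    args: Dict[str, Any] = dict(required)
    optional = {
        "metric_ids": metric_ids,  # omitted -> agent defaults are used
        "options": options,
        "tags": tags,
        "metadata": metadata,
    }
    args.update({key: value for key, value in optional.items() if value is not None})
    return args
```

For example, `build_create_run_args("a", "p", "t", tags=["regression"])` yields a dict with the three required IDs plus `tags`. Leaving `metric_ids` out keeps the agent's default metrics, as described in the table.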