get_evaluation
Retrieve per-question, per-model scores, responses, and judge reasoning from a specific evaluation run.
Instructions
Retrieve the full details of a specific evaluation run: per-question, per-model scores, model responses, and judge reasoning.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| results_path | Yes | Path to a specific evaluation result file. | |
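
The schema above takes a single required argument, so a call is small. The following is a minimal sketch of how a client might build the request, assuming the tool is exposed over the Model Context Protocol and invoked with a JSON-RPC `tools/call` message. The helper name `build_get_evaluation_call` and the path `results/example_run.json` are hypothetical and used only for illustration; the real result file location depends on your setup.

```python
import json


def build_get_evaluation_call(results_path: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC request body for the get_evaluation tool.

    Hypothetical helper: assumes an MCP-style `tools/call` envelope.
    """
    # results_path is marked as required in the input schema.
    if not results_path:
        raise ValueError("results_path is required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_evaluation",
            "arguments": {"results_path": results_path},
        },
    }


# Example path is a placeholder, not a real file from this project.
payload = build_get_evaluation_call("results/example_run.json")
print(json.dumps(payload, indent=2))
```

The sketch only constructs the request body. How it is sent (stdio, HTTP, or a client library) and the shape of the returned scores, responses, and judge reasoning are not specified here.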