tool_compare_runs
Compare metrics between two evaluation runs to analyze task/model performance, score differences, token usage, and duration changes.
Instructions
Compare metrics between two evaluation runs.
Shows a side-by-side comparison of the two runs, including task and model info, sample counts, score differences, token usage differences, and duration differences.
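The comparison described above can be sketched as a simple field-by-field diff. This is a minimal illustration, not the tool's actual implementation: `RunSummary`, its field names, and `compare_runs` are hypothetical, and the sign convention (run B minus run A) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunSummary:
    """Hypothetical per-run summary; field names are illustrative only."""
    task: str
    model: str
    samples: int
    scores: Dict[str, float]
    total_tokens: int
    duration_s: float


def compare_runs(a: RunSummary, b: RunSummary) -> dict:
    """Return side-by-side values plus (b - a) deltas for numeric fields."""
    # Only compare metrics that both runs report.
    shared = sorted(set(a.scores) & set(b.scores))
    return {
        "task": (a.task, b.task),
        "model": (a.model, b.model),
        "samples": (a.samples, b.samples),
        "score_diff": {name: b.scores[name] - a.scores[name] for name in shared},
        "token_diff": b.total_tokens - a.total_tokens,
        "duration_diff_s": b.duration_s - a.duration_s,
    }


a = RunSummary("task", "model-a", 100, {"accuracy": 0.5}, 12000, 60.0)
b = RunSummary("task", "model-b", 100, {"accuracy": 0.75}, 15000, 90.0)
print(compare_runs(a, b)["score_diff"])  # {'accuracy': 0.25}
```

Restricting score differences to metrics present in both runs avoids a `KeyError` when the runs used different scorers; a real tool might instead report unmatched metrics separately.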
Args:
- log_file_a: Path to the first log file.
- log_file_b: Path to the second log file.
- log_dir: Optional log directory for relative paths.
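One plausible reading of `log_dir` is that relative `log_file_a`/`log_file_b` paths are joined onto it, while absolute paths are used unchanged. The helper below is a hypothetical sketch of that assumption, not confirmed behavior of the tool.

```python
from pathlib import Path
from typing import Optional


def resolve_log_path(log_file: str, log_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: join a relative log path onto log_dir when given."""
    path = Path(log_file)
    # Absolute paths (or calls without log_dir) are returned as-is.
    if log_dir is not None and not path.is_absolute():
        return Path(log_dir) / path
    return path


print(resolve_log_path("run_a.json", "logs"))  # logs/run_a.json (on POSIX)
```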
Input Schema
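| Name | Required | Description | Default |
|---|---|---|---|
| log_file_a | Yes | Path to the first log file | |
| log_file_b | Yes | Path to the second log file | |
| log_dir | No | Optional log directory for relative paths | |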
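An MCP client invokes a tool with a JSON-RPC `tools/call` request whose `params.arguments` object follows the input schema in the table above. The sketch below builds such a request; it assumes the server registers the tool under the name `tool_compare_runs` shown in this listing (the registered name may differ), and `build_compare_request` is a hypothetical helper.

```python
import json
from typing import Optional


def build_compare_request(log_file_a: str, log_file_b: str,
                          log_dir: Optional[str] = None,
                          request_id: int = 1) -> str:
    """Hypothetical helper: serialize a JSON-RPC tools/call request."""
    arguments = {"log_file_a": log_file_a, "log_file_b": log_file_b}
    # log_dir is optional in the schema, so omit it when not supplied.
    if log_dir is not None:
        arguments["log_dir"] = log_dir
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # Tool name is an assumption taken from this listing.
        "params": {"name": "tool_compare_runs", "arguments": arguments},
    }
    return json.dumps(request)


print(build_compare_request("run_a.json", "run_b.json", log_dir="logs"))
```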