tool_get_aggregate_stats
Analyze evaluation logs to calculate aggregate statistics including success rates, token usage, and duration averages across tasks and models.
Instructions
Get aggregate statistics across multiple evaluation runs.
Provides summary statistics grouped by task and model, including success rates, sample counts, token usage totals, and duration averages.
Args:
- log_dir: Directory containing log files
- task: Filter by task name (supports wildcards)
- model: Filter by model name (supports wildcards)
- date_from: Filter logs from this date (ISO format)
- date_to: Filter logs until this date (ISO format)
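The filter arguments above can be illustrated with a small sketch. The tool's actual matching rules are not documented here, so this assumes shell-style wildcards (`*`, `?`) and inclusive date bounds; the `matches_filters` helper, the record fields (`task`, `model`, `created`), and the sample values are all hypothetical.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def matches_filters(record, task=None, model=None, date_from=None, date_to=None):
    """Return True if a log record passes every filter that was supplied.

    Assumption: wildcards are shell-style and date bounds are inclusive.
    """
    if task is not None and not fnmatchcase(record["task"], task):
        return False
    if model is not None and not fnmatchcase(record["model"], model):
        return False
    created = datetime.fromisoformat(record["created"])
    if date_from is not None and created < datetime.fromisoformat(date_from):
        return False
    if date_to is not None and created > datetime.fromisoformat(date_to):
        return False
    return True


# Hypothetical log record, for illustration only.
record = {
    "task": "math_qa",
    "model": "provider/model-a",
    "created": "2024-05-10T12:00:00",
}

keep = matches_filters(record, task="math_*", date_from="2024-05-01")
```

Omitted filters are treated as "match everything", which mirrors the fact that every argument is optional.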
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
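| log_dir | No | Directory containing log files | |
| task | No | Filter by task name (supports wildcards) | |
| model | No | Filter by model name (supports wildcards) | |
| date_from | No | Filter logs from this date (ISO format) | |
| date_to | No | Filter logs until this date (ISO format) | |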
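To make the grouping by task and model concrete, here is a minimal sketch of that kind of aggregation. It is not the tool's implementation or output format: the `aggregate` function, the per-run fields (`success`, `samples`, `tokens`, `duration_s`), and the sample data are assumptions chosen for illustration.

```python
from collections import defaultdict


def aggregate(runs):
    """Group runs by (task, model) and compute summary statistics.

    Assumption: each run is a dict with hypothetical fields
    task, model, success, samples, tokens, and duration_s.
    """
    groups = defaultdict(list)
    for run in runs:
        groups[(run["task"], run["model"])].append(run)

    stats = {}
    for key, items in groups.items():
        count = len(items)
        stats[key] = {
            "runs": count,
            "success_rate": sum(1 for r in items if r["success"]) / count,
            "samples": sum(r["samples"] for r in items),
            "total_tokens": sum(r["tokens"] for r in items),
            "avg_duration_s": sum(r["duration_s"] for r in items) / count,
        }
    return stats


# Hypothetical runs, for illustration only.
runs = [
    {"task": "math_qa", "model": "provider/model-a", "success": True,
     "samples": 100, "tokens": 1000, "duration_s": 10.0},
    {"task": "math_qa", "model": "provider/model-a", "success": False,
     "samples": 50, "tokens": 3000, "duration_s": 30.0},
    {"task": "code_gen", "model": "provider/model-b", "success": True,
     "samples": 20, "tokens": 500, "duration_s": 5.0},
]

stats = aggregate(runs)
```

Keying the result on the `(task, model)` pair keeps one row per combination, so a model evaluated on several tasks is never averaged across them.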