get_benchmark_timeline
Retrieve chronological raw benchmark scores for a specific dataset and metric to track performance evolution over time in AI/ML research.
Instructions
Get raw benchmark score data points over time for a given dataset and metric. Returns individual (paper, date, score, value_string) entries ordered chronologically. No trend lines or interpretation are computed; the output is raw scatter data. Use search_benchmarks first to find the exact dataset and metric names.
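Because the tool returns only raw points, any trend has to be derived client-side. The sketch below computes a running-best frontier, which keeps each entry that beat every earlier score. It assumes each entry is a dict with the four documented fields, that `date` is an ISO 8601 string, and that `score` is numeric. These types, the `TimelineEntry` name, and the `running_best` helper are hypothetical. The `higher_is_better` flag exists because some metrics, such as error rates, improve downward.

```python
from __future__ import annotations

from datetime import date
from typing import TypedDict


class TimelineEntry(TypedDict):
    # Hypothetical shape: field names follow the tool description,
    # but the value types are assumptions.
    paper: str
    date: str          # assumed ISO 8601, e.g. "2021-03-15"
    score: float
    value_string: str  # score as originally reported (assumed free text)


def running_best(
    entries: list[TimelineEntry], higher_is_better: bool = True
) -> list[tuple[date, float, str]]:
    """Return (date, score, paper) for each entry that improved on all earlier ones."""
    # Sort defensively, even though the tool is documented to return chronological order.
    ordered = sorted(entries, key=lambda e: date.fromisoformat(e["date"]))
    frontier: list[tuple[date, float, str]] = []
    best: float | None = None
    for e in ordered:
        s = e["score"]
        improved = best is None or (s > best if higher_is_better else s < best)
        if improved:
            best = s
            frontier.append((date.fromisoformat(e["date"]), s, e["paper"]))
    return frontier
```

The raw `value_string` is ignored here. It may be worth keeping if reported formats differ across papers.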
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset | Yes | Dataset/benchmark name, e.g. 'ImageNet', 'MMLU', 'SWE-bench Verified' | |
| metric | Yes | Metric name, e.g. 'accuracy', 'F1', 'pass@1' | |
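This listing suggests the tool is exposed over the Model Context Protocol (MCP), but that is an assumption. Under that assumption, a call is a JSON-RPC 2.0 `tools/call` request whose `arguments` object carries the two required fields from the schema above. The `build_timeline_call` helper is hypothetical and only serializes the request. It does not handle transport or the response. The dataset and metric strings should be the exact names returned by search_benchmarks.

```python
import json


def build_timeline_call(dataset: str, metric: str, request_id: int = 1) -> str:
    """Serialize an (assumed) MCP 'tools/call' request for get_benchmark_timeline."""
    # Both arguments are required by the input schema; reject empty values early.
    if not dataset.strip() or not metric.strip():
        raise ValueError("dataset and metric are both required")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_benchmark_timeline",
            "arguments": {"dataset": dataset, "metric": metric},
        },
    }
    return json.dumps(request)


print(build_timeline_call("SWE-bench Verified", "pass@1"))
```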