get_benchmark_stats
Retrieve statistical metrics for research benchmarks to contextualize paper performance. Provides distribution data including min, max, median, mean, and standard deviation for dataset-metric combinations.
Instructions
Get score distribution statistics for a dataset/metric pair across all papers. Returns min, max, median, mean, 25th and 75th percentiles (p25, p75), standard deviation (stddev), and count. Use it to contextualize a paper's claims, e.g. 'For MMLU accuracy, the median is 72.5% across 45 papers, range 33%-95%.' The tool applies no judgment and flags no outliers; it returns raw statistics only.
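The following is a minimal client-side sketch of how the listed fields could be computed from a set of per-paper scores and rendered as a sentence in the style of the example above. It is not the tool's implementation. The quartile method (`statistics.quantiles` with `method="inclusive"`), the use of sample rather than population standard deviation, and the names `BenchmarkStats`, `summarize`, and `describe` are all assumptions for illustration. The scores are made-up numbers.

```python
import statistics
from dataclasses import dataclass


@dataclass
class BenchmarkStats:
    """Fields named after the ones get_benchmark_stats is documented to return."""
    min: float
    max: float
    median: float
    mean: float
    p25: float
    p75: float
    stddev: float
    count: int


def summarize(scores: list[float]) -> BenchmarkStats:
    """Compute distribution statistics for one dataset/metric pair.

    Assumed, not documented: quartiles use the 'inclusive' method of
    statistics.quantiles, and stddev is the sample standard deviation.
    """
    if len(scores) < 2:
        # quantiles() and stdev() both need at least two data points
        raise ValueError("need at least two scores")
    # With n=4, quantiles() returns the three quartile cut points.
    p25, _, p75 = statistics.quantiles(scores, n=4, method="inclusive")
    return BenchmarkStats(
        min=min(scores),
        max=max(scores),
        median=statistics.median(scores),
        mean=statistics.fmean(scores),
        p25=p25,
        p75=p75,
        stddev=statistics.stdev(scores),
        count=len(scores),
    )


def describe(dataset: str, metric: str, s: BenchmarkStats) -> str:
    """Render a one-line summary in the style of the example sentence."""
    return (
        f"For {dataset} {metric}, the median is {s.median:g} across "
        f"{s.count} papers, range {s.min:g}-{s.max:g}."
    )


# Hypothetical scores for illustration only, not real benchmark results.
stats = summarize([33.0, 60.0, 72.5, 80.0, 95.0])
print(describe("MMLU", "accuracy", stats))
# → For MMLU accuracy, the median is 72.5 across 5 papers, range 33-95.
```

Because the tool returns raw statistics only, any framing such as "above median" or "top quartile" is left to the caller, which is why `describe` only restates the numbers.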
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| dataset | Yes | Dataset/benchmark name, e.g. 'ImageNet', 'MMLU', 'SWE-bench Verified' | |
| metric | Yes | Metric name, e.g. 'accuracy', 'F1', 'pass@1' | |
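If the tool is served over the Model Context Protocol (an assumption; this page does not name the transport), a call is a JSON-RPC 2.0 `tools/call` request whose `arguments` object carries the two required fields. The sketch below builds that payload with the standard library only. The helper name `build_tools_call` and its empty-string check are hypothetical client-side additions, not part of the tool.

```python
import json

# Both inputs are marked required in the Input Schema; no defaults are documented.
REQUIRED_ARGS = ("dataset", "metric")


def build_tools_call(dataset: str, metric: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request for get_benchmark_stats.

    Assumes an MCP transport; the envelope follows the MCP spec's
    tools/call shape, not anything stated on this page.
    """
    arguments = {"dataset": dataset, "metric": metric}
    # Hypothetical client-side guard: reject empty required values early.
    missing = [k for k in REQUIRED_ARGS if not arguments[k]]
    if missing:
        raise ValueError(f"missing required argument(s): {', '.join(missing)}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_benchmark_stats", "arguments": arguments},
    }
    return json.dumps(request)


print(build_tools_call("SWE-bench Verified", "pass@1"))
```

An MCP client library would typically build this envelope itself; the sketch only makes the shape of the two required arguments explicit.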