benchmark_series
Track daily benchmark scores for a model and benchmark over a window. Supports swe_bench, mmlu_pro, gpqa_diamond, math, human_eval. Free for windows of up to 7 days; windows of 8 to 90 days cost 1 credit.
Instructions
Returns daily benchmark scores for one model and benchmark over a window of days. Benchmark keys: swe_bench, mmlu_pro, gpqa_diamond, math, human_eval. A window of 1 to 7 days is free. A window of 8 to 90 days costs 1 credit ($0.02) and requires a TENSORFEED_TOKEN; use it to track score evolution over the longer period. Get credits at tensorfeed.ai/developers/agent-payments.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model | Yes | Model id or display name. | |
| benchmark | Yes | Benchmark key (e.g. swe_bench, mmlu_pro, gpqa_diamond, math, human_eval). | |
| days | No | Window length in days. 1 to 7 free; 8 to 90 costs 1 credit. | 7 |
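
The schema and pricing rules above can be checked on the client side before a call is made. The following is a minimal sketch, not part of the tool itself: the helper name `build_arguments`, its `token` parameter, and the model id `example-model` are hypothetical, and treating the five listed benchmark keys as the complete set is an assumption. It only builds the argument object and reports the credit cost; how the arguments and the TENSORFEED_TOKEN are actually sent depends on your client.

```python
# Benchmark keys listed in the tool description (assumed here to be the full set).
BENCHMARKS = {"swe_bench", "mmlu_pro", "gpqa_diamond", "math", "human_eval"}
FREE_MAX_DAYS = 7   # windows of 1 to 7 days are free
MAX_DAYS = 90       # windows of 8 to 90 days cost 1 credit


def build_arguments(model, benchmark, days=7, token=None):
    """Hypothetical helper: validate inputs for benchmark_series.

    Returns (arguments, credits), where credits is 0 or 1.
    """
    if not model:
        raise ValueError("model is required")
    if benchmark not in BENCHMARKS:
        raise ValueError(f"unknown benchmark key: {benchmark!r}")
    if not 1 <= days <= MAX_DAYS:
        raise ValueError(f"days must be between 1 and {MAX_DAYS}")

    credits = 0 if days <= FREE_MAX_DAYS else 1
    # Paid windows need a TENSORFEED_TOKEN; passing it as an argument is an assumption.
    if credits and not token:
        raise RuntimeError("windows longer than 7 days need a TENSORFEED_TOKEN")

    return {"model": model, "benchmark": benchmark, "days": days}, credits


if __name__ == "__main__":
    args, credits = build_arguments("example-model", "swe_bench")
    print(args, "credits:", credits)
```

Validating before the call avoids spending a credit on a request that would be rejected, and makes the free/paid boundary at 7 days explicit in one place.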