benchmark_trust_verdict
Evaluate an AI benchmark's trustworthiness: get a trust band (reliable, use_with_caution, saturated, contaminated, deprecated) and a 0-100 trust score. Filter by benchmark or category. The preview is free; the paid full tier adds per-signal detail, a recommendation, and a signed receipt.
Instructions
Returns TensorFeed's signed ruling on whether an AI benchmark is still a trustworthy capability signal, or whether it is saturated, contaminated, or near its ceiling, in which case a high score on it should be down-weighted. Each benchmark gets a trust band (reliable, use_with_caution, saturated, contaminated, deprecated) and a 0-100 trust score. Pass benchmark to narrow the result to one benchmark, category to filter by category, or neither to get the whole registry.

tier='preview' (the default) is free, limited to 10 calls per day per IP, and returns only the top verdict and the bands. tier='full' costs 1 credit ($0.02) and requires a TENSORFEED_TOKEN. It adds the per-signal detail (ceiling proximity, frontier compression, contamination), a down-weight recommendation with an alternative benchmark, and an AFTA-signed receipt. Get credits at tensorfeed.ai/developers/agent-payments.
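As a rough budgeting aid, the quota and pricing above can be turned into a small planner. This is a minimal Python sketch that assumes the stated figures still hold: 10 free preview calls per day per IP, and 1 credit at $0.02 per full-tier call. The names plan_calls and DailyPlan are hypothetical. The description does not say what happens once the preview quota is exhausted, so the sketch simply rejects plans that exceed it.

```python
from dataclasses import dataclass

# Figures copied from the tool description; they are assumptions and may change.
PREVIEW_FREE_CALLS_PER_DAY = 10  # per IP
FULL_CREDITS_PER_CALL = 1
USD_PER_CREDIT = 0.02


@dataclass
class DailyPlan:
    """Hypothetical summary of one day's planned benchmark_trust_verdict calls."""
    preview_calls: int
    full_calls: int
    credits: int
    usd: float


def plan_calls(preview_calls: int, full_calls: int) -> DailyPlan:
    """Check planned calls against the free preview quota and price the full-tier calls."""
    if preview_calls < 0 or full_calls < 0:
        raise ValueError("call counts must be non-negative")
    # Behaviour past the free quota is undocumented, so this sketch refuses such plans.
    if preview_calls > PREVIEW_FREE_CALLS_PER_DAY:
        raise ValueError(
            f"preview tier allows {PREVIEW_FREE_CALLS_PER_DAY} free calls per day per IP"
        )
    credits = full_calls * FULL_CREDITS_PER_CALL
    # Round to cents for display.
    return DailyPlan(preview_calls, full_calls, credits, round(credits * USD_PER_CREDIT, 2))
```

For example, plan_calls(10, 25) uses the whole free preview quota and reports 25 credits, or $0.50, for the full-tier calls.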
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| tier | No | 'preview' (free) or 'full' (1 credit; adds per-signal detail, recommendation, signed receipt). | 'preview' |
| benchmark | No | Benchmark registry id or name, to narrow the result to one benchmark (e.g. "mmlu", "swe-bench"). | |
| category | No | Category to filter the benchmarks by (e.g. "coding", "reasoning"). | |
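The schema maps directly onto the arguments object of an MCP tools/call request. The Python sketch below builds such a JSON-RPC message. The helper name build_tool_call is hypothetical, and the transport that actually sends the request is left out. The token check assumes the TENSORFEED_TOKEN reaches the MCP server as an environment variable of that name, which this description does not confirm.

```python
import json
import os
from typing import Optional

TOOL_NAME = "benchmark_trust_verdict"
VALID_TIERS = {"preview", "full"}


def build_tool_call(request_id: int, tier: str = "preview",
                    benchmark: Optional[str] = None,
                    category: Optional[str] = None) -> dict:
    """Build a hypothetical MCP JSON-RPC tools/call request for benchmark_trust_verdict."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {sorted(VALID_TIERS)}")
    # Assumption: the full tier's token is provided via a TENSORFEED_TOKEN env var.
    if tier == "full" and not os.environ.get("TENSORFEED_TOKEN"):
        raise RuntimeError("tier='full' needs a TENSORFEED_TOKEN")
    arguments = {"tier": tier}
    # Leave out filters that were not given; passing neither asks for the whole registry.
    if benchmark is not None:
        arguments["benchmark"] = benchmark
    if category is not None:
        arguments["category"] = category
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": TOOL_NAME, "arguments": arguments},
    }


if __name__ == "__main__":
    # Free preview verdict for a single benchmark.
    print(json.dumps(build_tool_call(1, benchmark="swe-bench"), indent=2))
```

Run as a script, it prints a request whose arguments are {"tier": "preview", "benchmark": "swe-bench"}. Omitting both benchmark and category leaves only the tier in the arguments, which asks for the whole registry.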