benchmark_propose
Build benchmark scorecards from real prior uses. Each scorecard must include task-value and resource dimensions plus concrete cases, so that its claims rest on measurable evidence.
Instructions
Propose one or more benchmark scorecards built from real prior uses or failures. Each scorecard needs at least one task-value dimension, at least one resource/cost dimension, and at least one concrete case; otherwise it is rejected as a hand-waved benchmark.
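The acceptance rule above can be sketched as a small local pre-check. This is only an illustration: the `Scorecard` and `Dimension` shapes, the `kind` values `"task_value"` and `"resource"`, and the `rejection_reasons` helper are all hypothetical, since the actual structure of `benchmarks` is not documented here.

```python
from dataclasses import dataclass, field


# Hypothetical shapes: the tool's real schema for `benchmarks` is not documented.
@dataclass
class Dimension:
    name: str
    kind: str  # assumed values: "task_value" or "resource"


@dataclass
class Scorecard:
    name: str
    dimensions: list[Dimension] = field(default_factory=list)
    cases: list[str] = field(default_factory=list)  # concrete prior uses/failures


def rejection_reasons(card: Scorecard) -> list[str]:
    """Return why a scorecard would be rejected; an empty list means it passes."""
    kinds = {d.kind for d in card.dimensions}
    reasons = []
    if "task_value" not in kinds:
        reasons.append("missing task-value dimension")
    if "resource" not in kinds:
        reasons.append("missing resource/cost dimension")
    if not card.cases:
        reasons.append("missing concrete case")
    return reasons


if __name__ == "__main__":
    card = Scorecard(
        name="example-scorecard",
        dimensions=[Dimension("answer correctness", "task_value")],
        cases=["a prior run described in the caller's own records"],
    )
    # This card lacks a resource/cost dimension, so one reason is reported.
    print(rejection_reasons(card))
```

Running a check like this before calling the tool may help catch a scorecard that would be rejected, though the tool's own validation remains authoritative.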
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| runId | Yes | | |
| benchmarks | Yes | | |