evaluate_skill
Runs the Anthropic skill-creator eval loop to assess and optimize a skill using an evaluation dataset.
Instructions
Runs the Anthropic skill-creator eval loop for a skill. Requires Python, an authenticated Claude CLI, and an eval set JSON file. Legacy layouts may also require the ANTHROPIC_API_KEY environment variable and the anthropic Python package.
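The prerequisites above can be checked before invoking the tool. The following is a minimal sketch, not part of the tool itself: the function name `check_prerequisites` and the `legacy_layout` flag are hypothetical, and it assumes the Claude CLI is installed as an executable named `claude`. It cannot verify that the CLI is actually authenticated.

```python
import importlib.util
import os
import shutil


def check_prerequisites(eval_set_path, legacy_layout=False):
    """Return a list of problems found; an empty list means nothing obvious is missing.

    Hypothetical helper: it only checks for presence, not for valid CLI authentication.
    """
    problems = []
    # Assumption: the Claude CLI executable is named "claude".
    if shutil.which("claude") is None:
        problems.append("Claude CLI ('claude') not found on PATH")
    if not os.path.isfile(eval_set_path):
        problems.append(f"eval set JSON not found: {eval_set_path}")
    if legacy_layout:
        # Only legacy layouts may need the API key and the anthropic package.
        if not os.environ.get("ANTHROPIC_API_KEY"):
            problems.append("ANTHROPIC_API_KEY is not set")
        if importlib.util.find_spec("anthropic") is None:
            problems.append("Python package 'anthropic' is not installed")
    return problems
```

Calling `check_prerequisites("evals.json")` before a run gives a quick list of what to fix; an empty list does not guarantee the run will succeed.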
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| skill_name | Yes | Name of the skill directory to evaluate. | |
| eval_set_path | No | Path to the eval set JSON. If omitted, common default locations are checked. | |
| max_iterations | No | Maximum number of optimization iterations. | |
| num_workers | No | Number of parallel evaluator workers. The default of 1 gives stable trigger measurements. | 1 |
| runs_per_query | No | Number of repeats per query. Increase for variance analysis. | 1 |
| timeout_seconds | No | Timeout per query, in seconds. | 120 |
| holdout | No | Fraction of the eval set held out as the run_loop test split. Use 0 to disable holdout. | 0.4 |
| trigger_threshold | No | Trigger-rate threshold for pass/fail decisions. | 0.5 |
| description_override | No | Starting description override, for what-if optimization without editing SKILL.md first. | |
| model | No | Model passed to the Claude CLI. | "sonnet" |
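As an illustration of how the parameters fit together, here is a minimal sketch that assembles an arguments object using the defaults stated in the table. The helper `build_arguments` is hypothetical, and the range checks on `holdout` and `trigger_threshold` are assumptions about sensible values, not documented behavior of the tool.

```python
# Defaults as stated in the parameter table; parameters with no stated
# default (eval_set_path, max_iterations, description_override) are omitted.
DEFAULTS = {
    "num_workers": 1,
    "runs_per_query": 1,
    "timeout_seconds": 120,
    "holdout": 0.4,
    "trigger_threshold": 0.5,
    "model": "sonnet",
}
NO_DEFAULT = {"eval_set_path", "max_iterations", "description_override"}


def build_arguments(skill_name, **overrides):
    """Merge caller overrides onto the documented defaults (hypothetical helper)."""
    if not skill_name:
        raise ValueError("skill_name is required")
    unknown = set(overrides) - set(DEFAULTS) - NO_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    args = {"skill_name": skill_name, **DEFAULTS, **overrides}
    # Assumption: holdout is a fraction in [0, 1), where 0 disables the split.
    if not 0 <= args["holdout"] < 1:
        raise ValueError("holdout must be in [0, 1)")
    # Assumption: the trigger threshold is a rate in [0, 1].
    if not 0 <= args["trigger_threshold"] <= 1:
        raise ValueError("trigger_threshold must be in [0, 1]")
    return args


# Example: three repeats per query for variance analysis, holdout disabled.
example = build_arguments("my-skill", runs_per_query=3, holdout=0)
```

The skill name `my-skill` is a placeholder. Unset parameters without a stated default are left out of the result so the tool can apply its own behavior.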