boruna_manage
Execute and manage Boruna evaluation framework operations: run benchmarks, validate results, retrieve evidence details, list capabilities, and handle skills for AI agent assessment.
Instructions
Manage the Boruna evaluation framework. Actions: run (agent_id, benchmark); validate (run_id: validate results); evidence (run_id: get evidence details); capability_list (list evaluation capabilities); skill_manage (manage Boruna skills).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Action to perform: run, validate, evidence, capability_list, skill_manage | |
| mode | Yes | Execution mode: inline (run script directly) or skill (run a saved boruna_script skill) | |
| script | No | (inline mode) The .ax script source code to execute | |
| policy | No | (inline mode) Capability policy: allow-all or deny-all | deny-all |
| boruna_tool_id | No | (inline mode) UUID of the mcp_stdio Tool pointing to the Boruna binary. If omitted, auto-detects. | |
| input | No | Optional input data passed to the script as JSON | |
| skill_id | No | (skill mode) UUID of the boruna_script Skill to execute | |
| execution_id | Yes | UUID of the SkillExecution record from a boruna_script skill run | |
| name | No | (create) Skill name | |
| description | No | (create) Skill description | |
| limit | No | (list / executions) Max results | 20 |
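
As a rough illustration of how the parameters in the table fit together, the following Python sketch assembles an arguments object for an inline-mode call and checks the enumerated values listed above. The helper name `build_arguments`, the placeholder UUID, and the placeholder script string are hypothetical and not part of Boruna. The sketch follows the table literally in treating `action`, `mode`, and `execution_id` as required, and it does not show how the arguments are actually sent to the tool.

```python
import json

# Enumerations copied from the parameter descriptions in the table above.
ACTIONS = {"run", "validate", "evidence", "capability_list", "skill_manage"}
MODES = {"inline", "skill"}
POLICIES = {"allow-all", "deny-all"}


def build_arguments(action, mode, execution_id, **optional):
    """Hypothetical helper: assemble a boruna_manage arguments dict.

    Only the documented enumerations are checked; everything else is
    passed through unchanged.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    policy = optional.get("policy")
    if policy is not None and policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")

    args = {"action": action, "mode": mode, "execution_id": execution_id}
    # Drop optional fields that were explicitly passed as None.
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


if __name__ == "__main__":
    example = build_arguments(
        "run",
        "inline",
        "00000000-0000-0000-0000-000000000000",  # placeholder UUID
        script="<.ax script source goes here>",  # placeholder, not real .ax code
        policy="deny-all",
        input={"sample": 1},
    )
    print(json.dumps(example, indent=2))
```

Validating the enumerations on the client side is a design choice for this sketch only; the tool itself may accept or reject values differently.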