Boruna Manage Tool

boruna_manage

Execute and manage Boruna evaluation framework operations: run benchmarks, validate results, retrieve evidence details, list capabilities, and handle skills for AI agent assessment.

Instructions

Manage Boruna evaluation framework. Actions: run (agent_id, benchmark), validate (run_id — validate results), evidence (run_id — get evidence details), capability_list (list evaluation capabilities), skill_manage (manage Boruna skills).

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Action to perform: run, validate, evidence, capability_list, skill_manage | |
| mode | Yes | Execution mode: inline (run script directly) or skill (run a saved boruna_script skill) | |
| script | No | (inline mode) The .ax script source code to execute | |
| policy | No | (inline mode) Capability policy: allow-all or deny-all (default: deny-all) | |
| boruna_tool_id | No | (inline mode) UUID of the mcp_stdio Tool pointing to the Boruna binary. If omitted, auto-detects. | |
| input | No | Optional input data passed to the script as JSON | |
| skill_id | No | (skill mode) UUID of the boruna_script Skill to execute | |
| execution_id | Yes | UUID of the SkillExecution record from a boruna_script skill run | |
| name | No | (create) Skill name | |
| description | No | (create) Skill description | |
| limit | No | (list \| executions) Max results (default 20) | |

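The conditional requirements in the table can be sketched as a small argument builder. This is a minimal Python sketch under stated assumptions, not part of Boruna or the server: `build_arguments` is a hypothetical helper, the all-zero UUID and the script text are placeholders, and the rule that inline mode needs `script` while skill mode needs `skill_id` is inferred from the parameter descriptions (the schema itself marks both as optional).

```python
import json

# Allowed values, copied from the parameter descriptions in the table.
ACTIONS = {"run", "validate", "evidence", "capability_list", "skill_manage"}
MODES = {"inline", "skill"}
POLICIES = {"allow-all", "deny-all"}


def build_arguments(action, mode, execution_id, **optional):
    """Assemble an arguments dict for boruna_manage (hypothetical helper)."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: each mode needs its own payload field, even though the
    # schema lists script and skill_id as optional.
    if mode == "inline" and "script" not in optional:
        raise ValueError("inline mode expects 'script'")
    if mode == "skill" and "skill_id" not in optional:
        raise ValueError("skill mode expects 'skill_id'")
    if "policy" in optional and optional["policy"] not in POLICIES:
        raise ValueError(f"unknown policy: {optional['policy']!r}")
    # The three fields the schema marks as required come first.
    args = {"action": action, "mode": mode, "execution_id": execution_id}
    args.update(optional)
    return args


example = build_arguments(
    "run",
    "inline",
    "00000000-0000-0000-0000-000000000000",  # placeholder UUID
    script="<.ax source here>",  # placeholder, not real .ax syntax
    policy="deny-all",
)
print(json.dumps(example, indent=2))
```

Failing early on a missing `script` or `skill_id` is a design choice of this sketch; the server may accept such calls.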
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Fails to disclose mutation semantics (whether actions create resources, are destructive, or have side effects), security implications of script execution, rate limits, or return value structure. Only labels actions without behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with dense parenthetical action list is compact but hard to scan. Front-loads the framework name appropriately, though the parenthetical parameter lists (some incorrect) create cognitive overhead.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complex 11-parameter tool with conditional logic (inline vs skill modes) and no output schema requires richer context. Description fails to explain what 'Boruna' evaluates, how the framework operates, or why execution_id is required for every call. Insufficient for a multi-mode management tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage (baseline 3), but description contains serious errors: references 'agent_id' and 'benchmark' parameters for the run action that do not exist in the schema. While it clarifies action purposes (validate results, get evidence), the non-existent parameter references confuse the actual interface.
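The mismatch described above can be checked mechanically. The following is a rough Python sketch, assuming a simple heuristic of my own: a parenthetical counts as a parameter list only when every comma-separated item before any dash is a single lowercase identifier. `referenced_identifiers` is a hypothetical helper, and the parameter names are copied from the Input Schema table.

```python
import re

# Parameter names as listed in the Input Schema table.
SCHEMA_PARAMS = {
    "action", "mode", "script", "policy", "boruna_tool_id", "input",
    "skill_id", "execution_id", "name", "description", "limit",
}

# The tool description from the Instructions section (\u2014 is the dash).
DESCRIPTION = (
    "Manage Boruna evaluation framework. Actions: run (agent_id, benchmark), "
    "validate (run_id \u2014 validate results), "
    "evidence (run_id \u2014 get evidence details), "
    "capability_list (list evaluation capabilities), "
    "skill_manage (manage Boruna skills)."
)


def referenced_identifiers(description):
    """Collect identifiers that parentheticals present as parameters."""
    found = set()
    for group in re.findall(r"\(([^)]*)\)", description):
        # Text after the dash is prose, so only the head is inspected.
        head = group.split("\u2014")[0].strip()
        items = [item.strip() for item in head.split(",")]
        # Skip parentheticals that contain free-form prose.
        if all(re.fullmatch(r"[a-z_]+", item) for item in items):
            found.update(items)
    return found


missing = sorted(referenced_identifiers(DESCRIPTION) - SCHEMA_PARAMS)
print(missing)  # ['agent_id', 'benchmark', 'run_id']
```

Under this heuristic, `run_id` is flagged alongside `agent_id` and `benchmark`, since the table lists `execution_id` rather than `run_id`.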

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

States 'Manage Boruna evaluation framework' with specific action list (run, validate, evidence, etc.), but uses vague verb 'Manage' that overlaps with sibling tools (*_manage pattern). Confusingly references 'skill_manage' as an action when 'skill_manage' is also a sibling tool name, creating ambiguity about tool boundaries.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Lists available actions but provides no guidance on when to use boruna_manage versus the sibling skill_manage tool, or when to choose inline versus skill execution modes. No mention of prerequisites (e.g., that execution_id appears to come from prior runs).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/escapeboy/agent-fleet-o'
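The same request can be prepared from Python with the standard library. This is a minimal sketch: `server_request` is a hypothetical helper, and the owner/name path layout is an assumption generalized from the single URL shown above. The request is only built here, not sent, and the response format is not stated on this page, so no parsing is attempted.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://glama.ai/api/mcp/v1/servers"


def server_request(owner, name):
    """Build a GET request for one server entry (hypothetical helper)."""
    # Assumption: entries are addressed as <owner>/<name>, as in the curl example.
    url = f"{BASE_URL}/{quote(owner, safe='')}/{quote(name, safe='')}"
    return Request(url, method="GET")


req = server_request("escapeboy", "agent-fleet-o")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then read the response body.
```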

If you have feedback or need assistance with the MCP directory API, please join our Discord server.