Glama

generate_skill_tests

Generate YAML test suites from SKILL.md files to validate AI agent skills across multiple test categories, then execute them for comprehensive evaluation.

Instructions

Auto-generate test cases from a SKILL.md file. Call this when the user asks to create tests for a skill — it reads the skill definition and generates a ready-to-run YAML test suite covering explicit, implicit, contextual, and negative test categories. After generating, call run_skill_test to execute them.

Input Schema

Name         Required  Description                                                                  Default
skill_path   Yes       Path to the SKILL.md file to generate tests from
output_path  No        Where to save the generated test YAML (default: same directory as SKILL.md)
count        No        Number of test cases to generate (default: 10)
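
As an illustration of the schema above, here is a minimal sketch of assembling the arguments object for this tool. The helper name build_arguments and the example path are hypothetical, and the check that count is at least 1 is an assumption; the schema only says count is a number.

```python
from typing import Any, Dict, Optional


def build_arguments(
    skill_path: str,
    output_path: Optional[str] = None,
    count: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the arguments object for generate_skill_tests (hypothetical helper)."""
    # skill_path is the only required field in the schema.
    if not skill_path:
        raise ValueError("'skill_path' is required")
    arguments: Dict[str, Any] = {"skill_path": skill_path}
    # Optional fields are omitted when unset so the tool applies its own defaults.
    if output_path is not None:
        arguments["output_path"] = output_path
    if count is not None:
        # Assumption: a non-positive count is treated as a caller error.
        if count < 1:
            raise ValueError("'count' should be a positive integer")
        arguments["count"] = count
    return arguments


# Example path is made up for illustration.
print(build_arguments("skills/my-skill/SKILL.md", count=5))
```

Leaving out output_path and count lets the documented defaults apply (same directory as SKILL.md, 10 test cases).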

Implementation Reference

  • Implementation of the 'generate_skill_tests' tool which constructs the command line call to 'evalview skill generate-tests'.
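    elif name == "generate_skill_tests":
        # Validate before normalizing: os.path.normpath("") returns ".",
        # so checking the normalized value would never catch a missing path.
        raw_skill_path = args.get("skill_path", "")
        if not raw_skill_path:
            return "Error: 'skill_path' is required."
        skill_path = os.path.normpath(raw_skill_path)
        # --auto is passed unconditionally; -o and -c are added only when provided.
        cmd = ["evalview", "skill", "generate-tests", skill_path, "--auto"]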
        if args.get("output_path"):
            cmd += ["-o", os.path.normpath(args["output_path"])]
        if args.get("count"):
            cmd += ["-c", str(int(args["count"]))]
  • Schema definition for 'generate_skill_tests' tool including input arguments.
    {
        "name": "generate_skill_tests",
        "description": (
            "Auto-generate test cases from a SKILL.md file. "
            "Call this when the user asks to create tests for a skill — it reads the skill "
            "definition and generates a ready-to-run YAML test suite covering explicit, "
            "implicit, contextual, and negative test categories. "
            "After generating, call run_skill_test to execute them."
        ),
        "inputSchema": {
            "type": "object",
            "required": ["skill_path"],
            "properties": {
                "skill_path": {
                    "type": "string",
                    "description": "Path to the SKILL.md file to generate tests from",
                },
                "output_path": {
                    "type": "string",
                    "description": "Where to save the generated test YAML (default: same directory as SKILL.md)",
                },
                "count": {
                    "type": "number",
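
The command construction in the first excerpt can be sketched as a standalone function. This is only an illustration: the function name build_generate_tests_command and the example paths are hypothetical, it raises an exception instead of returning an error string, and it does not run the CLI. It checks skill_path before normalizing because os.path.normpath("") returns ".", which would otherwise pass an emptiness check.

```python
import os
from typing import Any, Dict, List


def build_generate_tests_command(args: Dict[str, Any]) -> List[str]:
    """Return the argv list for 'evalview skill generate-tests' (hypothetical helper)."""
    raw_skill_path = args.get("skill_path", "")
    # Check the raw value: normpath("") would turn an empty string into ".".
    if not raw_skill_path:
        raise ValueError("'skill_path' is required.")
    cmd = [
        "evalview", "skill", "generate-tests",
        os.path.normpath(raw_skill_path), "--auto",
    ]
    # The optional flags mirror the excerpt: -o for output_path, -c for count.
    if args.get("output_path"):
        cmd += ["-o", os.path.normpath(args["output_path"])]
    if args.get("count"):
        # The schema types count as a number, so coerce to int before formatting.
        cmd += ["-c", str(int(args["count"]))]
    return cmd


# Example arguments are made up for illustration.
print(build_generate_tests_command({"skill_path": "skills/demo/SKILL.md", "count": 5}))
```

Building the command as a list rather than a single string means it can be passed to a process runner without shell quoting of the paths.
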
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool reads a file and generates output, but lacks details on permissions, error handling, rate limits, or what happens if the output file already exists. It adds some context (e.g., 'ready-to-run YAML test suite'), but behavioral traits like side effects or performance are not covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose and uses three efficient sentences with zero waste. Each part (purpose, usage, next step) earns its place without redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is mostly complete for a tool with 3 parameters and 100% schema coverage. It covers purpose, usage, and next steps, but could improve by mentioning output format or error cases. It's sufficient but not exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description does not add meaning beyond the schema (e.g., it doesn't explain format details for paths or constraints for count). Baseline 3 is appropriate as the schema handles parameter documentation adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Auto-generate test cases') and resource ('from a SKILL.md file'), distinguishing it from siblings like 'run_skill_test' (execution) and 'create_test' (manual creation). It explicitly mentions generating a 'YAML test suite covering explicit, implicit, contextual, and negative test categories,' providing detailed scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly states when to use this tool ('when the user asks to create tests for a skill') and provides a clear alternative/next step ('After generating, call run_skill_test to execute them'), distinguishing it from siblings like 'run_skill_test' and 'create_test' without being misleading.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/hidai25/eval-view'
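
The same request can be sketched in Python with the standard library. The helper names below are hypothetical, and because the response format is not described here, the sketch returns the raw body as text (decoding as UTF-8 is an assumption).

```python
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/hidai25/eval-view"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request without sending it (hypothetical helper)."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Assumption: the body is UTF-8 text; the page does not state the format.
        return response.read().decode("utf-8")
```

Calling print(fetch_server_info()) performs the network request; build_request alone does not touch the network.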

If you have feedback or need assistance with the MCP directory API, please join our Discord server.