create_test

Create YAML test cases for AI agents to capture expected behavior and establish baselines for regression testing.

Instructions

Create a new EvalView test case YAML file for an agent. Call this when the user asks to add a test, or when you want to capture expected agent behavior. After creating a test, call run_snapshot to establish the baseline. No YAML knowledge required — just describe the test. IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' directory in the current project. If found, use it. Otherwise use 'tests'.
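The test_path auto-detection rule described above can be sketched as follows (hypothetical helper name; the actual tool performs an equivalent check internally):

```python
import os

# Sketch of the test_path auto-detection rule: prefer 'tests/evalview/'
# when it exists in the project, otherwise fall back to 'tests'.
# (Hypothetical helper; the create_test tool does this check itself.)
def detect_test_path(project_root: str = ".") -> str:
    candidate = os.path.join(project_root, "tests", "evalview")
    return candidate if os.path.isdir(candidate) else "tests"
```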

Input Schema

| Name | Required | Description |
| --- | --- | --- |
| `name` | Yes | Test name (e.g. 'calculator-division', 'weather-lookup') |
| `query` | Yes | The input query to send to the agent |
| `description` | No | Human-readable description of what this test covers |
| `expected_tools` | No | Tool names the agent should call (e.g. ['calculator', 'search']) |
| `forbidden_tools` | No | Tool names the agent must NEVER call. Any violation is an immediate hard-fail (score=0, passed=false) regardless of output quality. Use this for safety contracts — e.g. a read-only agent that must never call edit_file, bash, or write_file. Matching is case-insensitive: 'EditFile' catches 'edit_file'. |
| `expected_output_contains` | No | Strings that must appear in the agent's output |
| `min_score` | No | Minimum passing score 0-100 (default: 70) |
| `test_path` | No | Directory to save the test file. Auto-detect: use 'tests/evalview/' if it exists in the project, otherwise 'tests'. |
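Given these fields, a generated test file might look like the following (field values are illustrative; the layout follows the serialization in the Implementation Reference section):

```yaml
name: "weather-lookup"
description: "Checks that the agent answers with live weather data"

input:
  query: "What is the weather in Paris today?"

expected:
  tools:
    - weather
  # Tools that must NEVER be called — hard-fail if any are invoked.
  forbidden_tools:
    - bash
  output:
    contains:
      - "Paris"

thresholds:
  min_score: 75
```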

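The case-insensitive forbidden-tool matching can be sketched as below. This is an assumption about how the runner normalizes names (lowercasing plus stripping separators, so that 'EditFile' matches 'edit_file'); the function name is hypothetical:

```python
# Hypothetical sketch of forbidden-tool matching: normalize both sides
# by lowercasing and removing '_'/'-' separators, then compare.
def is_forbidden(tool_name: str, forbidden_tools: list[str]) -> bool:
    def norm(s: str) -> str:
        return s.lower().replace("_", "").replace("-", "")
    return any(norm(tool_name) == norm(f) for f in forbidden_tools)
```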
Implementation Reference

  • The `_create_test` method implements the `create_test` tool, creating a YAML test file based on the provided arguments.
    def _create_test(self, args: Dict[str, Any]) -> str:
        test_name = args.get("name", "").strip()
        query = args.get("query", "").strip()
        if not test_name or not query:
            return "Error: 'name' and 'query' are required."
    
        test_path = args.get("test_path", self.test_path)
        slug = test_name.lower().replace(" ", "-").replace("_", "-")
        filename = os.path.join(test_path, f"{slug}.yaml")
    
        if os.path.exists(filename):
            return f"Error: test already exists at {filename}. Delete it first or choose a different name."
    
        os.makedirs(test_path, exist_ok=True)
    
        lines = [f'name: "{test_name}"']
    
        description = args.get("description", "")
        if description:
            lines.append(f'description: "{description}"')
    
        lines += ["", "input:", f'  query: "{query}"', "", "expected:"]
    
        expected_tools = args.get("expected_tools", [])
        if expected_tools:
            lines.append("  tools:")
            for t in expected_tools:
                lines.append(f"    - {t}")
    
        forbidden_tools = args.get("forbidden_tools", [])
        if forbidden_tools:
            lines.append("  # Tools that must NEVER be called — hard-fail if any are invoked.")
            lines.append("  forbidden_tools:")
            for t in forbidden_tools:
                lines.append(f"    - {t}")
    
        expected_output = args.get("expected_output_contains", [])
        if expected_output:
            lines.append("  output:")
            lines.append("    contains:")
            for s in expected_output:
                lines.append(f'      - "{s}"')
    
        min_score = args.get("min_score", 70)
        lines += ["", "thresholds:", f"  min_score: {int(min_score)}"]
    
        with open(filename, "w") as f:
            f.write("\n".join(lines) + "\n")
    
        summary_parts = [f"query: {query}"]
        if expected_tools:
            summary_parts.append(f"tools: {', '.join(expected_tools)}")
        if forbidden_tools:
            summary_parts.append(f"forbidden_tools (hard-fail): {', '.join(forbidden_tools)}")
        if expected_output:
            summary_parts.append(f"output contains: {', '.join(expected_output)}")
    
        return (
            f"Created {filename}\n"
            + "\n".join(f"  {p}" for p in summary_parts)
            + "\n\nRun run_snapshot to capture the baseline for this test."
        )
  • The tool is registered and called within the `call_tool` handler in `evalview/mcp_server.py`.
    if name == "create_test":
        return self._create_test(args)
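  • The YAML-building step of `_create_test` can be exercised in isolation. Below is a minimal re-implementation for illustration (hypothetical function name; file I/O, duplicate checks, and the forbidden-tools comment line are omitted for brevity):

    ```python
    # Standalone sketch of the YAML serialization done by _create_test.
    def build_test_yaml(name, query, description="", expected_tools=None,
                        forbidden_tools=None, expected_output_contains=None,
                        min_score=70):
        lines = [f'name: "{name}"']
        if description:
            lines.append(f'description: "{description}"')
        lines += ["", "input:", f'  query: "{query}"', "", "expected:"]
        if expected_tools:
            lines.append("  tools:")
            lines += [f"    - {t}" for t in expected_tools]
        if forbidden_tools:
            lines.append("  forbidden_tools:")
            lines += [f"    - {t}" for t in forbidden_tools]
        if expected_output_contains:
            lines += ["  output:", "    contains:"]
            lines += [f'      - "{s}"' for s in expected_output_contains]
        lines += ["", "thresholds:", f"  min_score: {int(min_score)}"]
        return "\n".join(lines) + "\n"

    yaml_text = build_test_yaml(
        "calculator-division",
        "What is 144 divided by 12?",
        expected_tools=["calculator"],
        expected_output_contains=["12"],
    )
    ```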
