# create_test
Define test cases for AI agents by specifying queries, expected tools, forbidden tools, and output constraints. Automatically saves to the project's tests directory.
## Instructions
Create a new EvalView test case YAML file for an agent. Call this when the user asks to add a test, or when you want to capture expected agent behavior. After creating a test, call run_snapshot to establish the baseline. No YAML knowledge is required; just describe the test. IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' directory in the current project. If found, use it; otherwise use 'tests'.
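The test_path auto-detection described above can be sketched as a small helper. This is an illustrative implementation, not the tool's actual code; the function name `detect_test_path` is a hypothetical label for the behavior the instructions specify.

```python
from pathlib import Path

def detect_test_path(project_root: str = ".") -> str:
    """Illustrative sketch of the auto-detect rule: prefer
    'tests/evalview/' when that directory exists in the project,
    otherwise fall back to 'tests'."""
    candidate = Path(project_root) / "tests" / "evalview"
    return "tests/evalview/" if candidate.is_dir() else "tests"
```

The check is on the directory's existence only; the helper does not create it, mirroring the instruction that the fallback is simply 'tests'.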
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Test name (e.g. 'calculator-division', 'weather-lookup') | |
| query | Yes | The input query to send to the agent | |
| description | No | Human-readable description of what this test covers | |
| expected_tools | No | Tool names the agent should call (e.g. ['calculator', 'search']) | |
| forbidden_tools | No | Tool names the agent must NEVER call. Any violation is an immediate hard-fail (score=0, passed=false) regardless of output quality. Use this for safety contracts — e.g. a read-only agent that must never call edit_file, bash, or write_file. Matching is case-insensitive: 'EditFile' catches 'edit_file'. | |
| expected_output_contains | No | Strings that must appear in the agent's output | |
| min_score | No | Minimum passing score, 0-100 | 70 |
| test_path | No | Directory to save the test file. Auto-detect: use 'tests/evalview/' if it exists in the project, otherwise 'tests'. | |
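A test created from the schema above might look like the following YAML file. This is an illustrative sketch only: the exact keys and layout EvalView writes are an assumption that each schema field maps one-to-one onto a top-level YAML key, and the filename shown is hypothetical.

```yaml
# tests/evalview/calculator-division.yaml
# Illustrative example; actual file layout may differ.
name: calculator-division
description: Verify the agent uses the calculator tool for division
query: "What is 144 divided by 12?"
expected_tools:
  - calculator
forbidden_tools:        # hard-fail safety contract (score=0 on violation)
  - bash
  - edit_file
expected_output_contains:
  - "12"
min_score: 70
```

After saving a file like this, run_snapshot would be called to record the baseline, per the instructions above.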