run_snapshot
Capture passing test results as a golden baseline to define expected behavior. Future runs compare against this snapshot for regression detection.
Instructions
Run tests and save passing results as the new golden baseline. Use this to establish or update the expected behavior after an intentional change; future run_check calls will compare against this snapshot. Call it:

- after creating a new test with create_test,
- after confirming a behavioral change is intentional,
- before a large refactor, so you have a clean rollback point.

Only passing tests are saved; failing tests are skipped with a warning. IMPORTANT: auto-detect test_path by checking for a 'tests/evalview/' directory in the current project and, if it exists, pass it as test_path (a detection sketch follows below).
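The detection rule above might look like the following Python sketch. The detect_test_path helper, the test name, and the notes text are illustrative placeholders, not part of the tool itself; only the argument keys follow the schema documented in the next section.

```python
from pathlib import Path

def detect_test_path(project_root: str = ".") -> str:
    # Prefer the evalview test directory when it exists; otherwise fall back to 'tests'.
    evalview = Path(project_root) / "tests" / "evalview"
    return "tests/evalview/" if evalview.is_dir() else "tests"

# Illustrative arguments for a run_snapshot call made right after creating a new test.
arguments = {
    "test": "checkout-flow",
    "test_path": detect_test_path(),
    "notes": "baseline after adding the checkout-flow test",
}
print(arguments)
```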
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| test | No | Snapshot only this specific test by name. | all tests |
| notes | No | Human-readable note about why this snapshot was taken. | |
| test_path | No | Path to the test directory. | auto-detect: 'tests/evalview/' if it exists, otherwise 'tests' |
| variant | No | Save as a named variant for non-deterministic agents (max 5 per test), e.g. 'v2' or 'async-path'. | |
| preview | No | Show what would change without saving (dry-run mode). | false |
| reset | No | Delete all existing baselines before capturing new ones. | false |
| judge | No | Judge model for scoring, e.g. 'gpt-5' or 'sonnet'. | |
| timeout | No | Timeout per test in seconds. | 30 |
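To show how the optional fields combine, here are two hypothetical argument sets in the same Python-dict form. The test name, variant label, judge string, and timeout value are placeholders drawn from the examples in the table above, not values the tool requires.

```python
# Dry-run: preview what a full re-snapshot would change without writing any baselines.
preview_args = {
    "preview": True,
    "notes": "check drift before a large refactor",
}

# Capture a named variant for a non-deterministic agent, with an explicit judge and timeout.
variant_args = {
    "test": "checkout-flow",
    "variant": "async-path",
    "judge": "sonnet",
    "timeout": 60,
}

print(preview_args)
print(variant_args)
```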