run_snapshot
Run tests and save passing results as the new golden baseline to establish or update expected behavior after intentional changes.
Instructions
Run the tests and save the passing results as the new golden baseline. Use this to establish or update expected behavior after an intentional change; future run_check calls compare against this snapshot. Call this tool: (1) after creating a new test with create_test, (2) after confirming that a behavioral change is intentional, (3) before a large refactor, so you have a clean rollback point. Only passing tests are saved; failing tests are skipped with a warning. IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' directory in the current project. If it exists, pass it as test_path.
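The test_path auto-detection described above could be sketched roughly as follows. This is a minimal illustration, not the tool's own implementation: the helper name `detect_test_path` and the `project_root` parameter are hypothetical, and the fallback to 'tests' is taken from the test_path row of the Input Schema.

```python
from pathlib import Path


def detect_test_path(project_root: str = ".") -> str:
    """Pick the value to pass as test_path (hypothetical helper).

    Returns 'tests/evalview/' when that directory exists under
    project_root, otherwise falls back to 'tests'.
    """
    candidate = Path(project_root) / "tests" / "evalview"
    return "tests/evalview/" if candidate.is_dir() else "tests"
```

The returned string is relative to the project root, so it is meant to be passed as test_path unchanged.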
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| test | No | Name of a single test to snapshot. If omitted, all tests are snapshotted. | all tests |
| judge | No | Judge model for scoring (e.g. 'gpt-5', 'sonnet'). | |
| notes | No | Human-readable note about why this snapshot was taken. | |
| reset | No | Delete all existing baselines before capturing new ones. | false |
| preview | No | Show what would change without saving (dry-run mode). | false |
| timeout | No | Timeout per test, in seconds. | 30 |
| variant | No | Save as a named variant for non-deterministic agents (max 5 per test), e.g. 'v2', 'async-path'. | |
| test_path | No | Path to the test directory. | Auto-detected: 'tests/evalview/' if it exists, otherwise 'tests' |
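
As an illustration of how the parameters above fit together, the sketch below assembles an argument payload for run_snapshot. The helper `build_snapshot_args` is hypothetical, as is the test name in the usage line. How the payload is sent to the tool depends on the client and is not shown here.

```python
from typing import Any, Dict, Optional


def build_snapshot_args(
    test: Optional[str] = None,
    judge: Optional[str] = None,
    notes: Optional[str] = None,
    reset: bool = False,
    preview: bool = False,
    timeout: int = 30,
    variant: Optional[str] = None,
    test_path: str = "tests",
) -> Dict[str, Any]:
    """Build a run_snapshot argument dict (hypothetical helper).

    Defaults mirror the Input Schema table; optional string
    parameters are left out of the payload when not provided.
    """
    args: Dict[str, Any] = {
        "test_path": test_path,
        "timeout": timeout,
        "preview": preview,
        "reset": reset,
    }
    optional = {"test": test, "judge": judge, "notes": notes, "variant": variant}
    # Only include optional values the caller actually set.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


# Dry run for one (hypothetical) test, with a note explaining the snapshot.
example = build_snapshot_args(
    test="checkout-flow",
    notes="Baseline before refactor",
    preview=True,
)
```

Setting preview to true first shows what would change without overwriting the existing baseline.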