# run_check
Detect regressions in AI agent tests by comparing results against golden baselines. Identify PASSED, OUTPUT_CHANGED, TOOLS_CHANGED, or REGRESSION status to validate code changes.
## Instructions
Check for regressions against the golden baseline. Returns a diff summary for each test: `PASSED`, `OUTPUT_CHANGED`, `TOOLS_CHANGED`, or `REGRESSION`.

- `REGRESSION`: the score dropped significantly; treat this as a blocking failure.
- `TOOLS_CHANGED` / `OUTPUT_CHANGED`: warnings; the agent's behavior shifted but may be intentional.

Use this after any code change (prompt, model, tools) to confirm nothing broke. If you see a regression, show the diff to the user and offer to fix it before moving on.

IMPORTANT: Automatically detect `test_path` by looking for a `tests/evalview/` directory in the current project. If it exists, pass it as `test_path`. If the project has a custom test location, use that instead.
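The status semantics above can be sketched as a small triage helper. This is purely illustrative: the `triage` function and its return values are hypothetical, not part of evalview.

```python
# Hypothetical triage helper for the four statuses described above.
BLOCKING = {"REGRESSION"}
WARNINGS = {"OUTPUT_CHANGED", "TOOLS_CHANGED"}

def triage(status: str) -> str:
    if status in BLOCKING:
        return "block"  # score dropped significantly; stop and fix
    if status in WARNINGS:
        return "warn"   # behavior shifted; may be intentional
    return "ok"         # PASSED
```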
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| test | No | Check only this specific test by name (optional, checks all by default) | |
| test_path | No | Path to the test directory. Auto-detect: use 'tests/evalview/' if it exists, otherwise 'tests'. | |
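The auto-detect rule described for `test_path` can be sketched as follows. The helper name `detect_test_path` is an assumption for illustration; only the detection rule itself comes from the schema above.

```python
import os

def detect_test_path(project_root: str = ".") -> str:
    # Illustrative helper (not part of evalview): prefer 'tests/evalview/'
    # when it exists, otherwise fall back to 'tests'.
    preferred = os.path.join(project_root, "tests", "evalview")
    if os.path.isdir(preferred):
        return preferred
    return os.path.join(project_root, "tests")
```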
## Implementation Reference
- `evalview/mcp_server.py:415-420` (handler): the `run_check` tool handler in the MCP server, which executes `evalview check` as a subprocess.

  ```python
  if name == "run_check":
      test_path = os.path.normpath(args.get("test_path", self.test_path))
      cmd = ["evalview", "check", test_path, "--json"]
      if args.get("test"):
          cmd += ["--test", args["test"]]
  ```

- `evalview/mcp_server.py:83-112` (registration): registration and schema definition for the `run_check` tool.

  ```python
  {
      "name": "run_check",
      "description": (
          "Check for regressions against the golden baseline. "
          "Returns a diff summary for each test: PASSED, OUTPUT_CHANGED, TOOLS_CHANGED, or REGRESSION. "
          "REGRESSION means the score dropped significantly — treat this as a blocking failure. "
          "TOOLS_CHANGED / OUTPUT_CHANGED are warnings: the agent's behavior shifted but may be intentional. "
          "Use this after any code change (prompt, model, tools) to confirm nothing broke. "
          "If you see a regression, show the diff to the user and offer to fix it before moving on. "
          "IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' "
          "directory in the current project. If it exists, pass it as test_path. "
          "If the project has a custom test location, use that instead."
      ),
      "inputSchema": {
          "type": "object",
          "properties": {
              "test": {
                  "type": "string",
                  "description": "Check only this specific test by name (optional, checks all by default)",
              },
              "test_path": {
                  "type": "string",
                  "description": (
                      "Path to the test directory. "
                      "Auto-detect: use 'tests/evalview/' if it exists, otherwise 'tests'."
                  ),
              },
          },
      },
  },
  ```
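The command construction done by the handler can be isolated into a testable sketch. The function name `build_check_cmd` is hypothetical; the flags and normalization mirror the handler excerpt above.

```python
import os
from typing import List, Optional

def build_check_cmd(test_path: str, test: Optional[str] = None) -> List[str]:
    # Mirrors the run_check handler: normalize the path, request JSON output,
    # and optionally restrict the run to a single named test.
    cmd = ["evalview", "check", os.path.normpath(test_path), "--json"]
    if test:
        cmd += ["--test", test]
    return cmd
```

The handler then passes this command to a subprocess (e.g. `subprocess.run(cmd, capture_output=True, text=True)`) and returns the parsed JSON diff summary; the exact shape of that JSON is not shown in this document.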