Glama

run_check

Detect regressions in AI agent tests by comparing results against golden baselines. Each test is reported as PASSED, OUTPUT_CHANGED, TOOLS_CHANGED, or REGRESSION, so you can validate code changes.

Instructions

Check for regressions against the golden baseline. Returns a diff summary for each test: PASSED, OUTPUT_CHANGED, TOOLS_CHANGED, or REGRESSION.

  • REGRESSION means the score dropped significantly. Treat this as a blocking failure.
  • TOOLS_CHANGED / OUTPUT_CHANGED are warnings: the agent's behavior shifted, but the change may be intentional.

Use this after any code change (prompt, model, tools) to confirm nothing broke. If you see a regression, show the diff to the user and offer to fix it before moving on.

IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' directory in the current project. If it exists, pass it as test_path. If the project has a custom test location, use that instead.
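
The status semantics above can be sketched as a small triage step. This is a minimal Python sketch, assuming the per-test results have already been collected into a mapping from test name to status string; the function name `triage` and that input shape are hypothetical and are not part of the `evalview` CLI or its JSON output format.

```python
from typing import Dict, List, Tuple

# Status strings as documented for run_check.
BLOCKING = {"REGRESSION"}
WARNINGS = {"OUTPUT_CHANGED", "TOOLS_CHANGED"}
PASSING = {"PASSED"}


def triage(results: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split tests into blocking failures and warnings.

    `results` is a hypothetical mapping of test name -> status string.
    Returns (blocking, warnings), each ordered by test name.
    """
    blocking: List[str] = []
    warnings: List[str] = []
    for name, status in sorted(results.items()):
        if status in BLOCKING:
            blocking.append(name)
        elif status in WARNINGS:
            warnings.append(name)
        elif status not in PASSING:
            # Assumption of this sketch: unknown statuses are treated as
            # blocking so nothing unexpected slips through silently.
            blocking.append(name)
    return blocking, warnings


if __name__ == "__main__":
    example = {
        "refund_flow": "PASSED",
        "search_tool": "TOOLS_CHANGED",
        "summarize": "REGRESSION",
    }
    blocking, warnings = triage(example)
    print("blocking:", blocking)  # blocking: ['summarize']
    print("warnings:", warnings)  # warnings: ['search_tool']
```

A non-empty `blocking` list corresponds to the REGRESSION case that should stop further work; warnings are surfaced for review rather than failing the check. Treating unrecognized statuses as blocking is a conservative choice made here, not documented tool behavior.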

Input Schema

Name      | Required | Description                                                                                      | Default
test      | No       | Check only this specific test by name (optional, checks all by default)                         | -
test_path | No       | Path to the test directory. Auto-detect: use 'tests/evalview/' if it exists, otherwise 'tests'. | -
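
The auto-detect rule for `test_path` can be sketched as a small helper. This is a hypothetical Python function (`resolve_test_path` is not part of evalview); it assumes paths are resolved relative to the project root and that a caller-supplied custom location takes precedence over auto-detection, as the tool description says.

```python
from pathlib import Path
from typing import Optional


def resolve_test_path(project_root: str = ".", custom: Optional[str] = None) -> str:
    """Choose the test directory to pass as test_path (hypothetical helper)."""
    # A project-specific test location takes precedence over auto-detection.
    if custom:
        return custom
    # Prefer tests/evalview/ when it exists as a directory under the project root.
    if (Path(project_root) / "tests" / "evalview").is_dir():
        return "tests/evalview"
    # Otherwise fall back to the generic tests/ directory.
    return "tests"
```

The returned path is relative to the project root; the server handler normalizes it with `os.path.normpath`, so a trailing slash makes no difference either way.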

Implementation Reference

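  • Excerpt of the 'run_check' tool handler in the MCP server. It builds the `evalview check` command line, which the server runs as a subprocess (the subprocess call itself is not shown in this excerpt).
    if name == "run_check":
        # Fall back to the server's configured test path when none is given,
        # and normalize it (e.g. strip a trailing slash).
        test_path = os.path.normpath(args.get("test_path", self.test_path))
        # Request machine-readable output from the CLI.
        cmd = ["evalview", "check", test_path, "--json"]
        # Optionally restrict the check to one named test.
        if args.get("test"):
            cmd += ["--test", args["test"]]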
  • Registration and schema definition for the 'run_check' tool.
    {
        "name": "run_check",
        "description": (
            "Check for regressions against the golden baseline. "
            "Returns a diff summary for each test: PASSED, OUTPUT_CHANGED, TOOLS_CHANGED, or REGRESSION. "
            "REGRESSION means the score dropped significantly — treat this as a blocking failure. "
            "TOOLS_CHANGED / OUTPUT_CHANGED are warnings: the agent's behavior shifted but may be intentional. "
            "Use this after any code change (prompt, model, tools) to confirm nothing broke. "
            "If you see a regression, show the diff to the user and offer to fix it before moving on. "
            "IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' "
            "directory in the current project. If it exists, pass it as test_path. "
            "If the project has a custom test location, use that instead."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {
                "test": {
                    "type": "string",
                    "description": "Check only this specific test by name (optional, checks all by default)",
                },
                "test_path": {
                    "type": "string",
                    "description": (
                        "Path to the test directory. "
                        "Auto-detect: use 'tests/evalview/' if it exists, otherwise 'tests'."
                    ),
                },
            },
        },
    },
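
The handler excerpt above only builds the argument list. A minimal sketch of the full round trip follows, assuming the `evalview` CLI is on `PATH` and that `--json` writes a single JSON document to stdout; the function names and the fallback dictionary are hypothetical, and the structure of the JSON itself is not documented here. One deliberate difference from the excerpt: `args.get("test_path") or default` also falls back when `test_path` is passed explicitly as null or an empty string.

```python
import json
import os
import subprocess
from typing import Any, Dict, List


def build_check_command(args: Dict[str, Any], default_test_path: str) -> List[str]:
    # Mirror the handler excerpt: normalize the path, request JSON output,
    # and optionally narrow the run to a single named test.
    test_path = os.path.normpath(args.get("test_path") or default_test_path)
    cmd = ["evalview", "check", test_path, "--json"]
    if args.get("test"):
        cmd += ["--test", args["test"]]
    return cmd


def run_check(args: Dict[str, Any], default_test_path: str = "tests") -> Any:
    cmd = build_check_command(args, default_test_path)
    # check=False: assume a non-zero exit may just signal regressions,
    # so inspect the output instead of raising.
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Hypothetical fallback: surface raw output if stdout is not valid JSON.
        return {"returncode": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
```

If `evalview` is not installed, `subprocess.run` raises `FileNotFoundError`; a production handler would likely catch that and report it to the client.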

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/hidai25/eval-view'
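
The same request can be issued from Python's standard library. This is a minimal sketch, assuming the endpoint returns JSON (the response schema is not described on this page); `build_request` and `fetch_server_info` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/hidai25/eval-view"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    # Equivalent of `curl -X GET <url>`.
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0) -> Any:
    # Performs the network call; the body is assumed to be UTF-8 JSON.
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```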

If you have feedback or need assistance with the MCP directory API, please join our Discord server.