run_snapshot

Save passing test results as golden baselines to establish expected behavior after intentional changes. Updates the reference point for future regression checks.

Instructions

Run tests and save passing results as the new golden baseline. Use this to establish or update the expected behavior after an intentional change. Future run_check calls will compare against this snapshot. Call this: (1) after creating a new test with create_test, (2) after confirming a behavioral change is intentional, (3) before making large refactors so you have a clean rollback point. Only passing tests are saved — failing tests are skipped with a warning. IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' directory in the current project. If it exists, pass it as test_path.
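
The auto-detection rule in that description translates directly into code. A minimal sketch of how a caller might resolve test_path before invoking the tool, assuming a conventional project layout (the function name is illustrative, not part of evalview):

    import os

    def detect_test_path(project_root: str = ".") -> str:
        """Prefer 'tests/evalview/' when it exists, else fall back to 'tests'."""
        preferred = os.path.join(project_root, "tests", "evalview")
        fallback = os.path.join(project_root, "tests")
        return preferred if os.path.isdir(preferred) else fallback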

Input Schema

Name       Required  Description
test       No        Snapshot only this specific test by name (optional, snapshots all by default)
notes      No        Human-readable note about why this snapshot was taken
test_path  No        Path to the test directory. Auto-detect: use 'tests/evalview/' if it exists, otherwise 'tests'.

Implementation Reference

  • The run_snapshot MCP tool handler implementation in evalview/mcp_server.py, which invokes the evalview CLI tool.
    elif name == "run_snapshot":
        # Resolve the test directory: explicit argument, else the server default.
        test_path = os.path.normpath(args.get("test_path", self.test_path))
        # Assemble the CLI invocation; optional flags mirror the input schema.
        cmd = ["evalview", "snapshot", test_path]
        if args.get("test"):
            cmd += ["--test", args["test"]]
        if args.get("notes"):
            cmd += ["--notes", args["notes"]]
  • The schema definition for the run_snapshot MCP tool.
    {
        "name": "run_snapshot",
        "description": (
            "Run tests and save passing results as the new golden baseline. "
            "Use this to establish or update the expected behavior after an intentional change. "
            "Future `run_check` calls will compare against this snapshot. "
            "Call this: (1) after creating a new test with create_test, "
            "(2) after confirming a behavioral change is intentional, "
            "(3) before making large refactors so you have a clean rollback point. "
            "Only passing tests are saved — failing tests are skipped with a warning. "
            "IMPORTANT: Automatically detect test_path by looking for a 'tests/evalview/' "
            "directory in the current project. If it exists, pass it as test_path."
        ),
        "inputSchema": {
            "type": "object",
            "properties": {
                "test": {
                    "type": "string",
                    "description": "Snapshot only this specific test by name (optional, snapshots all by default)",
                },
                "notes": {
                    "type": "string",
                    "description": "Human-readable note about why this snapshot was taken",
                },
                "test_path": {
                    "type": "string",
                    "description": (
                        "Path to the test directory. "
                        "Auto-detect: use 'tests/evalview/' if it exists, otherwise 'tests'."
                    ),
                },
            },
        },
    },
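
Since the schema declares no required properties, any subset of the three fields is a valid argument object. A quick way to confirm this, assuming the third-party jsonschema package:

    from jsonschema import validate  # pip install jsonschema

    schema = {
        "type": "object",
        "properties": {
            "test": {"type": "string"},
            "notes": {"type": "string"},
            "test_path": {"type": "string"},
        },
    }

    # Both an empty object and a partial one validate: nothing is required.
    validate(instance={}, schema=schema)
    validate(instance={"test": "login_flow"}, schema=schema)  # hypothetical test name
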
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It covers the key behaviors well: only passing tests are saved (failing tests are skipped with a warning), the test path is auto-detected, and the resulting snapshot becomes the baseline for future comparisons. However, it does not cover potential side effects such as performance impact or error-handling details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well structured and front-loaded with the core purpose, and most sentences earn their place by providing usage guidelines and behavioral detail. However, the final sentence repeats the test_path auto-detection already covered in the schema and could be integrated more smoothly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation operation with no annotations or output schema), the description does an excellent job covering purpose, usage scenarios, behavioral traits, and parameter context. It explains the tool's role in the workflow and its relationship with other tools. The main gap is the lack of output-format information, but this is partially compensated by explaining what the tool establishes for future use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds significant value by explaining the automatic detection logic for test_path ("Automatically detect test_path by looking for a 'tests/evalview/' directory"), which provides practical usage context beyond the schema's technical description. It also clarifies the default behavior when the test parameter is omitted ("snapshots all by default").

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ("run tests and save passing results as the new golden baseline") and distinguishes it from sibling tools like run_check by explaining the relationship between them, making it clear how this tool differs from alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool, listing three specific scenarios (after creating a new test, after confirming intentional changes, before large refactors). It also clarifies the relationship with 'run_check' and mentions sibling tools like 'create_test', offering clear alternatives and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
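
The workflow those guidelines describe can be sketched as an ordered sequence of tool calls. The call_tool helper below is a stand-in for a real MCP client invocation, and the test name and argument shapes for create_test and run_check are assumed for illustration:

    # Stand-in for an MCP client's tool-call method.
    def call_tool(name: str, arguments: dict) -> None:
        print(f"-> {name} {arguments}")

    # 1. Create a test, then snapshot it to establish the golden baseline.
    call_tool("create_test", {"name": "login_flow"})
    call_tool("run_snapshot", {"test": "login_flow", "notes": "initial baseline"})

    # 2. Later regression checks compare against that snapshot.
    call_tool("run_check", {"test": "login_flow"})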
