run_tests

Run the active test suite and generate a structured report, optionally filtering by test name, to verify new tests or confirm bug fixes.

Instructions

Execute the test suite under the active QA_RUNNER and produce a structured report. The single most-called tool — invoke whenever a user says 「跑/run/test/check/驗證/執行」, after generate_test (verify new test), or after a fix (confirm bug gone).

Behavior:

  • Invokes the runner's native CLI under QA_PROJECT_ROOT — pytest with --screenshot=on / --tracing=on / --video=retain-on-failure, or npx jest --json, npx cypress run --reporter json, go test -json, maestro test --format junit

  • Optional filter narrows the scope: pytest -k expr, jest -t pattern, cypress --spec glob, go -run regex, maestro flow-name substring

  • Writes report.json (pytest-json-report shape, runner-agnostic) + JUnit XML

  • Snapshots the run into history/ and auto-triggers optimizer.write_plan() → optimization-plan.md is refreshed

  • Maestro: auto-retries flows that failed on first attempt (MAESTRO_RETRY=true), surfaces flaky_in_run count

Returns: {exit_code, raw_exit_code, stdout_tail, stderr_tail, retry_enabled, flaky_in_run, ...}
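The per-runner filter mapping above can be sketched as a small dispatch table. This is a hypothetical illustration; the actual runner classes build their own argv.

```python
# Hypothetical sketch of how one `filter` keyword maps to each runner's
# native CLI flag, following the behavior notes above.
def filter_args(runner: str, flt: str) -> list[str]:
    mapping = {
        "pytest":  ["-k", flt],                # boolean expression (and/or/not)
        "jest":    ["-t", flt],                # test-name pattern
        "cypress": ["--spec", f"**/*{flt}*"],  # spec glob built from the keyword
        "go":      ["-run", flt],              # regex over test names
    }
    # Maestro takes no flag here: flows are matched by file-name substring.
    return mapping.get(runner, [])

print(filter_args("cypress", "checkout"))  # ['--spec', '**/*checkout*']
```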

When to use:

  • After writing a new test → verify it actually passes

  • Smoke before a release

  • Whenever the user prompt contains a run/test verb

When NOT to use:

  • Inspecting last results without re-running → use get_test_report (cheaper)

  • Re-running only failed cases → use run_failed (way faster)

  • Enumerating which tests exist → use list_tests

Edge cases:

  • No tests match filter → exit_code != 0 with 「no tests ran」 in stderr_tail

  • QA_TIMEOUT_SECONDS exceeded → exit_code 124 + [TIMEOUT…] tag in stderr_tail

  • filter starting with - or containing .. → blocked by security guardrail, returns {error: …}
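A hedged sketch of how a caller could branch on the edge cases above. The `summarize` helper is hypothetical; field names follow the documented return shape.

```python
# Hypothetical helper: classify a run_tests result using the documented
# edge cases (guardrail error, timeout exit code 124, empty filter match).
def summarize(result: dict) -> str:
    if "error" in result:                      # security guardrail rejection
        return f"rejected: {result['error']}"
    if result.get("exit_code") == 124:         # QA_TIMEOUT_SECONDS exceeded
        return "timed out"
    if result.get("exit_code") != 0 and "no tests ran" in result.get("stderr_tail", ""):
        return "no tests matched filter"       # filter selected nothing
    return "passed" if result.get("exit_code") == 0 else "failures"

print(summarize({"exit_code": 124, "stderr_tail": "[TIMEOUT after 300s]"}))  # timed out
```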

Input Schema

  • filter (string, optional): test-name keyword. pytest takes a -k expression (supports and/or/not); Jest takes -t; Cypress takes --spec '**/*<filter>*'; Go takes a -run regex; Maestro matches a substring of the flow file name.

  • headed (boolean, optional, default false): only effective for pytest-playwright. When true, the browser runs with a visible UI (useful for debugging or watching flaky behavior visually); the default headless mode suits CI and large suites.

  • browser (string, optional, default chromium): only effective for pytest-playwright; selects which Playwright browser engine to launch (chromium, firefox, webkit). Requires a prior `playwright install <browser>`.
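For illustration, two argument payloads that satisfy the schema above (the values themselves are made up):

```python
# Illustrative run_tests payloads. An empty object runs the whole suite
# with defaults (headless chromium); the second narrows scope and debugs.
smoke_payload = {}
debug_payload = {
    "filter": "login and not slow",  # pytest -k expression
    "headed": True,                  # visible browser window for debugging
    "browser": "firefox",            # requires `playwright install firefox`
}
print(sorted(debug_payload))
```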

Implementation Reference

  • Tool-level entry point (runner.py): dispatches run_tests by getting the active runner and calling its run_tests method, with security filter validation first.
    def run_tests(filter=None, headed=False, browser="chromium") -> dict:
        ok, err = validate_filter(filter)
        if not ok:
            return {"error": err}
        return get_runner().run_tests(filter=filter, headed=headed, browser=browser)
  • MCP tool registration in server.py: defines the 'run_tests' Tool with description, schema (filter, headed, browser), and detailed usage docs.
        name="run_tests",
        description=(
            "Execute the test suite under the active QA_RUNNER and produce a structured "
            "report. The single most-called tool — invoke whenever a user says "
            "「跑/run/test/check/驗證/執行」, after generate_test (verify new test), "
            "or after a fix (confirm bug gone).\n\n"
            "Behavior:\n"
            "- Invokes the runner's native CLI under QA_PROJECT_ROOT — pytest with "
            "  --screenshot=on / --tracing=on / --video=retain-on-failure, or "
            "  `npx jest --json`, `npx cypress run --reporter json`, `go test -json`, "
            "  `maestro test --format junit`\n"
            "- Optional `filter` narrows the scope: pytest -k expr, jest -t pattern, "
            "  cypress --spec glob, go -run regex, maestro flow-name substring\n"
            "- Writes report.json (pytest-json-report shape, runner-agnostic) + JUnit XML\n"
            "- Snapshots the run into history/ and auto-triggers optimizer.write_plan() "
            "  → optimization-plan.md is refreshed\n"
            "- Maestro: auto-retries flows that failed on first attempt (MAESTRO_RETRY=true), "
            "  surfaces flaky_in_run count\n"
            "Returns: {exit_code, raw_exit_code, stdout_tail, stderr_tail, retry_enabled, "
            "flaky_in_run, ...}\n\n"
            "When to use:\n"
            "- After writing a new test → verify it actually passes\n"
            "- Smoke before a release\n"
            "- Whenever the user prompt contains a run/test verb\n\n"
            "When NOT to use:\n"
            "- Inspecting last results without re-running → use get_test_report (cheaper)\n"
            "- Re-running only failed cases → use run_failed (way faster)\n"
            "- Enumerating which tests exist → use list_tests\n\n"
            "Edge cases:\n"
            "- No tests match `filter` → exit_code != 0 with 「no tests ran」 in stderr_tail\n"
            "- QA_TIMEOUT_SECONDS exceeded → exit_code 124 + `[TIMEOUT…]` tag in stderr_tail\n"
            "- `filter` starting with `-` or containing `..` → blocked by security "
            "  guardrail, returns {error: …}"
        ),
        inputSchema={
            "type": "object",
            "properties": {
                "filter": {
                    "type": "string",
                    "description": (
                        "選填,測試名稱關鍵字。pytest 走 -k 表達式(支援 and/or/not)、"
                        "Jest 走 -t、Cypress 走 --spec '**/*<filter>*'、Go 走 -run "
                        "regex、Maestro 在 flow 檔名作子字串比對。"
                    ),
                },
                "headed": {
                    "type": "boolean",
                    "default": False,
                    "description": (
                        "選填,僅對 pytest-playwright 有效。True 時瀏覽器有 UI 模式跑(適合 debug、"
                        "看 flake 視覺現象);預設 headless 跑、CI / 大量套件用這個。"
                    ),
                },
                "browser": {
                    "type": "string",
                    "enum": ["chromium", "firefox", "webkit"],
                    "default": "chromium",
                    "description": (
                        "選填,僅對 pytest-playwright 有效,指定 Playwright 啟用的 browser engine。"
                        "需事先 `playwright install <browser>` 過。"
                    ),
                },
            },
        },
    ),
  • Dispatch handler in server.py's _dispatch function: routes 'run_tests' calls to runner.run_tests() with filter, headed, and browser args.
    if name == "run_tests":
        result = runner.run_tests(
            filter=args.get("filter"),
            headed=args.get("headed", False),
            browser=args.get("browser", "chromium"),
        )
        return [TextContent(type="text", text=json.dumps(result, ensure_ascii=False, indent=2))]
  • Abstract base class defining the run_tests interface that all runners implement.
    class TestRunner(ABC):
        """通用測試 runner 介面。每個測試框架實作一個子類。"""
    
        name: str = "base"
    
        @abstractmethod
        def list_tests(self) -> str: ...
    
        @abstractmethod
        def run_tests(self, filter: str | None = None, **kwargs) -> dict: ...
  • Security guardrail for run_tests filter argument: rejects args starting with '-' (CLI injection) or containing '..' (path traversal).
    def validate_filter(value: object) -> tuple[bool, str | None]:
        """Check a `filter` argument before it reaches subprocess argv.
    
        Pytest / Go / Jest / Cypress all accept fairly free-form filter
        syntax (regex, boolean expressions, globs), so the rule is *minimal*:
        reject the cases that smell like attacks, allow everything else.
    
        Returns (ok, error_message). When ok is True, caller passes the
        original value through unchanged; when False, caller surfaces the
        error string as a tool-result error.
        """
        if value is None or value == "":
            return True, None
        if not isinstance(value, str):
            return False, f"filter must be a string, got {type(value).__name__}"
        if len(value) > _MAX_FILTER_LEN:
            return False, f"filter too long ({len(value)} > {_MAX_FILTER_LEN} chars)"
        if value.startswith("-"):
            # A leading `-` would slot in as a new CLI option to the underlying
            # tool — e.g. `pytest -k --config=evil` parses `--config=evil` as a
            # pytest option, not as part of -k. Block at the boundary.
            return False, (
                "filter cannot start with '-' (looks like a CLI option, "
                "not a test name)"
            )
        if ".." in value:
            # Filters are not paths; '..' in them is almost always a sign that
            # someone is trying to escape an expected scope (cypress turns the
            # filter into a glob, where `..` would walk outside the project).
            return False, "filter cannot contain '..'"
        if _CONTROL_CHARS_RE.search(value):
            return False, "filter contains control characters"
        return True, None
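A self-contained demonstration of the guardrail's behavior. `_MAX_FILTER_LEN` and `_CONTROL_CHARS_RE` are not shown in the excerpt, so illustrative values are assumed here.

```python
import re

# Assumed values: the real module defines its own limit and pattern.
_MAX_FILTER_LEN = 256
_CONTROL_CHARS_RE = re.compile(r"[\x00-\x1f\x7f]")

def validate_filter(value):
    """Condensed re-statement of the guardrail above, for demonstration."""
    if value is None or value == "":
        return True, None
    if not isinstance(value, str):
        return False, f"filter must be a string, got {type(value).__name__}"
    if len(value) > _MAX_FILTER_LEN:
        return False, f"filter too long ({len(value)} > {_MAX_FILTER_LEN} chars)"
    if value.startswith("-"):          # would parse as a new CLI option
        return False, "filter cannot start with '-' (looks like a CLI option, not a test name)"
    if ".." in value:                  # smells like path traversal in a glob
        return False, "filter cannot contain '..'"
    if _CONTROL_CHARS_RE.search(value):
        return False, "filter contains control characters"
    return True, None

print(validate_filter("login and not slow"))  # (True, None)
print(validate_filter("--config=evil")[0])    # False
```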
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description fully discloses behavior: invokes CLI with flags, outputs report.json/JUnit XML, snapshots history, auto-triggers optimizer, Maestro retry logic, return fields, and edge cases (no matches, timeout, security guardrail). This is comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections, but it is quite long. It successfully front-loads the purpose and every sentence adds value, but could be slightly more concise without losing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool (multiple runners, parameters, edge cases) and no output schema, the description covers all essential aspects: behavior, return shape, when to use/not use, edge cases, and security considerations. It leaves no significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers all parameters with descriptions. The description adds further meaning: explains filter per framework, clarifies that headed and browser are only for pytest-playwright, and specifies default values and constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool's purpose: 'Execute the test suite under the active QA_RUNNER and produce a structured report.' It distinguishes itself from siblings like run_failed and list_tests by noting it is the 'single most-called tool' and contrasts with alternatives in the 'When NOT to use' section.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance with 'When to use' bullets (after writing new test, smoke before release, user run verb) and 'When NOT to use' with specific alternatives (get_test_report, run_failed, list_tests). This helps the agent choose correctly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
