
run_tests

Execute test suites under the QA runner and generate structured reports with exit codes, logs, and flaky-test detection. Supports an optional filter to target specific tests across pytest, Jest, Cypress, Go, and Maestro.

Instructions

Execute the test suite under the active QA_RUNNER and produce a structured report. This is the single most-called tool: invoke it whenever the user says 「跑」 (run), run, test, check, 「驗證」 (verify), or 「執行」 (execute); after generate_test, to verify the new test; or after a fix, to confirm the bug is gone.

Behavior:

  • Invokes the runner's native CLI under QA_PROJECT_ROOT: pytest with --screenshot=on / --tracing=on / --video=retain-on-failure, or npx jest --json, npx cypress run --reporter json, go test -json, or maestro test --format junit

  • Optional filter narrows the scope: pytest -k expr, jest -t pattern, cypress --spec glob, go -run regex, maestro flow-name substring

  • Writes report.json (in the pytest-json-report shape, regardless of runner) plus a JUnit XML report

  • Snapshots the run into history/ and auto-triggers optimizer.write_plan(), which refreshes optimization-plan.md

  • Maestro: auto-retries flows that failed on the first attempt (MAESTRO_RETRY=true) and surfaces a flaky_in_run count

Returns: {exit_code, raw_exit_code, stdout_tail, stderr_tail, retry_enabled, flaky_in_run, ...}
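
The per-runner invocation and filter mapping described above can be sketched as a small command builder. This is a minimal illustration, not the tool's implementation: build_command is a hypothetical helper, and the Go package pattern ./... and passing matched Maestro flow files as paths are assumptions.

```python
from __future__ import annotations

import os
from typing import Optional, Sequence


def build_command(runner: str, filter_: Optional[str] = None,
                  flows: Sequence[str] = ()) -> list[str]:
    """Hypothetical helper: argv for one runner, with the optional filter applied."""
    if runner == "pytest":
        cmd = ["pytest", "--screenshot=on", "--tracing=on",
               "--video=retain-on-failure"]
        if filter_:
            cmd += ["-k", filter_]                 # keyword expression (and/or/not)
    elif runner == "jest":
        cmd = ["npx", "jest", "--json"]
        if filter_:
            cmd += ["-t", filter_]                 # test-name pattern
    elif runner == "cypress":
        cmd = ["npx", "cypress", "run", "--reporter", "json"]
        if filter_:
            cmd += ["--spec", f"**/*{filter_}*"]   # spec-file glob
    elif runner == "go":
        cmd = ["go", "test", "-json"]
        if filter_:
            cmd += ["-run", filter_]               # regex over test names
        cmd.append("./...")                        # assumption: every package under the root
    elif runner == "maestro":
        cmd = ["maestro", "test", "--format", "junit"]
        # Substring match on flow file names, done before calling the CLI
        # (assumption: matched flow files are passed as paths).
        cmd += [f for f in flows
                if not filter_ or filter_ in os.path.basename(f)]
    else:
        raise ValueError(f"unsupported runner: {runner!r}")
    return cmd
```

For example, build_command("go", "TestAuth") yields ["go", "test", "-json", "-run", "TestAuth", "./..."]. Keeping the filter as a separate argv element (rather than string-concatenating a shell command) avoids shell quoting issues with pytest's and/or/not expressions.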

When to use:

  • After writing a new test → verify it actually passes

  • Smoke before a release

  • Whenever the user prompt contains a run/test verb

When NOT to use:

  • Inspecting last results without re-running → use get_test_report (cheaper)

  • Re-running only failed cases → use run_failed (much faster)

  • Enumerating which tests exist → use list_tests
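
The when-to-use and when-not-to-use rules above amount to a simple routing decision. The sketch below encodes them with a hypothetical choose_tool function; the boolean intent flags are assumptions standing in for however an agent actually interprets the user's request.

```python
from __future__ import annotations

import re
from typing import Optional

# Run/test verbs from the trigger list, including the Chinese 跑 / 驗證 / 執行.
RUN_VERBS = re.compile(r"\b(?:run|test|check)\b|跑|驗證|執行", re.IGNORECASE)


def choose_tool(prompt: str, *, enumerate_only: bool = False,
                inspect_only: bool = False, only_failed: bool = False) -> Optional[str]:
    """Hypothetical router over run_tests and its cheaper siblings."""
    if enumerate_only:
        return "list_tests"        # just list which tests exist
    if inspect_only:
        return "get_test_report"   # read last results without re-running
    if only_failed:
        return "run_failed"        # re-run failures only
    if RUN_VERBS.search(prompt):
        return "run_tests"         # full run
    return None                    # no test tool indicated
```

The cheaper alternatives are checked first, so a full run_tests call happens only when nothing narrower fits.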

Edge cases:

  • No tests match the filter → exit_code != 0 with "no tests ran" in stderr_tail

  • QA_TIMEOUT_SECONDS exceeded → exit_code 124 + [TIMEOUT…] tag in stderr_tail

  • filter starting with - or containing .. → blocked by the security guardrail; returns {error: …}
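
A caller can mirror these edge cases on its side. The sketch below is an assumption-laden illustration: check_filter and classify are hypothetical names, the error message text is invented, and matching on a "[TIMEOUT" prefix is a guess at the tag, whose full form is not documented here.

```python
from __future__ import annotations

from typing import Any, Optional

TIMEOUT_EXIT_CODE = 124  # documented timeout exit code


def check_filter(filter_: Optional[str]) -> Optional[dict[str, str]]:
    """Hypothetical mirror of the guardrail: an {error: ...} envelope, or None if allowed."""
    if filter_ is None:
        return None
    # Presumably: a leading "-" could be read as a CLI flag, and ".." could
    # escape the project root.
    if filter_.startswith("-") or ".." in filter_:
        return {"error": f"filter rejected by security guardrail: {filter_!r}"}
    return None


def classify(result: dict[str, Any]) -> str:
    """Map a run_tests response onto the edge cases listed above."""
    if "error" in result:
        return "blocked"
    if result.get("exit_code") == 0:
        return "passed"
    tail = result.get("stderr_tail", "")
    if result.get("exit_code") == TIMEOUT_EXIT_CODE and "[TIMEOUT" in tail:
        return "timeout"
    if "no tests ran" in tail:
        return "no_tests_matched"
    return "failed"
```

Distinguishing "no tests matched" from a real failure matters because both yield a non-zero exit_code; only stderr_tail tells them apart.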

Plan bookend (v0.10.0): pass the plan_id from a prior qa_plan call and the response automatically includes plan_verification, which checks the plan's critical points against the just-written report.json using the same flow that run_api_security_scan uses. Omit plan_id to keep the legacy response shape (no plan_verification key). If verify_plan fails (unknown or expired plan_id), the run still succeeds and the error envelope is surfaced under plan_verification.
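
The bookend logic can be approximated as follows. This sketch assumes a critical point reduces to a pytest node id that must have passed, and that the failure envelope carries an "error" key like the guardrail's {error: …}; the real plan_verification structure is not documented here, and check_critical_points and plan_status are hypothetical names. The report input follows the pytest-json-report shape, where each entry under "tests" has a "nodeid" and an "outcome".

```python
from __future__ import annotations

from typing import Any


def check_critical_points(report: dict[str, Any],
                          critical_nodeids: list[str]) -> dict[str, bool]:
    """Hypothetical check: each critical point is a node id that must have passed."""
    # pytest-json-report lists one entry per test under "tests".
    outcomes = {t["nodeid"]: t["outcome"] for t in report.get("tests", [])}
    # A point that never ran counts as unverified.
    return {nid: outcomes.get(nid) == "passed" for nid in critical_nodeids}


def plan_status(response: dict[str, Any]) -> str:
    """Classify a run_tests response by its plan_verification key."""
    if "plan_verification" not in response:
        return "legacy"          # plan_id was omitted
    envelope = response["plan_verification"]
    if isinstance(envelope, dict) and "error" in envelope:
        return "verify_failed"   # unknown/expired plan_id; the run itself still succeeded
    return "attached"            # verification ran; inspect its contents
```

In practice the report dict would come from json.load on the just-written report.json.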

Input Schema

  • filter (optional): Test-name keyword. pytest uses it as a -k expression (supports and/or/not), Jest as -t, Cypress as --spec '**/*<filter>*', Go as a -run regex, and Maestro as a substring match on flow file names.

  • headed (optional): Only applies to pytest-playwright. When True, the browser runs with a visible UI (useful for debugging and for watching flaky behavior); by default tests run headless, which is what CI and large suites should use.

  • browser (optional, default: chromium): Only applies to pytest-playwright. Selects the browser engine Playwright launches; the engine must have been installed beforehand with `playwright install <browser>`.

  • plan_id (optional, v0.10.0+): Plan id returned by qa_plan. When supplied, the response gains a `plan_verification` envelope that checks every critical point against the just-written report.json. Same shape as run_api_security_scan's plan bookend.

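
For pytest-playwright runs, the headed and browser inputs plausibly map onto pytest-playwright's own --headed and --browser options. The sketch below assumes that forwarding; playwright_flags is a hypothetical helper, not part of the tool.

```python
from __future__ import annotations

# Engines pytest-playwright's --browser option accepts.
PLAYWRIGHT_BROWSERS = {"chromium", "firefox", "webkit"}


def playwright_flags(headed: bool = False, browser: str = "chromium") -> list[str]:
    """Hypothetical mapping of the headed/browser inputs to pytest-playwright flags."""
    if browser not in PLAYWRIGHT_BROWSERS:
        raise ValueError(f"unknown browser engine: {browser!r}")
    flags = ["--browser", browser]
    if headed:
        flags.append("--headed")   # visible browser window; omitted means headless
    return flags
```

Under that assumption, arguments such as {"filter": "login", "headed": true, "browser": "firefox"} would add --browser firefox --headed to the pytest command line.
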
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It details the invocation (pytest with screenshot/tracing/video flags, jest/cypress/maestro commands), output files (report.json, JUnit XML), side effects (snapshot to history, auto-trigger optimizer.write_plan), Maestro auto-retries, and edge cases (no match, timeout, security guardrail). This is comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is structured with labeled sections (Behavior, When to use, Edge cases, Plan bookend) and front-loaded with the core purpose. Although somewhat lengthy, every section earns its place by providing necessary detail. There is minor redundancy in the 'When to use' list, but overall it is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations or output schema, the description covers all critical aspects: behavior, return shape (exit_code, stdout_tail, etc.), side effects, edge cases, and the plan bookend feature. It references sibling tools and explains when to use alternatives. The description is complete for a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining how 'filter' works per runner (pytest -k, jest -t, etc.), that 'headed' is only for pytest-playwright, 'browser' requires pre-installation, and 'plan_id' ties to qa_plan with response shape change. It enriches the schema without redundancy.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool executes the test suite and produces a structured report. It identifies itself as the 'single most-called tool' for running tests and lists triggers such as 'run/test/check/驗證/執行' (驗證 = verify, 執行 = execute). It distinguishes the primary action (execute) from related tools like get_test_report and run_failed.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use (after writing a new test, smoke before release, when user prompt contains run/test verbs) and when NOT to use (inspect results without re-running → get_test_report, re-run only failures → run_failed, list tests → list_tests). This provides clear alternatives and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kao273183/mk-qa-master'

If you have feedback or need assistance with the MCP directory API, please join our Discord server