
qa_plan

Store a critical-points checklist for a QA task before executing it. Later, verify each point against evidence to get a pass/fail verdict.

Instructions

v0.9.1: Store a critical-points checklist before acting on a QA task. The host LLM declares what success looks like (test passes, scan finds X, screenshot shows Y); this tool stores that checklist and returns a plan_id. Later, call verify_plan with evidence (test result rows, scan findings, log lines, screenshot paths) to get a pass/fail verdict for each critical point (CP). Inspired by microsoft/Webwright's plan.md pattern: declaring success criteria up front keeps the verifier honest about whether the work was done.
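
The per-CP verdict step can be pictured with a small sketch. This is not verify_plan's actual implementation: it assumes that verification amounts to checking whether each point's verification_hint appears as a substring of some evidence line (the parameter notes below suggest picking a hint that literally appears in the evidence). The function name `verify` and the verdict field names are hypothetical.

```python
def verify(critical_points, evidence):
    """Return one verdict per critical point based on substring matches.

    Assumption: a point passes when its verification_hint occurs in at
    least one evidence line. The real verify_plan may apply other rules.
    """
    verdicts = []
    for point in critical_points:
        hint = point["verification_hint"]
        matches = [line for line in evidence if hint in line]
        verdicts.append({
            "id": point["id"],
            "verdict": "pass" if matches else "fail",
            "evidence": matches,  # the lines that justified a pass
        })
    return verdicts


points = [
    {"id": "CP1", "description": "login test passes",
     "verification_hint": "test_login PASSED"},
    {"id": "CP2", "description": "no server errors",
     "verification_hint": "errors=0"},
]
evidence = ["tests/test_auth.py::test_login PASSED", "scan finished errors=3"]
results = verify(points, evidence)
```

Under these assumptions, CP1 passes because its hint appears verbatim in the first evidence line, while CP2 fails because no line contains `errors=0`.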

Plans live in memory for 30 minutes (cache TTL), and the cache is LRU-bounded at 50 outstanding plans.
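
A cache with both limits might look like the following minimal sketch. The class name `PlanCache`, its methods, and the injectable clock are assumptions made for illustration; only the 30-minute TTL and the bound of 50 come from the description.

```python
import time
from collections import OrderedDict

TTL_SECONDS = 30 * 60  # plans live 30 minutes, per the description
MAX_PLANS = 50         # LRU bound on outstanding plans, per the description


class PlanCache:
    """Hypothetical in-memory store combining TTL expiry and LRU eviction."""

    def __init__(self, ttl=TTL_SECONDS, max_plans=MAX_PLANS, clock=time.time):
        self._ttl = ttl
        self._max = max_plans
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._plans = OrderedDict()  # plan_id -> (expires_at, plan)

    def put(self, plan_id, plan):
        self._plans[plan_id] = (self._clock() + self._ttl, plan)
        self._plans.move_to_end(plan_id)  # newest entry is most recently used
        while len(self._plans) > self._max:
            self._plans.popitem(last=False)  # drop the least recently used plan

    def get(self, plan_id):
        entry = self._plans.get(plan_id)
        if entry is None:
            return None
        expires_at, plan = entry
        if self._clock() >= expires_at:
            del self._plans[plan_id]  # expired plans are never returned
            return None
        self._plans.move_to_end(plan_id)  # a read refreshes LRU position
        return plan
```

In this sketch a read counts as a use, so a plan that is fetched often is the last to be evicted, but reading it does not extend its expiry.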

v0.9.3: Disk persistence. When QA_PROJECT_ROOT is set (or QA_PLAN_PERSIST=true), the plan is also dumped atomically to <QA_PROJECT_ROOT>/test-results/plans/<plan_id>.json. verify_plan transparently falls back to disk on in-memory misses, so plans survive process restarts and cache eviction. Expiry is still honored on disk reads: an expired plan is never silently reloaded. Persistence is best-effort: filesystem errors never raise into the caller.
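
One common way to get an atomic, best-effort dump is to write a temporary file in the target directory and rename it over the final path. The sketch below illustrates that technique and is not the server's code: `persist_plan` and `load_plan` are hypothetical names, and it assumes `expires_at` is stored as epoch seconds, which the description does not specify.

```python
import json
import os
import tempfile
import time


def persist_plan(project_root, plan):
    """Best-effort atomic dump; returns the path, or None on a filesystem error."""
    try:
        plans_dir = os.path.join(project_root, "test-results", "plans")
        os.makedirs(plans_dir, exist_ok=True)
        final_path = os.path.join(plans_dir, plan["plan_id"] + ".json")
        # Write to a temp file in the same directory, then rename over the target.
        fd, tmp_path = tempfile.mkstemp(dir=plans_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(plan, handle)
            os.replace(tmp_path, final_path)  # atomic on the same filesystem
        except Exception:
            if os.path.exists(tmp_path):
                os.unlink(tmp_path)  # do not leave partial files behind
            raise
        return final_path
    except OSError:
        return None  # filesystem errors never reach the caller


def load_plan(project_root, plan_id, now=None):
    """Disk fallback; returns None for missing, unreadable, or expired plans."""
    path = os.path.join(project_root, "test-results", "plans", plan_id + ".json")
    try:
        with open(path, encoding="utf-8") as handle:
            plan = json.load(handle)
    except (OSError, ValueError):
        return None
    if not isinstance(plan, dict):
        return None
    current = time.time() if now is None else now
    # Assumption: expires_at is epoch seconds; expired plans are not reloaded.
    if plan.get("expires_at", 0) <= current:
        return None
    return plan
```

Because the rename happens only after the file is fully written, a reader sees either the previous file or the complete new one, never a partial write.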

Returns: {plan_id (12 hex chars), task, kind, critical_points [{id, description, verification_hint}], created_at, expires_at, persisted_to (filesystem path or null when persistence is off)}.
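
For orientation, a response with these fields could be assembled as in the sketch below. The helper `build_plan` is hypothetical, and representing `created_at` and `expires_at` as epoch seconds is an assumption; the description only names the fields and the 12-hex-character plan_id.

```python
import secrets
import time

TTL_SECONDS = 30 * 60  # 30-minute plan lifetime, per the description


def build_plan(task, critical_points, kind=None, persisted_to=None, now=None):
    """Assemble a response carrying the fields listed in the Returns line."""
    created_at = time.time() if now is None else now  # assumed: epoch seconds
    return {
        "plan_id": secrets.token_hex(6),  # 6 random bytes -> 12 hex characters
        "task": task,
        "kind": kind,
        "critical_points": critical_points,
        "created_at": created_at,
        "expires_at": created_at + TTL_SECONDS,
        "persisted_to": persisted_to,  # None when persistence is off
    }


example = build_plan(
    "Check that login works",
    [{"id": "CP1", "description": "login test passes",
      "verification_hint": "login test passes"}],
    now=100.0,
)
```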

Error shapes: no_task / no_critical_points / bad_critical_points (duplicate id, missing description, wrong type) / bad_kind.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| task | Yes | Required. The natural-language goal: what the user wants done. Will be echoed back in verify_plan's output. | |
| critical_points | Yes | Required, non-empty. Each entry is either a string (used as description+verification_hint) or a dict {id?, description, verification_hint?}. IDs auto-assigned as CP1..CPn if omitted. verification_hint defaults to description; pick a substring that will literally appear in the evidence you'll later pass. | |
| kind | No | Optional. Hint for downstream verifiers about which evidence stream to expect. Omit if unsure. | |
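
The normalization rules for critical_points can be sketched as follows. This is an illustration under assumptions, not the tool's code: the function name, the `(points, error)` return convention, and the choice of which error code applies to a non-list input are all hypothetical; the error names themselves come from the description.

```python
def normalize_critical_points(raw):
    """Return (points, error); exactly one of the two is None."""
    if not raw:
        return None, "no_critical_points"  # missing or empty
    if not isinstance(raw, list):
        return None, "bad_critical_points"  # assumed code for a non-list input
    points, seen = [], set()
    for index, entry in enumerate(raw, start=1):
        if isinstance(entry, str):
            entry = {"description": entry}  # a bare string is the description
        if not isinstance(entry, dict):
            return None, "bad_critical_points"  # wrong type
        description = entry.get("description")
        if not isinstance(description, str) or not description.strip():
            return None, "bad_critical_points"  # missing description
        cp_id = entry.get("id") or f"CP{index}"  # auto-assign CP1..CPn
        if not isinstance(cp_id, str) or cp_id in seen:
            return None, "bad_critical_points"  # bad or duplicate id
        seen.add(cp_id)
        points.append({
            "id": cp_id,
            "description": description,
            # verification_hint defaults to the description
            "verification_hint": entry.get("verification_hint") or description,
        })
    return points, None


points, error = normalize_critical_points([
    "login test passes",
    {"description": "no server errors in log", "verification_hint": "status=200"},
])
```

Here the first entry becomes CP1 with its description reused as the hint, and the second becomes CP2 with the explicit hint `status=200`.
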
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavioral traits: cache TTL of 30 minutes, LRU bound of 50 plans, disk persistence behavior, error shapes, and version history. This provides comprehensive transparency beyond what structured fields could offer.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed but well-structured, starting with the primary purpose, then usage pattern, followed by technical details. It could be slightly more concise, but every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description fully covers all aspects: purpose, usage, parameters, return value, error shapes, persistence, and expiry. It also links to the sibling tool verify_plan, making it complete for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds extra meaning: auto-assignment of IDs for critical_points, defaulting verification_hint to description, and the purpose of the 'kind' enum. This enhances understanding beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Store a critical-points checklist before acting on a QA task.' It specifies the return of a plan_id and outlines the complementary workflow with verify_plan, distinguishing it from sibling tools like verify_plan.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (before acting on a QA task) and how it pairs with verify_plan. It does not explicitly list exclusions or alternatives, but the context is clear enough for an agent to understand proper usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/kao273183/mk-qa-master'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.