qa_plan
Store a critical-points checklist for a QA task before executing it. Later, verify each point against evidence to get a pass/fail verdict.
Instructions
v0.9.1: Store a critical-points checklist before acting on a QA task. The host LLM declares what success looks like (a test passes, a scan finds X, a screenshot shows Y); this tool stores the checklist and returns a plan_id. Later, call verify_plan with evidence (test result rows, scan findings, log lines, screenshot paths) to get a pass/fail verdict for each critical point (CP). Inspired by microsoft/Webwright's plan.md pattern: declaring success criteria up front keeps the verifier honest about whether the work was actually done.
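A minimal sketch of how a per-CP verdict could be computed, assuming evidence is flattened to strings and a CP passes when its verification_hint appears verbatim (case-sensitive) in at least one of them. The Input Schema's advice to pick a literal substring suggests this rule, but verify_plan's actual matching logic is not documented here; the function name and the output fields (`verdict`, `matched_evidence`) are hypothetical.

```python
from typing import Iterable


def verify_critical_points(critical_points: list[dict], evidence: Iterable[str]) -> dict:
    """Hypothetical verifier: a CP passes if its verification_hint appears
    as a literal substring of at least one evidence string."""
    evidence = [str(item) for item in evidence]  # flatten rows, paths, log lines to text
    results = []
    for cp in critical_points:
        hint = cp["verification_hint"]
        # First evidence line that literally contains the hint, if any.
        matched = next((line for line in evidence if hint in line), None)
        results.append({
            "id": cp["id"],
            "verdict": "pass" if matched is not None else "fail",
            "matched_evidence": matched,
        })
    # The plan as a whole passes only if every CP passed (and there was at least one).
    overall = "pass" if results and all(r["verdict"] == "pass" for r in results) else "fail"
    return {"verdict": overall, "critical_points": results}
```

For example, a hint of `test_login PASSED` matches the pytest line `tests/test_auth.py::test_login PASSED`, while a hint of `0 findings` does not match `scan: 3 findings`, so that CP fails and so does the overall verdict.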
Plans are held in memory for 30 minutes (the cache TTL), and at most 50 outstanding plans are kept; beyond that, the least recently used plan is evicted.
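One way to get both bounds is an `OrderedDict` keyed by plan_id, storing a fixed expiry per entry. This is a sketch, not the tool's implementation: the class name is hypothetical, and whether a read refreshes a plan's LRU position in the real tool is an assumption. The injectable `clock` exists only to make expiry testable.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional

PLAN_TTL_SECONDS = 30 * 60  # 30-minute cache TTL
MAX_PLANS = 50              # LRU bound on outstanding plans


class PlanCache:
    """Hypothetical in-memory plan store: fixed per-plan expiry plus LRU eviction."""

    def __init__(self, ttl: float = PLAN_TTL_SECONDS, max_size: int = MAX_PLANS,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._max_size = max_size
        self._clock = clock
        self._entries = OrderedDict()  # plan_id -> (expires_at, plan), oldest first

    def put(self, plan_id: str, plan: dict) -> None:
        self._entries[plan_id] = (self._clock() + self._ttl, plan)
        self._entries.move_to_end(plan_id)       # newest entry is most recently used
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)    # evict the least recently used plan

    def get(self, plan_id: str) -> Optional[dict]:
        entry = self._entries.get(plan_id)
        if entry is None:
            return None
        expires_at, plan = entry
        if self._clock() >= expires_at:
            del self._entries[plan_id]           # expired: drop it and report a miss
            return None
        self._entries.move_to_end(plan_id)       # assumption: reads refresh LRU order
        return plan
```

Expiry is checked lazily on read, so an expired plan may still occupy a slot until it is read or pushed out by newer plans.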
v0.9.3: Disk persistence. When QA_PROJECT_ROOT is set (or QA_PLAN_PERSIST=true), the plan is also written atomically to <QA_PROJECT_ROOT>/test-results/plans/<plan_id>.json. On an in-memory miss, verify_plan transparently falls back to disk, so plans survive process restarts and cache eviction. Expiry is still enforced on disk reads: an expired plan is never silently reloaded. Persistence is best-effort; filesystem errors are never raised to the caller.
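A sketch of the atomic write and TTL-aware disk fallback, assuming plans are stored as JSON with an `expires_at` field in Unix epoch seconds. It uses the standard write-to-a-temp-file-then-`os.replace` pattern; `persist_plan` and `load_plan` are hypothetical names, and resolving the root directory from QA_PROJECT_ROOT / QA_PLAN_PERSIST is left out.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def persist_plan(root: str, plan: dict) -> Optional[str]:
    """Best-effort atomic write to <root>/test-results/plans/<plan_id>.json.
    Returns the path written, or None if the write failed."""
    try:
        plans_dir = Path(root) / "test-results" / "plans"
        plans_dir.mkdir(parents=True, exist_ok=True)
        target = plans_dir / f"{plan['plan_id']}.json"
        # Write to a temp file in the same directory, then rename it over the target.
        # os.replace is atomic within one filesystem, so readers never see a partial file.
        fd, tmp_path = tempfile.mkstemp(dir=plans_dir, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(plan, fh)
            os.replace(tmp_path, target)
        except BaseException:
            os.unlink(tmp_path)  # don't leave temp files behind on failure
            raise
        return str(target)
    except (OSError, TypeError, ValueError):
        return None  # best-effort: report failure instead of raising into the caller


def load_plan(root: str, plan_id: str, now: Optional[float] = None) -> Optional[dict]:
    """Disk fallback for in-memory misses; expired plans are treated as missing."""
    path = Path(root) / "test-results" / "plans" / f"{plan_id}.json"
    try:
        plan = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None
    now = time.time() if now is None else now
    if plan.get("expires_at", 0) <= now:
        return None  # honor the TTL: never resurrect an expired plan from disk
    return plan
```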
Returns: {plan_id (12 hex chars), task, kind, critical_points [{id, description, verification_hint}], created_at, expires_at, persisted_to (filesystem path or null when persistence is off)}.
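The return shape can be assembled as below. This sketch assumes `created_at` and `expires_at` are Unix epoch seconds and uses `secrets.token_hex(6)` to produce 12 hex characters; the real tool's ID source and timestamp format are not specified, and `build_plan_record` is a hypothetical name.

```python
import secrets
import time
from typing import Optional

PLAN_TTL_SECONDS = 30 * 60  # plans expire 30 minutes after creation


def build_plan_record(task: str, kind: Optional[str], critical_points: list[dict],
                      persisted_to: Optional[str] = None,
                      now: Optional[float] = None) -> dict:
    """Assemble the documented return fields (timestamps as epoch seconds, an assumption)."""
    created_at = time.time() if now is None else now
    return {
        "plan_id": secrets.token_hex(6),        # 6 random bytes -> 12 hex characters
        "task": task,
        "kind": kind,
        "critical_points": critical_points,     # [{id, description, verification_hint}]
        "created_at": created_at,
        "expires_at": created_at + PLAN_TTL_SECONDS,
        "persisted_to": persisted_to,           # None (JSON null) when persistence is off
    }
```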
Error shapes: no_task / no_critical_points / bad_critical_points (duplicate id, missing description, wrong type) / bad_kind.
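The top-level error shapes could be checked up front, before any per-entry work. In this sketch the error payload (`{"error": ..., "message": ...}`) and the `ALLOWED_KINDS` set are hypothetical, since the accepted kind values are not listed in this document. The per-entry checks behind bad_critical_points are omitted.

```python
from typing import Optional

# Hypothetical accepted kinds, loosely matching the evidence streams mentioned above.
ALLOWED_KINDS = {"test", "scan", "log", "screenshot"}


def validate_request(task, critical_points, kind=None) -> Optional[dict]:
    """Return an error dict for no_task / no_critical_points / bad_kind, else None."""
    if not isinstance(task, str) or not task.strip():
        return {"error": "no_task", "message": "task must be a non-empty string"}
    if not critical_points:
        return {"error": "no_critical_points", "message": "critical_points must be non-empty"}
    # kind is optional; when given, it must be one of the known evidence streams.
    if kind is not None and (not isinstance(kind, str) or kind not in ALLOWED_KINDS):
        return {"error": "bad_kind", "message": f"unknown kind {kind!r}"}
    return None
```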
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| task | Yes | Required. The natural-language goal — what the user wants done. Will be echoed back in verify_plan's output. | |
| critical_points | Yes | Required, non-empty. Each entry is either a string (used as both description and verification_hint) or a dict {id?, description, verification_hint?}. IDs are auto-assigned as CP1..CPn if omitted. verification_hint defaults to description; pick a substring that will literally appear in the evidence you will pass later. | |
| kind | No | Optional. Hint for downstream verifiers about which evidence stream to expect. Omit if unsure. | |
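The normalization rules for critical_points can be sketched as follows. Assumptions: auto-assigned IDs follow list position (so an explicit `"CP2"` elsewhere in the list would collide), IDs are coerced to strings, and bad_critical_points is modeled as a hypothetical `BadCriticalPoints` exception rather than the tool's actual error payload.

```python
from typing import Union


class BadCriticalPoints(ValueError):
    """Hypothetical stand-in for the bad_critical_points error shape."""


def normalize_critical_points(raw: list[Union[str, dict]]) -> list[dict]:
    """Turn string/dict entries into {id, description, verification_hint} records."""
    normalized, seen = [], set()
    for index, entry in enumerate(raw, start=1):
        if isinstance(entry, str):
            entry = {"description": entry}     # a bare string is both description and hint
        if not isinstance(entry, dict):
            raise BadCriticalPoints(f"entry {index}: expected str or dict, got {type(entry).__name__}")
        description = entry.get("description")
        if not isinstance(description, str) or not description.strip():
            raise BadCriticalPoints(f"entry {index}: missing description")
        cp_id = str(entry.get("id") or f"CP{index}")  # auto-assign CP1..CPn by position
        if cp_id in seen:
            raise BadCriticalPoints(f"entry {index}: duplicate id {cp_id!r}")
        seen.add(cp_id)
        normalized.append({
            "id": cp_id,
            "description": description,
            # The hint defaults to the description when omitted or empty.
            "verification_hint": entry.get("verification_hint") or description,
        })
    return normalized
```

Because the default hint is the full description, a description like "login works" only passes if that exact phrase shows up in the evidence; supplying an explicit hint such as `test_login PASSED` is usually more reliable.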