qa_compare_runs
Compare baseline and newer test runs to categorize regressions, fixes, persistent failures, and new or removed tests.
Instructions
Compare two test runs and categorize the differences.
`run_a` is treated as baseline (older), `run_b` as newer.
Categories returned:
- regressions: passed in A, failed/error in B (highest priority)
- fixes: failed/error in A, passed in B
- persistent_failures: failed in both
  - same_error: fingerprints match → same root cause
  - different_error: fingerprints differ → root cause changed
- new_tests: in B but not A
- removed_tests: in A but not B
- other_changes: transitions involving skipped (low priority)
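The categories above can be sketched as a small classification function. This is an illustration only, not the tool's actual implementation: the `Outcome` type, the status strings, the `fingerprint` field, and the flat dictionary returned are all assumptions. It also assumes that `error` counts as a failure for `persistent_failures`, which the description above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these two statuses both count as "failing".
FAILING = {"failed", "error"}


@dataclass(frozen=True)
class Outcome:
    """Hypothetical per-test record; not the tool's real model."""
    status: str                        # "passed", "failed", "error", or "skipped"
    fingerprint: Optional[str] = None  # hypothetical error fingerprint


def categorize(run_a: dict, run_b: dict) -> dict:
    """Classify each test name by its transition from run_a (baseline) to run_b (newer)."""
    keys = ("regressions", "fixes", "same_error", "different_error",
            "new_tests", "removed_tests", "other_changes")
    result = {key: [] for key in keys}

    for name in sorted(run_a.keys() | run_b.keys()):
        a, b = run_a.get(name), run_b.get(name)
        if a is None:
            result["new_tests"].append(name)
        elif b is None:
            result["removed_tests"].append(name)
        elif a.status == "passed" and b.status in FAILING:
            result["regressions"].append(name)
        elif a.status in FAILING and b.status == "passed":
            result["fixes"].append(name)
        elif a.status in FAILING and b.status in FAILING:
            # Persistent failure: split by whether the fingerprint changed.
            same = a.fingerprint == b.fingerprint
            result["same_error" if same else "different_error"].append(name)
        elif a.status != b.status:
            # Remaining status changes involve "skipped" on one side.
            result["other_changes"].append(name)
        # Tests with an unchanged non-failing status are not reported.
    return result
```

In this sketch a test that passes in both runs appears in no category, which matches the idea that only differences are reported.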
Flakiness detection requires more than two runs (N>2) and is not provided by this tool. Use the
weekly_regression_review prompt to orchestrate multi-run analysis.
Returns:
Markdown summary or JSON of the full ComparisonResult model.
Error response: string starting with "Error: ...".
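Because errors come back as an ordinary string rather than a raised exception, a caller may want to check the prefix before using the result. The helper below is a minimal sketch under that reading; `parse_response` is a hypothetical name, and the field names inside the JSON payload are not specified here, so the code treats the parsed object as opaque.

```python
import json
from typing import Union


def parse_response(raw: str, response_format: str = "markdown") -> Union[str, dict]:
    """Turn the tool's string result into Markdown text or a parsed JSON object.

    Raises RuntimeError when the result is an error string.
    """
    if raw.startswith("Error:"):
        # Strip the prefix so the exception carries only the message.
        raise RuntimeError(raw[len("Error:"):].strip())
    if response_format == "json":
        return json.loads(raw)
    return raw
```

Raising on the error prefix keeps downstream code from mistaking an error message for a Markdown summary.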
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| run_a | Yes | Baseline run_id (treated as 'before'). | |
| run_b | Yes | Newer run_id (treated as 'after'). | |
| response_format | No | 'markdown' for human-readable, 'json' for programmatic. | markdown |
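As an illustration of the input schema, the arguments could be assembled and checked on the client side before the tool is called. `build_arguments` is a hypothetical helper and the run IDs in the usage line are made up; how the arguments are actually sent depends on the client in use.

```python
ALLOWED_FORMATS = {"markdown", "json"}


def build_arguments(run_a: str, run_b: str, response_format: str = "markdown") -> dict:
    """Build the argument dictionary for qa_compare_runs, validating it against the schema."""
    if not run_a or not run_b:
        raise ValueError("run_a and run_b are both required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(ALLOWED_FORMATS)}")
    return {"run_a": run_a, "run_b": run_b, "response_format": response_format}


# Hypothetical run IDs: the baseline goes in run_a, the newer run in run_b.
arguments = build_arguments("run-2024-01", "run-2024-02", response_format="json")
```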
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |