tailtest_classify_failures
Parse test runner output and classify failures into real bug, environment, test bug, or unknown. Returns structured records with type, reason, and summary counts.
Instructions
Parse runner output (pytest, jest, etc.) into structured failure records and apply heuristic R12 classification. Returns failures with type (real_bug / environment / test_bug / unknown), reason, test name, file, line, error type, message, and a summary count per R12 category. The agent verifies or overrides the heuristic when context warrants.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| runner_output | Yes | Stdout (and optionally stderr) from the test runner. | |
| runner | No | Runner name. Defaults to pytest. |
Implementation Reference
- Tool registration with input schema: defines name, description, input properties (runner_output required, runner optional with enum pytest/jest/vitest/mocha).
Tool( name="tailtest_classify_failures", description=( "Parse runner output (pytest, jest, etc.) into structured failure records and " "apply heuristic R12 classification. Returns failures with type " "(real_bug / environment / test_bug / unknown), reason, test name, file, " "line, error type, message, and a summary count per R12 category. The agent " "verifies or overrides the heuristic when context warrants." ), inputSchema={ "type": "object", "properties": { "runner_output": { "type": "string", "description": "Stdout (and optionally stderr) from the test runner.", }, "runner": { "type": "string", "enum": ["pytest", "jest", "vitest", "mocha"], "description": "Runner name. Defaults to pytest.", }, }, "required": ["runner_output"], "additionalProperties": False, }, - mcp_server/src/tailtest_mcp/server.py:166-174 (registration)Tool dispatch/call handler that imports classify_failures from tools module and delegates execution.
if name == "tailtest_classify_failures": from .tools.classify_failures import classify_failures import json as _json result = classify_failures( runner_output=arguments["runner_output"], runner=arguments.get("runner", "pytest"), ) return [TextContent(type="text", text=_json.dumps(result, indent=2))] - Main handler function `classify_failures`: parses runner output using pytest or jest parsers, applies heuristic R12 classification, returns structured failure records with summary counts.
def classify_failures(runner_output: str, runner: str = "pytest") -> dict[str, Any]: """Parse runner output and return structured R12-classified failures. Args: runner_output: stdout (and optionally stderr) from the test runner. runner: one of "pytest", "jest", "mocha", "vitest". Defaults to pytest. Returns: Dict with `failures` (list of failure records), `summary` (counts per R12 type), and `runner` echoed back for the agent's reference. Each failure record: { "type": "real_bug" | "environment" | "test_bug" | "unknown", "reason": str, "test_name": str, "file": str, "line": int | None, "error_type": str, "message": str, } """ if runner in ("jest", "vitest"): failures = _parse_jest_failures(runner_output) else: failures = _parse_pytest_failures(runner_output) summary = {"real_bug": 0, "environment": 0, "test_bug": 0, "unknown": 0} for f in failures: summary[f["type"]] += 1 return { "runner": runner, "failures": failures, "summary": summary, "total_failures": len(failures), } - Heuristic R12 classification logic: maps error types to real_bug, environment, test_bug, or unknown based on error type patterns and traceback analysis.
def _heuristic_classification( error_type: str, message: str, traceback_text: str ) -> tuple[str, str]: """Apply heuristic R12 classification. Returns (classification, reason) where classification is one of: real_bug, environment, test_bug, unknown. """ if error_type in ENV_ERRORS: return ("environment", f"{error_type} typically indicates a missing dependency or system resource") if error_type in LIKELY_REAL_BUG_ERRORS: # Refine: if the traceback shows the error originated in test fixture # setup, flip to test_bug. if traceback_text and any( marker in traceback_text for marker in ("conftest.py", "fixture", "setup_method", "setUp(") ): return ( "test_bug", f"{error_type} originated in test fixture or setup, not source under test", ) return ( "real_bug", f"{error_type} typically indicates a bug in the source under test", ) if error_type in AMBIGUOUS_ERRORS: # AssertionError: try to disambiguate from message. msg_lower = (message or "").lower() # Common test_bug signals if any( phrase in msg_lower for phrase in ( "fixture not found", "expected fixture", "wrong expectation", "stub", "mock not configured", ) ): return ("test_bug", "Assertion message indicates the test setup is wrong") # Common real_bug signals if any( phrase in msg_lower for phrase in ( "expected ", "got ", "should", "to equal", "to be", ) ): return ( "real_bug", "Assertion compares actual vs expected behavior of the source", ) return ( "real_bug", "AssertionError defaults to real_bug when ambiguous (per CLAUDE.md / mdc rule)", ) return ("unknown", f"No heuristic for {error_type}; agent must classify") - Pytest output parser: extracts failures from pytest stdout/stderr using regex on FAILED short-form summary lines, applies heuristic classification, and builds structured failure records.
def _parse_pytest_failures(output: str) -> list[dict[str, Any]]: """Extract failures from pytest stdout / stderr. Recognizes the FAILED short-form summary as the primary signal. Returns one record per failure. """ failures: list[dict[str, Any]] = [] seen: set[tuple[str, str]] = set() for match in _PYTEST_SHORT_FAILURE_RE.finditer(output): file_path = match.group("file") test_name = match.group("test") key = (file_path, test_name) if key in seen: continue seen.add(key) error_type = match.group("error_type") or "" message = match.group("message") or "" # Look for a traceback block following the test name elsewhere in output traceback_text = _find_traceback_for(output, test_name) classification, reason = _heuristic_classification( error_type, message, traceback_text ) failures.append( { "type": classification, "reason": reason, "test_name": test_name, "file": file_path, "line": _find_failure_line(traceback_text or "", file_path), "error_type": error_type, "message": message, } ) return failures