tailtest_classify_failures
Parse test runner output into structured failure records with R12 classification (real_bug, environment, test_bug, unknown). Returns detailed failure info and summary counts per category.
Instructions
Parse runner output (pytest, jest, etc.) into structured failure records and apply heuristic R12 classification. Returns failures with type (real_bug / environment / test_bug / unknown), reason, test name, file, line, error type, message, and a summary count per R12 category. The agent verifies or overrides the heuristic when context warrants.
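The tool's output is only a heuristic, so an agent that disagrees with a label has to patch the record and keep `summary` and `total_failures` consistent. The dispatch handler returns the result as JSON text. The sketch below decodes a hand-written sample response and relabels one failure. `override_classification` and `R12_TYPES` are hypothetical names, not part of tailtest as far as this reference shows.

```python
import json
from collections import Counter
from typing import Any

R12_TYPES = ("real_bug", "environment", "test_bug", "unknown")


def override_classification(
    result: dict[str, Any], test_name: str, new_type: str, reason: str
) -> dict[str, Any]:
    """Hypothetical agent-side helper: relabel one failure and rebuild the summary.

    Returns a new dict; the input result and its records are not modified.
    """
    if new_type not in R12_TYPES:
        raise ValueError(f"unknown R12 type: {new_type!r}")

    failures = []
    found = False
    for record in result["failures"]:
        if record["test_name"] == test_name:
            # Copy the record instead of mutating it in place.
            record = {**record, "type": new_type, "reason": reason}
            found = True
        failures.append(record)
    if not found:
        raise KeyError(test_name)

    # Recompute counts so summary and total_failures stay consistent with failures.
    counts = Counter(record["type"] for record in failures)
    summary = {t: counts[t] for t in R12_TYPES}
    return {**result, "failures": failures, "summary": summary, "total_failures": len(failures)}


# Hand-written sample of the JSON text the tool returns.
raw = json.dumps({
    "runner": "pytest",
    "failures": [
        {"type": "real_bug",
         "reason": "AssertionError defaults to real_bug when ambiguous (per CLAUDE.md / mdc rule)",
         "test_name": "test_total", "file": "tests/test_cart.py", "line": 12,
         "error_type": "AssertionError", "message": "assert 3 == 4"},
    ],
    "summary": {"real_bug": 1, "environment": 0, "test_bug": 0, "unknown": 0},
    "total_failures": 1,
})
updated = override_classification(
    json.loads(raw), "test_total", "test_bug", "Expected value in the test is stale"
)
print(updated["summary"])  # → {'real_bug': 0, 'environment': 0, 'test_bug': 1, 'unknown': 0}
```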
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| runner_output | Yes | Stdout (and optionally stderr) from the test runner. | |
| runner | No | Runner name: one of pytest, jest, vitest, mocha. | pytest |
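The table matches the `inputSchema` registered in `list_tools()`. `runner_output` is a required string, `runner` is limited to an enum, and `additionalProperties` is false. The sketch below is a hand-rolled stand-in for a JSON Schema validator. It checks a candidate arguments dict against those same rules and fills in the pytest default the way the `call_tool()` dispatch does. `check_arguments` is a hypothetical helper.

```python
from typing import Any

# Mirrors the inputSchema registered for tailtest_classify_failures.
ALLOWED_RUNNERS = ("pytest", "jest", "vitest", "mocha")
ALLOWED_KEYS = {"runner_output", "runner"}


def check_arguments(arguments: dict[str, Any]) -> dict[str, str]:
    """Hypothetical validator: enforce the schema rules and apply the pytest default."""
    extra = set(arguments) - ALLOWED_KEYS
    if extra:  # additionalProperties: false
        raise ValueError(f"unexpected arguments: {sorted(extra)}")
    if not isinstance(arguments.get("runner_output"), str):  # required string
        raise ValueError("runner_output is required and must be a string")
    runner = arguments.get("runner", "pytest")  # same default as call_tool()
    if runner not in ALLOWED_RUNNERS:  # enum
        raise ValueError(f"runner must be one of {ALLOWED_RUNNERS}, got {runner!r}")
    return {"runner_output": arguments["runner_output"], "runner": runner}


print(check_arguments({"runner_output": "1 failed, 3 passed"}))
# → {'runner_output': '1 failed, 3 passed', 'runner': 'pytest'}
```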
Implementation Reference
- The main handler function `classify_failures`, which parses runner output and returns structured, R12-classified failure records. It sends jest and vitest output to `_parse_jest_failures` and all other runners, including mocha, to `_parse_pytest_failures`.
```python
from typing import Any


def classify_failures(runner_output: str, runner: str = "pytest") -> dict[str, Any]:
    """Parse runner output and return structured R12-classified failures.

    Args:
        runner_output: stdout (and optionally stderr) from the test runner.
        runner: one of "pytest", "jest", "mocha", "vitest". Defaults to pytest.

    Returns:
        Dict with `failures` (list of failure records), `summary` (counts per
        R12 type), `total_failures`, and `runner` echoed back for the agent's
        reference.

        Each failure record:
        {
            "type": "real_bug" | "environment" | "test_bug" | "unknown",
            "reason": str,
            "test_name": str,
            "file": str,
            "line": int | None,
            "error_type": str,
            "message": str,
        }
    """
    # jest and vitest share the Jest-style parser; every other runner
    # (including mocha) falls through to the pytest parser.
    if runner in ("jest", "vitest"):
        failures = _parse_jest_failures(runner_output)
    else:
        failures = _parse_pytest_failures(runner_output)

    # Count failures per R12 category.
    summary = {"real_bug": 0, "environment": 0, "test_bug": 0, "unknown": 0}
    for f in failures:
        summary[f["type"]] += 1

    return {
        "runner": runner,
        "failures": failures,
        "summary": summary,
        "total_failures": len(failures),
    }
```

- Heuristic R12 classification logic mapping error types to real_bug / environment / test_bug / unknown. Environment errors map directly. Likely source bugs flip to test_bug when the traceback mentions conftest.py, a fixture, or a setup method. AssertionError is disambiguated by keywords in its message and otherwise defaults to real_bug.
```python
def _heuristic_classification(
    error_type: str, message: str, traceback_text: str
) -> tuple[str, str]:
    """Apply heuristic R12 classification.

    Returns (classification, reason) where classification is one of:
    real_bug, environment, test_bug, unknown.
    """
    if error_type in ENV_ERRORS:
        return (
            "environment",
            f"{error_type} typically indicates a missing dependency or system resource",
        )

    if error_type in LIKELY_REAL_BUG_ERRORS:
        # Refine: if the traceback shows the error originated in test fixture
        # setup, flip to test_bug.
        if traceback_text and any(
            marker in traceback_text
            for marker in ("conftest.py", "fixture", "setup_method", "setUp(")
        ):
            return (
                "test_bug",
                f"{error_type} originated in test fixture or setup, not source under test",
            )
        return (
            "real_bug",
            f"{error_type} typically indicates a bug in the source under test",
        )

    if error_type in AMBIGUOUS_ERRORS:
        # AssertionError: try to disambiguate from message.
        msg_lower = (message or "").lower()
        # Common test_bug signals
        if any(
            phrase in msg_lower
            for phrase in (
                "fixture not found",
                "expected fixture",
                "wrong expectation",
                "stub",
                "mock not configured",
            )
        ):
            return ("test_bug", "Assertion message indicates the test setup is wrong")
        # Common real_bug signals
        if any(
            phrase in msg_lower
            for phrase in (
                "expected ",
                "got ",
                "should",
                "to equal",
                "to be",
            )
        ):
            return (
                "real_bug",
                "Assertion compares actual vs expected behavior of the source",
            )
        return (
            "real_bug",
            "AssertionError defaults to real_bug when ambiguous (per CLAUDE.md / mdc rule)",
        )

    return ("unknown", f"No heuristic for {error_type}; agent must classify")
```

- Enumeration sets for error type categorization: `ENV_ERRORS` (ImportError, ConnectionError, etc.), `LIKELY_REAL_BUG_ERRORS` (AttributeError, TypeError, etc.), `AMBIGUOUS_ERRORS` (AssertionError).
```python
ENV_ERRORS = {
    "ImportError",
    "ModuleNotFoundError",
    "ConnectionError",
    "ConnectionRefusedError",
    "ConnectionResetError",
    "TimeoutError",
    "FileNotFoundError",
    "PermissionError",
    "OSError",
}

LIKELY_REAL_BUG_ERRORS = {
    "AttributeError",
    "TypeError",
    "KeyError",
    "ValueError",
    "IndexError",
    "ZeroDivisionError",
    "RecursionError",
    "OverflowError",
}

# AssertionError is ambiguous: real_bug if assertion is on source behavior,
# test_bug if assertion is on test fixture / setup.
AMBIGUOUS_ERRORS = {
    "AssertionError",
}
```

- mcp_server/src/tailtest_mcp/server.py:61-86 (registration): tool registration in the MCP server's `list_tools()` function. Defines the name, description, and inputSchema for tailtest_classify_failures.
```python
Tool(
    name="tailtest_classify_failures",
    description=(
        "Parse runner output (pytest, jest, etc.) into structured failure records and "
        "apply heuristic R12 classification. Returns failures with type "
        "(real_bug / environment / test_bug / unknown), reason, test name, file, "
        "line, error type, message, and a summary count per R12 category. The agent "
        "verifies or overrides the heuristic when context warrants."
    ),
    inputSchema={
        "type": "object",
        "properties": {
            "runner_output": {
                "type": "string",
                "description": "Stdout (and optionally stderr) from the test runner.",
            },
            "runner": {
                "type": "string",
                "enum": ["pytest", "jest", "vitest", "mocha"],
                "description": "Runner name. Defaults to pytest.",
            },
        },
        "required": ["runner_output"],
        "additionalProperties": False,
    },
),
```

- mcp_server/src/tailtest_mcp/server.py:166-174 (registration): dispatch handler in `call_tool()` that imports `classify_failures` from `.tools.classify_failures` and invokes it with the `runner_output` and `runner` arguments.
```python
if name == "tailtest_classify_failures":
    from .tools.classify_failures import classify_failures
    import json as _json

    result = classify_failures(
        runner_output=arguments["runner_output"],
        runner=arguments.get("runner", "pytest"),
    )
    return [TextContent(type="text", text=_json.dumps(result, indent=2))]
```
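This reference does not show the parsers `_parse_pytest_failures` and `_parse_jest_failures`. The sketch below is a rough illustration of how a pytest parser could feed the heuristic. It reads lines from pytest's "short test summary info" section, assuming they look like `FAILED <file>::<test> - <ErrorType>: <message>` or `ERROR <file> - <ErrorType>: <message>`, and applies a condensed copy of the rules above. The regexes, the helper names `parse_pytest_summary` and `classify`, and the handling of bare `assert` messages are all assumptions. This is not tailtest's parser. It ignores tracebacks and assertion message keywords, so it models neither the fixture flip to test_bug nor the AssertionError keyword checks.

```python
import re
from typing import Any

# Copies of the error-type sets above.
ENV_ERRORS = {
    "ImportError", "ModuleNotFoundError", "ConnectionError", "ConnectionRefusedError",
    "ConnectionResetError", "TimeoutError", "FileNotFoundError", "PermissionError", "OSError",
}
LIKELY_REAL_BUG_ERRORS = {
    "AttributeError", "TypeError", "KeyError", "ValueError",
    "IndexError", "ZeroDivisionError", "RecursionError", "OverflowError",
}

# Assumed summary-line shapes (hypothetical regexes, not from tailtest).
SUMMARY_LINE = re.compile(
    r"^(?:FAILED|ERROR) (?P<file>[^:\s]+)(?:::(?P<test>\S+))?(?: - (?P<detail>.*))?$"
)
ERROR_PREFIX = re.compile(r"^(?P<etype>[A-Za-z_][\w.]*): ?(?P<msg>.*)$")


def classify(error_type: str) -> tuple[str, str]:
    """Condensed heuristic: error type only, no traceback or message checks."""
    if error_type in ENV_ERRORS:
        return ("environment", f"{error_type} typically indicates a missing dependency or system resource")
    if error_type in LIKELY_REAL_BUG_ERRORS:
        return ("real_bug", f"{error_type} typically indicates a bug in the source under test")
    if error_type == "AssertionError":
        return ("real_bug", "AssertionError defaults to real_bug when ambiguous")
    return ("unknown", f"No heuristic for {error_type or 'this failure'}; agent must classify")


def parse_pytest_summary(output: str) -> list[dict[str, Any]]:
    """Hypothetical parser for pytest short-summary lines into R12 records."""
    records = []
    for line in output.splitlines():
        m = SUMMARY_LINE.match(line.strip())
        if not m:
            continue
        detail = m.group("detail") or ""
        e = ERROR_PREFIX.match(detail)
        if e:
            # Keep only the class name from dotted paths such as
            # "requests.exceptions.ConnectionError".
            error_type, message = e.group("etype").rsplit(".", 1)[-1], e.group("msg")
        elif detail.startswith("assert"):
            # Assumption: bare `assert` failures appear without an "AssertionError:" prefix.
            error_type, message = "AssertionError", detail
        else:
            error_type, message = "", detail
        kind, reason = classify(error_type)
        records.append({
            "type": kind, "reason": reason,
            "test_name": m.group("test") or "",  # collection errors have no test id
            "file": m.group("file"),
            "line": None,  # summary lines carry no line number
            "error_type": error_type, "message": message,
        })
    return records


sample = """\
=========================== short test summary info ============================
FAILED tests/test_cart.py::test_total - assert 3 == 4
FAILED tests/test_api.py::test_fetch - requests.exceptions.ConnectionError: refused
ERROR tests/test_db.py - ModuleNotFoundError: No module named 'psycopg2'
"""
for r in parse_pytest_summary(sample):
    print(r["type"], r["error_type"], r["test_name"] or r["file"])
# real_bug AssertionError test_total
# environment ConnectionError test_fetch
# environment ModuleNotFoundError tests/test_db.py
```

A real parser would also need the full failure sections to recover line numbers and traceback text. `_heuristic_classification` uses that traceback text to flip fixture errors to test_bug.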