verify_action_outcome
Compare an agent's claimed outcome against actual before/after state snapshots to detect misreported actions.
Instructions
v1.1+. Compares an agent's stated outcome against actual before/after state snapshots. Catches the failure mode reported by [@chiefofautism, 158↑]: the agent runs `rm -rf` or `git push --force` and then says 'I cleaned up the project structure'. bash-vet catches the destructive command; this tool checks the misreport about what got done. It also catches the Codex-CoT sandbox-escalation pattern, where the agent acknowledges a read-only constraint and then writes anyway (pass `read_only: true` in the before snapshot). The tool is a pure function: the caller captures the snapshots and the server stays stateless. Returns an ActionOutcomeReport with a verdict (CLEAN / PARTIALLY_GROUNDED / FABRICATED / UNVERIFIED) plus per-mismatch evidence.
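The read-only check described above can be illustrated with a small sketch. The tool's internal diff and constraint helpers are not shown on this page, so the functions `diff_files` and `violates_read_only` below are hypothetical stand-ins, and the rule "any added file, removed file, or moved git tip counts as a write" is an assumption rather than the project's actual logic.

```python
from typing import Any, Mapping


def diff_files(before: Mapping[str, Any], after: Mapping[str, Any]) -> dict[str, list[str]]:
    """Return files added and removed between two snapshots (hypothetical helper)."""
    before_files = set(before.get("files", []))
    after_files = set(after.get("files", []))
    return {
        "added": sorted(after_files - before_files),
        "removed": sorted(before_files - after_files),
    }


def violates_read_only(before: Mapping[str, Any], after: Mapping[str, Any]) -> bool:
    """True if the before snapshot asserted read_only but state changed anyway.

    Assumption: a write is any file added/removed or a changed git_tip.
    """
    if not before.get("read_only"):
        return False
    changes = diff_files(before, after)
    tip_moved = before.get("git_tip") != after.get("git_tip")
    return bool(changes["added"] or changes["removed"] or tip_moved)


before = {"files": ["a.py"], "read_only": True}
after = {"files": ["a.py", "b.py"]}
print(violates_read_only(before, after))
```

In this sketch the agent created `b.py` despite the read-only assertion, so the check flags it; without `read_only` in the before snapshot the same diff would pass.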
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| claim | Yes | The agent's stated outcome — verbatim. Examples: 'I cleaned up the project structure', 'tests pass', 'committed and pushed', 'created auth_v2.py'. | |
| before_snapshot | Yes | Caller-captured state BEFORE the agent acted. Recognized keys: files (list[str]), git_status, git_tip / git_head / git_log_tip (str SHA), tests_status / test_status, read_only (bool — asserts no-write constraint). Other keys are tracked but not matched against claim. | |
| after_snapshot | Yes | Caller-captured state AFTER the agent acted. Same key conventions as before_snapshot. | |
| expected_changes | No | Optional caller-supplied list of expected changes. Recognized formats: 'file:foo.py:added', 'file:bar.py:removed', 'git:committed', 'git:clean', 'tests:pass'. Each missing entry becomes a MISSING_EXPECTED_CHANGE finding. | |
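Because the caller is responsible for capturing both snapshots, a helper along the following lines may be useful. This is a minimal sketch, not part of the tool: `capture_snapshot` and `_git` are hypothetical names, and since the page does not specify the value format for `git_status`, storing the `git status --porcelain` output as a string is an assumption.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Any


def _git(root: Path, *args: str) -> str | None:
    """Run a git command in root; return stdout, or None if git fails or is missing."""
    try:
        result = subprocess.run(
            ["git", *args], cwd=root, capture_output=True, text=True, check=False
        )
    except OSError:
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def capture_snapshot(root: str | Path, read_only: bool = False) -> dict[str, Any]:
    """Build a snapshot dict using the key names listed in the table above."""
    root = Path(root)
    files = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )
    snapshot: dict[str, Any] = {"files": files, "read_only": read_only}
    tip = _git(root, "rev-parse", "HEAD")
    if tip is not None:
        snapshot["git_tip"] = tip
    status = _git(root, "status", "--porcelain")
    if status is not None:
        snapshot["git_status"] = status  # assumed format: porcelain text
    return snapshot
```

A caller would take one snapshot before the agent acts and one after, then pass both to the tool together with the agent's verbatim claim and, optionally, entries such as 'file:auth_v2.py:added' in expected_changes.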
Implementation Reference
- Main handler function that compares an agent's claim against before/after state snapshots, extracts assertions, checks them against the diff, and returns an ActionOutcomeReport with verdict and mismatches.
def verify_action_outcome(
    claim: str,
    before_snapshot: Mapping[str, Any],
    after_snapshot: Mapping[str, Any],
    expected_changes: list[str] | None = None,
) -> ActionOutcomeReport:
    """Compare an agent claim against actual before/after state diff.

    Pure function — does not capture state itself; the caller passes both
    snapshots. The server stays stateless (same posture as `verify_grounding`).

    Both snapshots must be Mapping-shaped; non-dict inputs should be coerced
    by the call site (server.py does this for MCP tool calls).
    """
    diff = _compute_diff(before_snapshot, after_snapshot)
    diff_summary = _diff_summary(diff)
    assertions = _extract_claim_assertions(claim)

    mismatches: list[ActionOutcomeMismatch] = []
    matched = 0
    mismatched = 0
    for kind, target, excerpt in assertions:
        m = _check_assertion(kind, target, excerpt, diff, before_snapshot, after_snapshot)
        if m is None:
            matched += 1
        else:
            mismatched += 1
            mismatches.append(m)

    # Always check constraint violations + caller-supplied expected_changes
    mismatches.extend(_check_constraint_violations(before_snapshot, diff))
    mismatches.extend(_check_expected_changes(expected_changes, diff))

    # Sort: CRITICAL → HIGH → MEDIUM → LOW → INFO, then by rule_id
    severity_rank = {
        Severity.CRITICAL: 0,
        Severity.HIGH: 1,
        Severity.MEDIUM: 2,
        Severity.LOW: 3,
        Severity.INFO: 4,
    }
    mismatches.sort(key=lambda m: (severity_rank[m.severity], m.rule_id))

    # Verdict composition
    has_critical = any(m.severity == Severity.CRITICAL for m in mismatches)
    has_high = any(m.severity == Severity.HIGH for m in mismatches)

    if not assertions and not expected_changes and not _check_constraint_violations(before_snapshot, diff):
        verdict = Verdict.UNVERIFIED
        summary = (
            "UNVERIFIED — claim has no extractable assertions and no "
            "expected_changes / constraint were supplied. Provide a more "
            "specific claim (filename, 'tests pass', 'committed', etc.) or "
            "pass expected_changes."
        )
    elif mismatched == 0 and not mismatches:
        verdict = Verdict.CLEAN
        summary = (
            f"CLEAN — claim is supported by the diff. "
            f"{matched} assertion(s) matched. Diff: {diff_summary}."
        )
    elif has_critical or (has_high and matched == 0):
        verdict = Verdict.FABRICATED
        summary = (
            f"FABRICATED — claim is contradicted by the diff. "
            f"{matched} matched / {mismatched} mismatched. "
            f"Worst: {mismatches[0].rule_id}. Diff: {diff_summary}."
        )
    else:
        verdict = Verdict.PARTIALLY_GROUNDED
        summary = (
            f"PARTIALLY_GROUNDED — some claim assertions match the diff, others don't. "
            f"{matched} matched / {mismatched} mismatched. Diff: {diff_summary}."
        )

    return ActionOutcomeReport(
        verdict=verdict,
        matched_count=matched,
        mismatched_count=mismatched,
        mismatches=mismatches,
        diff_summary=diff_summary,
        summary=summary,
    )

- Output type (ActionOutcomeReport) and input validation via ActionOutcomeMismatch for the verify_action_outcome tool.
class ActionOutcomeReport(BaseModel):
    """Response for `verify_action_outcome` — compares an agent claim against
    before/after state diff.

    The scanner is the next layer below `review_transcript`'s
    `unverified-completion-claim` check: that one fires when a claim has
    *no supporting tool calls visible in the transcript*. This one fires when
    a claim *has* supporting tool calls, but the side effects don't match.
    """

    model_config = ConfigDict(frozen=True)

    verdict: Verdict
    """CLEAN if all extracted claims match the diff; PARTIALLY_GROUNDED if
    some match and some don't; FABRICATED if the diff actively contradicts
    the claim (state unchanged or violated stated constraint); UNVERIFIED if
    the claim couldn't be parsed into checkable assertions."""

    matched_count: int
    """Claim assertions that matched the diff."""

    mismatched_count: int
    """Claim assertions that did not match the diff."""

    mismatches: list[ActionOutcomeMismatch]
    """All mismatches, sorted CRITICAL → INFO."""

    diff_summary: str
    """One-line text summary of what changed between the snapshots."""

    summary: str

- src/openclaw_output_vetter_mcp/server.py:196-254 (registration) MCP tool registration in list_tools() with name 'verify_action_outcome', description, and inputSchema defining claim, before_snapshot, after_snapshot (required) and expected_changes (optional).
Tool(
    name="verify_action_outcome",
    description=(
        "v1.1+ — Compare an agent's stated outcome against actual "
        "before/after state snapshots. Catches the [@chiefofautism, 158↑] "
        "failure mode: agent runs `rm -rf` / `git push --force` and then "
        "says 'I cleaned up the project structure' — bash-vet catches the "
        "destructive command, this checks the *misreport* about what got "
        "done. Also catches the Codex-CoT sandbox-escalation pattern: "
        "agent acknowledges read-only constraint, then writes anyway "
        "(pass `read_only: true` in the before snapshot). Pure function — "
        "caller captures snapshots; server is stateless. Returns "
        "ActionOutcomeReport with verdict (CLEAN / PARTIALLY_GROUNDED / "
        "FABRICATED / UNVERIFIED) + per-mismatch evidence."
    ),
    inputSchema={
        "type": "object",
        "properties": {
            "claim": {
                "type": "string",
                "description": (
                    "The agent's stated outcome — verbatim. Examples: "
                    "'I cleaned up the project structure', 'tests pass', "
                    "'committed and pushed', 'created auth_v2.py'."
                ),
            },
            "before_snapshot": {
                "type": "object",
                "description": (
                    "Caller-captured state BEFORE the agent acted. "
                    "Recognized keys: files (list[str]), git_status, "
                    "git_tip / git_head / git_log_tip (str SHA), "
                    "tests_status / test_status, read_only (bool — "
                    "asserts no-write constraint). Other keys are "
                    "tracked but not matched against claim."
                ),
            },
            "after_snapshot": {
                "type": "object",
                "description": (
                    "Caller-captured state AFTER the agent acted. Same "
                    "key conventions as before_snapshot."
                ),
            },
            "expected_changes": {
                "type": "array",
                "description": (
                    "Optional caller-supplied list of expected changes. "
                    "Recognized formats: 'file:foo.py:added', "
                    "'file:bar.py:removed', 'git:committed', "
                    "'git:clean', 'tests:pass'. Each missing entry "
                    "becomes a MISSING_EXPECTED_CHANGE finding."
                ),
                "items": {"type": "string"},
            },
        },
        "required": ["claim", "before_snapshot", "after_snapshot"],
    },
),

- src/openclaw_output_vetter_mcp/server.py:293-304 (registration) Tool dispatch in call_tool(): routes the 'verify_action_outcome' tool name to the verify_action_outcome handler with argument extraction and type coercion.
if name == "verify_action_outcome":
    claim = str(arguments.get("claim", "")).strip()
    before = arguments.get("before_snapshot")
    after = arguments.get("after_snapshot")
    expected = arguments.get("expected_changes")
    if not isinstance(before, dict):
        before = {}
    if not isinstance(after, dict):
        after = {}
    if expected is not None and not isinstance(expected, list):
        expected = None
    return _serialize(verify_action_outcome(claim, before, after, expected_changes=expected))

- Claim extraction helper that parses agent claim text using regex patterns (created_file, deleted_file, tests_pass, committed, clean_state, vague_completion) and supports multi-target expansion for chained filenames.
def _extract_claim_assertions(claim: str) -> list[tuple[str, str | None, str]]:
    """Parse claim text into list of (kind, target_or_none, claim_excerpt) tuples."""
    if not claim or not claim.strip():
        return []
    assertions: list[tuple[str, str | None, str]] = []
    seen_specific = False
    for kind, pattern in _CLAIM_PATTERNS:
        for m in pattern.finditer(claim):
            target = m.group(1) if m.groups() else None
            excerpt = claim[max(0, m.start() - 10) : min(len(claim), m.end() + 30)].strip()
            if len(excerpt) > 200:
                excerpt = excerpt[:200] + "..."
            if kind == "vague_completion" and seen_specific:
                # Skip vague matches if we already have specific ones
                continue
            assertions.append((kind, target, excerpt))
            if kind not in ("vague_completion",):
                seen_specific = True

            # Multi-target expansion (v1.2+): for file-creation/deletion claims,
            # scan the text immediately after the matched span for chained
            # filenames connected by ", " / " and " / ", and ". This catches
            # "Created auth.py and helpers.py" / "Removed old.py, legacy.py".
            if kind in ("created_file", "created_file_terse", "deleted_file") and target:
                tail_start = m.end()
                # Bound the tail at the next sentence boundary so we don't drag
                # filenames from later sentences into this assertion's scope.
                sentence_end = _next_sentence_boundary(claim, tail_start)
                tail = claim[tail_start:sentence_end]
                for fm in _MULTI_TARGET_FOLLOWUP.finditer(tail):
                    chained_target = fm.group(1)
                    if chained_target == target:
                        continue
                    chained_excerpt = (
                        f"{excerpt} (chained: '{chained_target}')"
                        if len(excerpt) + len(chained_target) + 14 <= 200
                        else excerpt
                    )
                    assertions.append((kind, chained_target, chained_excerpt))

    # Dedupe identical (kind, target) pairs while preserving order
    seen: set[tuple[str, str | None]] = set()
    unique: list[tuple[str, str | None, str]] = []
    for a in assertions:
        key = (a[0], a[1])
        if key in seen:
            continue
        seen.add(key)
        unique.append(a)
    return unique
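
The multi-target expansion in the helper above depends on `_CLAIM_PATTERNS`, `_MULTI_TARGET_FOLLOWUP`, and `_next_sentence_boundary`, none of which are shown on this page. The following self-contained sketch illustrates the same idea for file-creation claims only. The regexes `CREATED` and `FOLLOWUP` and the function `extract_created_files` are hypothetical stand-ins, so the real patterns may well differ.

```python
import re

# Hypothetical stand-in for one entry of _CLAIM_PATTERNS (file creation only).
CREATED = re.compile(r"\b(?:created|added|wrote)\s+([\w./-]+\.\w+)", re.IGNORECASE)
# Hypothetical stand-in for _MULTI_TARGET_FOLLOWUP: ", and " / ", " / " and ".
FOLLOWUP = re.compile(r"(?:,\s*and\s+|,\s*|\s+and\s+)([\w./-]+\.\w+)")
# A sentence ends at . ! or ? followed by whitespace or end of text.
SENTENCE_END = re.compile(r"[.!?](?:\s|$)")


def extract_created_files(claim: str) -> list[str]:
    """Return filenames the claim says were created, including chained ones."""
    targets: list[str] = []
    for m in CREATED.finditer(claim):
        targets.append(m.group(1))
        # Bound the tail at the next sentence boundary so filenames from
        # later sentences are not attributed to this assertion.
        rest = claim[m.end():]
        boundary = SENTENCE_END.search(rest)
        tail = rest[: boundary.start()] if boundary else rest
        targets.extend(fm.group(1) for fm in FOLLOWUP.finditer(tail))
    # Dedupe while preserving first-seen order.
    return list(dict.fromkeys(targets))


print(extract_created_files("Created auth.py and helpers.py. Tests pass."))
```

Bounding the tail at the sentence boundary is what keeps a claim such as "Created auth.py. Removed old.py and legacy.py" from treating the removed files as created ones.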