verify_action_outcome
Compare an agent's claimed outcome against actual before/after state snapshots to detect misreported actions.
Instructions
v1.1+: Compare an agent's stated outcome against actual before/after state snapshots. Catches the [@chiefofautism, 158↑] failure mode: the agent runs rm -rf or git push --force and then says 'I cleaned up the project structure'. bash-vet catches the destructive command; this tool checks the misreport about what got done. Also catches the Codex-CoT sandbox-escalation pattern, where the agent acknowledges a read-only constraint and then writes anyway (pass read_only: true in the before snapshot). Pure function: the caller captures the snapshots and the server is stateless. Returns an ActionOutcomeReport with a verdict (CLEAN / PARTIALLY_GROUNDED / FABRICATED / UNVERIFIED) plus per-mismatch evidence.
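As a rough illustration of the read-only check described above, the sketch below compares two caller-captured snapshots and reports any write that happened while read_only was set. The helper name read_only_violations, the finding strings, and the placeholder SHA are hypothetical; the server's actual matching logic is not documented here and may differ.

```python
from typing import Any


def read_only_violations(before: dict[str, Any], after: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list state changes that break a read_only constraint."""
    if not before.get("read_only", False):
        return []  # no constraint asserted, so nothing can violate it

    findings: list[str] = []
    before_files = set(before.get("files", []))
    after_files = set(after.get("files", []))
    for path in sorted(after_files - before_files):
        findings.append(f"file added under read_only: {path}")
    for path in sorted(before_files - after_files):
        findings.append(f"file removed under read_only: {path}")

    # A moved git tip means a commit or history rewrite happened.
    if before.get("git_tip") != after.get("git_tip"):
        findings.append("git tip changed under read_only")
    return findings


# Illustrative snapshots; "abc123" is a placeholder, not a real SHA.
before = {"files": ["app.py"], "git_tip": "abc123", "read_only": True}
after = {"files": ["app.py", "auth_v2.py"], "git_tip": "abc123"}
violations = read_only_violations(before, after)
print(violations)
```

In this sketch the constraint lives only in the before snapshot, matching the guidance to pass read_only: true there; an agent claim such as 'I only inspected the code' would then conflict with the reported file addition.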
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| claim | Yes | The agent's stated outcome — verbatim. Examples: 'I cleaned up the project structure', 'tests pass', 'committed and pushed', 'created auth_v2.py'. | |
| before_snapshot | Yes | Caller-captured state BEFORE the agent acted. Recognized keys: files (list[str]), git_status, git_tip / git_head / git_log_tip (str SHA), tests_status / test_status, read_only (bool — asserts no-write constraint). Other keys are tracked but not matched against claim. | |
| after_snapshot | Yes | Caller-captured state AFTER the agent acted. Same key conventions as before_snapshot. | |
| expected_changes | No | Optional caller-supplied list of expected changes. Recognized formats: 'file:foo.py:added', 'file:bar.py:removed', 'git:committed', 'git:clean', 'tests:pass'. Each missing entry becomes a MISSING_EXPECTED_CHANGE finding. | |
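To make the expected_changes formats concrete, here is a minimal sketch of how entries such as 'file:foo.py:added' and 'git:committed' could be checked against a pair of snapshots. The function missing_expected_changes is hypothetical and covers only the file and git:committed formats; how the server evaluates 'git:clean' and 'tests:pass' depends on value conventions not stated in the table, so those entries are skipped here.

```python
from typing import Any


def missing_expected_changes(
    before: dict[str, Any], after: dict[str, Any], expected_changes: list[str]
) -> list[str]:
    """Hypothetical helper: return expected entries the snapshots do not support."""
    before_files = set(before.get("files", []))
    after_files = set(after.get("files", []))
    missing: list[str] = []

    for entry in expected_changes:
        if entry.startswith("file:"):
            # 'file:<path>:<change>'; split from the right so paths may contain ':'.
            path, _, change = entry[len("file:"):].rpartition(":")
            if change == "added":
                ok = path in after_files and path not in before_files
            elif change == "removed":
                ok = path in before_files and path not in after_files
            else:
                ok = False  # unknown change kind
        elif entry == "git:committed":
            # Assumption: a commit shows up as a changed git_tip.
            ok = before.get("git_tip") != after.get("git_tip")
        else:
            continue  # other formats are not handled in this sketch
        if not ok:
            missing.append(entry)
    return missing


# Illustrative snapshots; "abc123" is a placeholder, not a real SHA.
before = {"files": ["bar.py"], "git_tip": "abc123"}
after = {"files": ["bar.py", "foo.py"], "git_tip": "abc123"}
expected = ["file:foo.py:added", "file:bar.py:removed", "git:committed"]
missing = missing_expected_changes(before, after, expected)
print(missing)
```

Each entry returned by this sketch corresponds to what the table calls a MISSING_EXPECTED_CHANGE finding: the caller expected the change, but the before/after snapshots give no evidence for it.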