Judgment Precision Report
session_judgment_precision_report
Compute precision, recall, and F1 scores for shadow judge decisions by correlating them with subsequent evidence resurfacing. Use the results to decide whether to move a peer from shadow mode to active mode.
Instructions
v2.14.0: Compute precision, recall, and F1 of the shadow judge against the empirical ground truth, that is, whether peers raised the same ask in a subsequent round. The tool walks session.evidence_judge_pass.shadow_decision events across all sessions (or a single session via session_id, or filtered by judge peer / since timestamp), correlates each decision with the subsequent evidence_checklist resurfacing behavior, and returns per-peer TP/FP/TN/FN counts plus precision, recall, and F1. Decisions whose item.last_round equals the judge round and for which no later round exists are excluded as 'no ground truth', because there is no way to tell whether the ask would have come back. Operators use this report to decide whether to flip a peer from shadow to active mode (item 2 / v2.13).
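The scoring described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the `Decision` fields, the `report` function, and the mapping of outcomes to TP/FP/TN/FN are all assumptions. In particular, it assumes that a positive judge decision means "the ask was satisfied" and that the ask resurfacing in a later round contradicts that decision.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Decision:
    """One shadow decision; every field name here is hypothetical."""
    peer: str
    judge_round: int         # round in which the shadow judge decided
    item_last_round: int     # last round in which the ask appeared
    session_last_round: int  # highest round number seen in the session
    judged_satisfied: bool   # assumed meaning of a positive judge decision


def has_ground_truth(d: Decision) -> bool:
    # Exclusion rule from the description: the ask was last seen in the judge
    # round and no later round exists, so resurfacing cannot be observed.
    return not (d.item_last_round == d.judge_round
                and d.session_last_round <= d.judge_round)


def safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def report(decisions):
    """Return per-peer TP/FP/TN/FN/excluded counts plus precision/recall/F1."""
    counts = defaultdict(
        lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0, "excluded": 0})
    for d in decisions:
        c = counts[d.peer]
        if not has_ground_truth(d):
            c["excluded"] += 1
            continue
        resurfaced = d.item_last_round > d.judge_round
        if d.judged_satisfied:
            # Assumed mapping: "satisfied" is correct if the ask stayed away.
            c["fp" if resurfaced else "tp"] += 1
        else:
            c["tn" if resurfaced else "fn"] += 1
    out = {}
    for peer, c in counts.items():
        p = safe_div(c["tp"], c["tp"] + c["fp"])
        r = safe_div(c["tp"], c["tp"] + c["fn"])
        out[peer] = {**c, "precision": p, "recall": r,
                     "f1": safe_div(2 * p * r, p + r)}
    return out
```

Keeping an explicit `excluded` count alongside the four confusion-matrix cells makes it visible how many decisions could not be scored, which matters when judging whether a peer's precision rests on enough evidence to justify leaving shadow mode.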
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
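| peer | No | Restrict the report to decisions made by this judge peer. | |
| since | No | Only include decisions since this timestamp. | |
| session_id | No | Restrict the report to a single session. | |
| response_format | No | | json |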