Gather evidence and judge sufficiency
gather_evidence
Checks whether the evidence gathered for a research task is sufficient, identifies what is missing, and suggests next search queries, backing each judgment with verbatim quotes from paper searches.
Instructions
Use this tool for a multi-part research task when you need to know whether your gathered evidence is SUFFICIENT, what is still MISSING, and what to search next, without the tool writing the answer.

Pass the goal in `task` and your first search angles in `queries`. The server runs one corpus search per angle, decomposes the task into evidence requirements (or uses your own, supplied via `requirements`), and returns each requirement as covered / partial / missing with the exact `evidence_spans` (verbatim quotes) that support it, plus `next_queries` for the gaps.

The default `max_iterations=1` is a one-shot assessment billed as `len(queries)`. Set `max_iterations>1` AND `max_total_queries>len(queries)` to authorize bounded server-side follow-up searches (billed as `max_total_queries`, capped at 25).

Optionally pass a `draft` to get per-sentence support checks against the gathered spans. Every covered requirement and supported draft sentence carries a verbatim quote verified server-side, so you can cite it directly. You write the answer; cite papers by title, authors, and venue, not by `paper_id`.
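
The billing and follow-up rules can be sketched in a few lines of Python. This is only an illustration of the rules as worded here, not the server's implementation: the helper names `billed_units` and `follow_ups_authorized` are hypothetical, and the sketch assumes the cap of 25 applies to the billed ceiling in every mode.

```python
MAX_BILLED_UNITS = 25  # cap stated in the instructions


def follow_ups_authorized(queries, max_iterations=1, max_total_queries=None):
    """True only when both conditions for server-side follow-up searches hold."""
    # max_total_queries defaults to len(queries) when it is not supplied.
    budget = len(queries) if max_total_queries is None else max_total_queries
    return max_iterations > 1 and budget > len(queries)


def billed_units(queries, max_total_queries=None):
    """Billed ceiling: max_total_queries, defaulting to len(queries), capped at 25."""
    budget = len(queries) if max_total_queries is None else max_total_queries
    return min(budget, MAX_BILLED_UNITS)


angles = ["first search angle?", "second search angle?"]
print(billed_units(angles))                 # one-shot call with two angles
print(follow_ups_authorized(angles, 3, 6))  # both conditions are met
```

Raising `max_iterations` on its own authorizes nothing in this reading, because the budget still equals `len(queries)`; both settings have to change together.
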
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| task | Yes | The research goal in prose: what you are trying to establish. Drives requirement decomposition and the sufficiency judgment. | |
| year | No | ||
| draft | No | Optional current draft. Each sentence is checked for support against the gathered spans (no extra searches). The tool never rewrites your draft. | |
| venues | No | Restrict to these conference short names. | |
| queries | Yes | Your initial search angles (full natural-language questions). One corpus search runs per angle; they are billed like search_papers_many. | |
| year_max | No | ||
| year_min | No | ||
| conference | No | Filter to this conference short name, e.g. "NeurIPS". | |
| requirements | No | Optional explicit evidence slots; omit to let the server derive them from `task`. | |
| max_iterations | Yes | Sufficiency rounds. Default 1 is a one-shot advisor. Set >1 (with max_total_queries>len(queries)) to authorize bounded server-side follow-up searches. | 1 |
| max_total_queries | No | Total search budget across all iterations (the billed ceiling). Defaults to len(queries). It needs to exceed len(queries) only when max_iterations>1. | len(queries) |
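
As a rough sketch of how the fields in this table fit together, the following Python helper assembles an arguments object. The function name `build_arguments` is hypothetical, the value types (a list of strings for `queries`, integers for the year filters) are assumptions since the table does not state them, and the non-empty checks are an illustrative client-side guard rather than documented server behavior.

```python
ALLOWED_OPTIONAL = {
    "year", "draft", "venues", "year_max", "year_min",
    "conference", "requirements", "max_total_queries",
}


def build_arguments(task, queries, max_iterations=1, **optional):
    """Assemble an arguments dict using only the field names listed in the table."""
    # Assumed guard: treat empty required fields as a caller mistake.
    if not task.strip():
        raise ValueError("task must be a non-empty prose description")
    if not queries:
        raise ValueError("queries needs at least one search angle")
    unknown = set(optional) - ALLOWED_OPTIONAL
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    args = {"task": task, "queries": list(queries), "max_iterations": max_iterations}
    # Leave out optional fields that were passed as None.
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


# Illustrative values only; the task and queries are made up for this example.
args = build_arguments(
    task="Establish whether retrieval augmentation reduces hallucination.",
    queries=[
        "Does retrieval augmentation reduce hallucination in question answering?",
        "How is hallucination measured in generated answers?",
    ],
    max_iterations=2,
    max_total_queries=5,
    conference="NeurIPS",
    draft=None,
)
```

Here `max_total_queries=5` exceeds the two initial queries and `max_iterations=2`, so this payload would authorize follow-up searches under the rule described in the instructions.
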
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| queries_run | Yes | Actual searches run (<= units_charged). | |
| stop_reason | Yes | ||
| next_queries | Yes | Suggested follow-up search angles for partial / missing requirements. | |
| requirements | Yes | One coverage row per requirement: covered / partial / missing. | |
| draft_support | Yes | Per-sentence support for a supplied draft, or null when no draft was sent. | |
| units_charged | Yes | Billed ceiling (max_total_queries, default len(queries), cap 25). | |
| evidence_spans | Yes | The spans the judge evaluated; every supporting_span_id points here. | |
| iterations_run | Yes | ||
| queries_failed | Yes | Per-query failures (non-CircuitBreaker); a systemic outage returns HTTP 503 instead. | |
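
One way to consume a response is sketched below in Python. The top-level field names come from the table, but the shape of each requirement row is not documented here: the `status` key is an assumption, `summarize` is a hypothetical helper, and the sample response is invented purely for illustration.

```python
def summarize(response):
    """Condense a gather_evidence response into a sufficiency summary."""
    rows = response["requirements"]
    counts = {"covered": 0, "partial": 0, "missing": 0}
    for row in rows:
        counts[row["status"]] += 1  # "status" is an assumed key name
    # Treat the evidence as sufficient only when every requirement is covered.
    sufficient = bool(rows) and counts["covered"] == len(rows)
    return {
        "sufficient": sufficient,
        "counts": counts,
        "next_queries": [] if sufficient else list(response["next_queries"]),
        # The table states that queries_run is at most units_charged.
        "within_budget": response["queries_run"] <= response["units_charged"],
    }


# Invented sample data, not real tool output.
sample = {
    "queries_run": 2,
    "units_charged": 2,
    "requirements": [{"status": "covered"}, {"status": "missing"}],
    "next_queries": ["What evidence addresses the uncovered requirement?"],
}
summary = summarize(sample)
print(summary["sufficient"], summary["counts"])
```

A caller could loop on `summary["next_queries"]` until `sufficient` becomes true or the query budget runs out.
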