Gather evidence and judge sufficiency

gather_evidence
Read-only

Checks whether the evidence gathered for a research task is sufficient, identifies what is still missing, and suggests next search queries, backing each judgment with verbatim quotes from paper searches.

Instructions

Use for a multi-part research task when you need to know whether your gathered evidence is SUFFICIENT, what is still MISSING, and what to search next, without the tool writing the answer.

Pass the goal in task and your first search angles in queries; the server runs one corpus search per angle, decomposes the task into evidence requirements (or use your own via requirements), and returns each requirement as covered / partial / missing with the exact evidence_spans (verbatim quotes) that support it, plus next_queries for the gaps.

Default max_iterations=1 is a one-shot assessment billed len(queries); set max_iterations>1 AND max_total_queries>len(queries) to authorize bounded server-side follow-up searches (billed max_total_queries, capped at 25).

Optionally pass a draft to get per-sentence support checks against the gathered spans. Every covered requirement and supported draft sentence carries a verbatim quote verified server-side, so you can cite it directly. You write the answer; cite papers by title, authors, and venue, not by paper_id.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| task | Yes | The research goal in prose: what you are trying to establish. Drives requirement decomposition and the sufficiency judgment. | |
| year | No | | |
| draft | No | Optional current draft. Each sentence is checked for support against the gathered spans (no extra searches). The tool never rewrites your draft. | |
| venues | No | Restrict to these conference short names. | |
| queries | Yes | Your initial search angles (full natural-language questions). One corpus search runs per angle; they are billed like search_papers_many. | |
| year_max | No | | |
| year_min | No | | |
| conference | No | Filter to this conference short name, e.g. "NeurIPS". | |
| requirements | No | Optional explicit evidence slots; omit to let the server derive them from `task`. | |
| max_iterations | Yes | Sufficiency rounds. Default 1 is a one-shot advisor. Set >1 (with max_total_queries>len(queries)) to authorize bounded server-side follow-up searches. | |
| max_total_queries | No | Total search budget across all iterations (the billed ceiling). Defaults to len(queries). Must exceed len(queries) only when max_iterations>1. | |
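
The interaction between queries, max_iterations, and max_total_queries can be sketched client-side as follows. This is a minimal illustration of the billing rules as the description states them, not the server's implementation: the helper names `billed_ceiling` and `build_arguments` are hypothetical, and the choices to reject an empty `queries` list and to reject `max_iterations>1` without a larger budget are assumptions about sensible pre-flight checks.

```python
from typing import Any, Optional

BILLING_CAP = 25  # "capped at 25" per the tool description


def billed_ceiling(queries: list[str], max_iterations: int = 1,
                   max_total_queries: Optional[int] = None) -> int:
    """Estimate the billed ceiling client-side (an interpretation, not server code)."""
    n = len(queries)
    if n == 0:
        # Assumption: an empty list is treated as invalid because queries is required.
        raise ValueError("queries is required and must not be empty")
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    if max_iterations == 1:
        # One-shot assessment: billed len(queries).
        return min(n, BILLING_CAP)
    # Follow-up rounds are only authorized with a budget above len(queries).
    if max_total_queries is None or max_total_queries <= n:
        raise ValueError("max_iterations>1 needs max_total_queries>len(queries)")
    return min(max_total_queries, BILLING_CAP)


def build_arguments(task: str, queries: list[str], max_iterations: int = 1,
                    max_total_queries: Optional[int] = None) -> dict[str, Any]:
    """Assemble a gather_evidence argument dict after the pre-flight check."""
    billed_ceiling(queries, max_iterations, max_total_queries)  # raises if inconsistent
    args: dict[str, Any] = {
        "task": task,
        "queries": list(queries),
        "max_iterations": max_iterations,
    }
    if max_total_queries is not None:
        args["max_total_queries"] = max_total_queries
    return args


args = build_arguments(
    task="Establish whether method A outperforms method B on long-context tasks.",
    queries=["How does method A perform on long-context benchmarks?",
             "What baselines is method B compared against?"],
    max_iterations=3,
    max_total_queries=6,
)
print(billed_ceiling(args["queries"], args["max_iterations"],
                     args["max_total_queries"]))  # prints 6
```

Checking the budget before the call keeps the billed ceiling predictable: a one-shot call costs one unit per search angle, while an iterative call costs whatever ceiling was authorized, up to 25.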

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| queries_run | Yes | Actual searches run (<= units_charged). | |
| stop_reason | Yes | | |
| next_queries | Yes | Suggested follow-up search angles for partial / missing requirements. | |
| requirements | Yes | One coverage row per requirement: covered / partial / missing. | |
| draft_support | Yes | Per-sentence support for a supplied draft, or null when no draft was sent. | |
| units_charged | Yes | Billed ceiling (max_total_queries, default len(queries), cap 25). | |
| evidence_spans | Yes | The spans the judge evaluated; every supporting_span_id points here. | |
| iterations_run | Yes | | |
| queries_failed | Yes | Per-query failures (non-CircuitBreaker); a systemic outage 503s instead. | |
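
A caller would typically group the returned requirements by coverage status and resolve each one's supporting spans to their quotes. The sketch below shows one way to do that. The top-level keys `requirements` and `evidence_spans` come from the output schema, but the per-row field names (`requirement`, `status`, `supporting_span_ids`) and per-span field names (`id`, `quote`) are assumptions made for illustration, since the listing does not show the nested structure; the sample response is invented for the same reason.

```python
from typing import Any


def summarize_coverage(response: dict[str, Any]) -> dict[str, list[tuple[str, list[str]]]]:
    """Group requirements by status and attach their verbatim quotes.

    Nested field names are hypothetical; adjust them to the real JSON schema.
    """
    # Index spans by id so each requirement's span references can be resolved.
    quotes_by_id = {span["id"]: span["quote"] for span in response["evidence_spans"]}
    summary: dict[str, list[tuple[str, list[str]]]] = {
        "covered": [], "partial": [], "missing": [],
    }
    for row in response["requirements"]:
        quotes = [quotes_by_id[i] for i in row.get("supporting_span_ids", [])
                  if i in quotes_by_id]
        summary[row["status"]].append((row["requirement"], quotes))
    return summary


# Invented sample shaped like the assumed schema, for demonstration only.
sample_response = {
    "requirements": [
        {"requirement": "Reported accuracy of method A", "status": "covered",
         "supporting_span_ids": ["s1"]},
        {"requirement": "Comparison against method B", "status": "missing",
         "supporting_span_ids": []},
    ],
    "evidence_spans": [{"id": "s1", "quote": "Method A reaches the highest accuracy."}],
    "next_queries": ["How does method A compare with method B?"],
}

summary = summarize_coverage(sample_response)
for status, rows in summary.items():
    for requirement, quotes in rows:
        print(status, "|", requirement, "|", quotes)
```

Only requirements that end up under covered carry quotes that are safe to cite; anything under partial or missing is a prompt to run the suggested next_queries.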

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses behavior beyond the annotations: it runs one corpus search per angle, decomposes the task into requirements, returns coverage status, evidence spans, and next queries, and supports draft checks. It explains billing (len(queries) for one-shot, max_total_queries for iterative) and the associated constraints. The readOnlyHint: true annotation is consistent with "without the tool writing the answer". No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat lengthy but well structured: it opens with the purpose, then explains the parameters, then adds notes on billing and output. It front-loads the core use. There is some redundancy (billing is mentioned twice), but it is generally efficient given the complexity (11 parameters).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, the description covers all necessary aspects: what it does, how it works, parameter usage, billing, optional features (requirements, draft), and what it returns (coverage status, evidence spans, next queries). The output schema exists, so return values need not be fully detailed, but the description still gives a good overview. It is complete and actionable for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 73% schema coverage, the description adds value by explaining parameter semantics (e.g., task drives decomposition, queries each run a search, max_iterations default and behavior, draft usage). It provides context beyond schema descriptions, but the schema already includes basic descriptions. The description enhances understanding, justifying a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to assess the sufficiency of gathered evidence for a multi-part research task, distinguishing it from siblings like search_papers or extract_from_papers. It specifies the action (gather evidence and judge sufficiency) and the resource (evidence), meeting the criteria for a score of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use guidance: use the tool when you need to know whether evidence is sufficient, what is missing, and what to search next. It also makes clear that the tool does not write the answer; the agent does. It describes the typical workflow and the optional parameters. However, it could be more explicit about when not to use the tool (e.g., for a simple search). Clear but not exhaustive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/RetrogradeLabs/lune-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.