Extract structured fields from papers
`extract_from_papers`
Extract structured tables from up to 50 papers by defining custom fields and instructions. Get one typed row per paper for consistent fact extraction across multiple papers.
Instructions
Pull a structured table out of up to 50 papers in ONE call: you define the columns (`fields`: each a snake_case name, a type, and an optional description) and an `instruction`, and the server reads each paper's full text and returns one typed row per paper. Use this when you need the SAME facts across many papers, e.g. "dataset, model size, and reported accuracy for each of these papers", instead of reading each full text yourself and transcribing by hand. Pass `sections` (case-insensitive headings, e.g. ["Results"]) to focus extraction and cut noise.

The model is instructed to use only what each paper states, not to infer; a field it can't ground may be absent or null. Each row carries a `truncated` flag, which is true when the paper's text overflowed the budget and the tail was dropped, so treat such a row as partial. A paper with no parsed full text, or one the model couldn't extract, is reported in `papers_failed` (with a reason) instead of sinking the batch, so `papers_processed` equals the number of rows plus the number of failures.

Heavy: one model call per paper, so extract only papers you already judged relevant from a search or citation result. For the raw text of a single paper, use `get_paper_fulltext` instead.
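
Because an ungrounded field may be either absent or null, client code may want to treat both cases the same way. The following is a minimal sketch, assuming each row is a JSON object whose field values sit at the top level keyed by field name; the helper `normalize_row` and the sample values are hypothetical and not part of the tool.

```python
def normalize_row(row, field_names):
    """Return one value per requested field, using None when the row
    either omits the field or carries an explicit null."""
    return {name: row.get(name) for name in field_names}


# Hypothetical field names and row content, for illustration only.
field_names = ["dataset", "model_size", "reported_accuracy"]
row = {"dataset": "example-dataset", "reported_accuracy": None, "truncated": False}

normalized = normalize_row(row, field_names)
```

After this step every requested column exists on every row, so a missing `model_size` and a null `reported_accuracy` can be handled by the same check.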
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| fields | Yes | 1 to 12 fields to extract per paper. Each becomes a typed column on every row, keyed by its `name`. | |
| sections | No | Restrict extraction to these sections (case-insensitive heading match), e.g. ["Results", "Experiments"]. Omit to consider the whole paper. | |
| paper_ids | Yes | 1 to 50 Lune paper UUIDs to extract from in ONE call. Take them from a `search_papers` / `search_papers_many` / `search_related_papers` / `get_paper_citations` result. | |
| instruction | Yes | Natural-language guidance for the extraction (e.g. "Pull the primary evaluation dataset and the headline accuracy"). The model is told to use only what the paper states. | |
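
The limits in the table above can be checked on the client before the call is made. This is a sketch under stated assumptions: the helper `build_extract_args` is hypothetical, the field keys `type` and `description` and the type names `"string"` and `"number"` are guesses at the field shape (only `name` is confirmed above), and the UUID is a placeholder.

```python
import re

# Lowercase words separated by single underscores, e.g. "model_size".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def build_extract_args(paper_ids, fields, instruction, sections=None):
    """Assemble the arguments for extract_from_papers, enforcing the documented limits."""
    if not 1 <= len(paper_ids) <= 50:
        raise ValueError("paper_ids must contain 1 to 50 IDs")
    if not 1 <= len(fields) <= 12:
        raise ValueError("fields must contain 1 to 12 entries")
    for field in fields:
        if not SNAKE_CASE.match(field["name"]):
            raise ValueError(f"field name is not snake_case: {field['name']!r}")
    if not instruction.strip():
        raise ValueError("instruction must not be empty")

    args = {
        "paper_ids": list(paper_ids),
        "fields": list(fields),
        "instruction": instruction,
    }
    if sections:  # optional; omit to consider the whole paper
        args["sections"] = list(sections)
    return args


args = build_extract_args(
    paper_ids=["00000000-0000-0000-0000-000000000001"],  # placeholder UUID
    fields=[
        # "type" and "description" keys and the type names are assumptions.
        {"name": "dataset", "type": "string", "description": "Primary evaluation dataset"},
        {"name": "model_size", "type": "string"},
        {"name": "reported_accuracy", "type": "number"},
    ],
    instruction="Pull the primary evaluation dataset and the headline accuracy",
    sections=["Results", "Experiments"],
)
```

The resulting `args` dictionary would be passed as the tool arguments by whatever client you use; how the call itself is issued is outside this sketch.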
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| rows | Yes | One row per successfully extracted paper. | |
| papers_failed | Yes | Papers that yielded no row; recorded here instead of sinking the batch. Empty when every paper extracted. | |
| papers_processed | Yes | Total papers attempted; equals rows.length + papers_failed.length. | |
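
A response can be checked against the documented invariant and split into complete, partial, and failed papers. The sketch below assumes the result has already been parsed into a Python dictionary; the helper `summarize_extraction`, the `paper_id` and `reason` key names, and all sample values are hypothetical.

```python
def summarize_extraction(result):
    """Verify the row/failure accounting and separate truncated rows from complete ones."""
    rows = result["rows"]
    failed = result["papers_failed"]
    if result["papers_processed"] != len(rows) + len(failed):
        raise ValueError("papers_processed does not equal rows + papers_failed")

    # A truncated row was extracted from partial text, so keep it apart.
    complete = [row for row in rows if not row.get("truncated", False)]
    partial = [row for row in rows if row.get("truncated", False)]
    return {"complete": complete, "partial": partial, "failed": failed}


# Hypothetical response, for illustration only.
sample = {
    "rows": [
        {"paper_id": "paper-a", "dataset": "example-dataset", "truncated": False},
        {"paper_id": "paper-b", "dataset": None, "truncated": True},
    ],
    "papers_failed": [
        {"paper_id": "paper-c", "reason": "no parsed full text"},
    ],
    "papers_processed": 3,
}

summary = summarize_extraction(sample)
```

Keeping partial rows separate makes it easy to re-read those papers by hand or to flag their values as incomplete in any downstream table.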