parse_research_paper
Extract text and formulas from scientific PDFs using OCR. Converts tables and multi-column layouts to clean Markdown for further processing.
Instructions
Performs OCR on academic papers and scientific PDFs using Meta's Nougat model, converting visual structures such as tables, formulas, and multi-column layouts into clean Markdown.
Args:
file_path (str): The absolute path to the PDF file on the local system.
output_format (str): "default" uses settings.json preferences. "mmd" returns raw Nougat output. "md" converts math delimiters for broader Markdown renderer compatibility.
Returns:
str: The extracted text in the requested markup format.
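The "md" option is described as converting math delimiters for renderer compatibility. The tool's own conversion is not shown here, so the following is only a minimal sketch of what such a step might look like, assuming the raw Nougat output marks inline math with `\( ... \)` and display math with `\[ ... \]`, and that the target renderer expects `$ ... $` and `$$ ... $$`. The function name `convert_math_delimiters` is hypothetical.

```python
import re

def convert_math_delimiters(mmd: str) -> str:
    """Hypothetical helper: rewrite \\( \\) and \\[ \\] math delimiters as $ and $$.

    Assumption: this mirrors the kind of conversion the "md" format performs;
    it is not the tool's actual implementation.
    """
    # Display math: \[ ... \] -> $$ ... $$ (content may span several lines).
    # The lookbehind skips "\\[", which LaTeX uses for a line break with spacing.
    text = re.sub(
        r"(?<!\\)\\\[(.+?)\\\]",
        lambda m: "$$" + m.group(1) + "$$",
        mmd,
        flags=re.DOTALL,
    )
    # Inline math: \( ... \) -> $ ... $ with inner whitespace trimmed,
    # since some renderers reject "$ x $".
    text = re.sub(
        r"(?<!\\)\\\((.+?)\\\)",
        lambda m: "$" + m.group(1).strip() + "$",
        text,
        flags=re.DOTALL,
    )
    return text


sample = r"Energy \(E = mc^2\) and \[a^2 + b^2 = c^2\]"
print(convert_math_delimiters(sample))
```

A regex pass like this does not distinguish math from delimiter-like text inside code spans, so a production converter would likely need to skip fenced and inline code first.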
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
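| file_path | Yes | Absolute path to the PDF file on the local system. | |
| output_format | No | Output markup: "default" uses settings.json preferences, "mmd" returns raw Nougat output, "md" converts math delimiters for broader Markdown renderer compatibility. | default |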
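As an illustration of the input schema, the sketch below builds the arguments object for a `parse_research_paper` call and checks it on the client side. The helper `build_arguments` and the constant `ALLOWED_FORMATS` are hypothetical, and whether the tool itself rejects relative paths or unknown formats is an assumption, not something the schema states.

```python
import json
import os

# Values named in the tool description; treated here as the full set (assumption).
ALLOWED_FORMATS = {"default", "mmd", "md"}

def build_arguments(file_path: str, output_format: str = "default") -> dict:
    """Hypothetical client-side helper that assembles and checks tool arguments."""
    # The description asks for an absolute path on the local system.
    if not os.path.isabs(file_path):
        raise ValueError("file_path must be an absolute path")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    return {"file_path": file_path, "output_format": output_format}


# Example: request Markdown with converted math delimiters.
args = build_arguments("/data/papers/example.pdf", output_format="md")
print(json.dumps(args, indent=2))
```

Omitting `output_format` falls back to "default", matching the Default column above.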
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | The extracted text in the requested markup format. | |