Convert PDF to text representation
convert_pdf
Convert PDFs to structured text for LLM ingestion: markdown, token chunks, or heading-aware semantic chunks for retrieval.
Instructions
Convert a whole PDF into a text representation for downstream LLM use.
Returns JSON: {content, format} for 'markdown', or {chunks, format} for 'chunks'/'rag' (each chunk carries its index and page_numbers; rag chunks add token_estimate and heading_context). Read-only.
Use this when you need structure or chunking. If you only need the raw reading text, use extract_text; for per-run coordinates, use extract_entities.
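
As a rough illustration of the return shapes described above, the following Python sketch branches on the format field and reads only the keys named in this description (content, chunks, index, page_numbers, token_estimate, heading_context). The function name describe_result is hypothetical, the sketch assumes content is a string and the result has already been parsed into a dict, and it does not touch the chunk text itself because that key is not named here.

```python
from typing import Any


def describe_result(result: dict[str, Any]) -> list[str]:
    """Return one summary line per unit of converted output.

    Hypothetical helper: only keys named in the tool description are read.
    """
    fmt = result["format"]
    if fmt == "markdown":
        # 'markdown' returns a single document under 'content' (assumed to be a string).
        return [f"markdown document, {len(result['content'])} characters"]
    if fmt not in ("chunks", "rag"):
        raise ValueError(f"unexpected format: {fmt!r}")

    lines = []
    for chunk in result["chunks"]:
        # Every chunk carries its index and page_numbers.
        line = f"chunk {chunk['index']} on pages {chunk['page_numbers']}"
        if fmt == "rag":
            # rag chunks additionally carry heading_context and token_estimate.
            line += (
                f" under {chunk['heading_context']!r}"
                f" (~{chunk['token_estimate']} tokens)"
            )
        lines.append(line)
    return lines
```

A caller might pass the parsed tool result straight in, for example describe_result(parsed) where parsed is the decoded JSON, and log the returned lines before handing chunks to a retrieval index.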
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Path to the PDF file, relative to the configured workspace. | |
| format | Yes | Output representation: 'markdown' = one structured Markdown document; 'chunks' = fixed-size token windows; 'rag' = heading-aware semantic chunks for retrieval pipelines. | |
| password | No | User password to unlock an encrypted PDF before conversion. | |
| max_tokens | No | Target maximum tokens per chunk. Applies to format='chunks' and 'rag' only; ignored for 'markdown'. | |
| overlap | No | Token overlap carried between consecutive chunks. Applies to format='chunks' only; ignored for 'markdown' and 'rag'. | |
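
The applicability rules in the table can be sketched as a small client-side argument builder. This is a minimal Python sketch, not part of the tool: build_convert_pdf_args is a hypothetical helper, and the example values (report.pdf, 512, 64) are invented. It drops max_tokens and overlap where the table says they are ignored, so the request only carries arguments that take effect.

```python
from typing import Any, Optional

# The three output representations listed for the 'format' parameter.
VALID_FORMATS = ("markdown", "chunks", "rag")


def build_convert_pdf_args(
    path: str,
    fmt: str,
    password: Optional[str] = None,
    max_tokens: Optional[int] = None,
    overlap: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble an arguments dict for convert_pdf (hypothetical helper)."""
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}, got {fmt!r}")

    args: dict[str, Any] = {"path": path, "format": fmt}
    if password is not None:
        args["password"] = password
    # max_tokens applies to 'chunks' and 'rag' only.
    if max_tokens is not None and fmt in ("chunks", "rag"):
        args["max_tokens"] = max_tokens
    # overlap applies to 'chunks' only.
    if overlap is not None and fmt == "chunks":
        args["overlap"] = overlap
    return args


if __name__ == "__main__":
    print(build_convert_pdf_args("report.pdf", "rag", max_tokens=512, overlap=64))
```

Since the tool is documented as ignoring inapplicable parameters anyway, filtering them on the client is a design choice for cleaner requests rather than a requirement.
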
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |