Extract plain text
extract_text
Extract raw, unformatted text from a PDF file. Optionally target a specific page. Returns plain text and page count as JSON.
Instructions
Extract the raw, unformatted text of a PDF as a single string.
Returns JSON {text, page_count} (plus page when a specific page was
requested). Read-only.
Use this when you want the plain reading text. If you need Markdown structure or chunking for LLM/RAG pipelines, use convert_pdf; if you need each text run with its on-page coordinates and font, use extract_entities.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Path to the PDF file, relative to the configured workspace. | |
| page | No | 0-based page index to extract. Omit to extract every page joined by newlines. Out-of-range indices return an error. | |
| password | No | User password to unlock an encrypted PDF before extraction. | |
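The parameters above could be assembled into a call payload roughly as follows. This is a minimal sketch: `build_extract_text_args` is a hypothetical helper, the file path and password are made-up values, and rejecting negative indices on the client side is an assumption (the tool itself only states that out-of-range indices return an error).

```python
from typing import Optional


def build_extract_text_args(
    path: str,
    page: Optional[int] = None,
    password: Optional[str] = None,
) -> dict:
    """Assemble arguments for an extract_text call (hypothetical helper)."""
    if not path:
        raise ValueError("path is required")
    args: dict = {"path": path}
    if page is not None:
        # Page indices are 0-based; treating negatives as invalid is an assumption.
        if page < 0:
            raise ValueError("page must be a 0-based index (>= 0)")
        args["page"] = page
    if password is not None:
        args["password"] = password
    return args


# Whole document: omit page so every page is extracted and joined by newlines.
all_pages = build_extract_text_args("reports/q3.pdf")

# Third page of an encrypted file: 0-based index 2. Values are illustrative only.
third_page = build_extract_text_args("reports/q3.pdf", page=2, password="example-password")
```

Leaving optional keys out of the payload, rather than sending them as null, mirrors the schema's "omit to extract every page" wording.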
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |
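As a rough illustration of consuming the output, the sketch below parses a result shaped like the JSON described under Instructions (`text`, `page_count`, and `page` only when a specific page was requested). It assumes the result arrives as a JSON string; `parse_extract_text_result` and the sample payloads are hypothetical, not captured tool output.

```python
import json
from typing import Optional, Tuple


def parse_extract_text_result(raw: str) -> Tuple[str, int, Optional[int]]:
    """Return (text, page_count, page) from a JSON result string (hypothetical helper)."""
    data = json.loads(raw)
    # "page" is only present when a specific page was requested.
    return data["text"], data["page_count"], data.get("page")


# Made-up payloads for illustration only.
whole_doc = '{"text": "Hello\\nWorld", "page_count": 2}'
single_page = '{"text": "World", "page_count": 2, "page": 1}'

text, page_count, page = parse_extract_text_result(whole_doc)
one_text, one_count, one_page = parse_extract_text_result(single_page)
```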