docread
Extract text from PDF, DOCX, PPTX, XLSX, and other document files. Filter content by pages, slides, sheets, or sections using range parameters.
Instructions
Read and extract text from a document file (PDF, DOCX, PPTX, XLSX, ODT, ODS, ODP, RTF, EPUB). Returns sections formatted as '=== section_label ===\ntext'. Output is automatically truncated at 40,000 characters.
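The section format described above can be split back into labeled chunks on the client side. The following is a minimal sketch, not part of the tool itself: the function names `split_sections` and `looks_truncated` and the sample labels are hypothetical, and the truncation check is only a length heuristic that assumes no marker is added when output is cut.

```python
import re
from typing import Dict

# Matches a header line of the form "=== section_label ===".
SECTION_HEADER = re.compile(r"^=== (.+?) ===$", re.MULTILINE)

# Truncation limit stated in the tool description.
OUTPUT_LIMIT = 40_000


def split_sections(result: str) -> Dict[str, str]:
    """Split '=== label ===\\ntext' output into a {label: text} mapping."""
    parts = SECTION_HEADER.split(result)
    # parts[0] is any text before the first header; after that,
    # labels and bodies alternate.
    labels = parts[1::2]
    bodies = parts[2::2]
    return {label: body.strip("\n") for label, body in zip(labels, bodies)}


def looks_truncated(result: str) -> bool:
    """Heuristic (assumption): output at or above the limit was probably cut."""
    return len(result) >= OUTPUT_LIMIT


if __name__ == "__main__":
    # Hypothetical labels; the real labels depend on the file type.
    sample = "=== Page 1 ===\nHello\n=== Page 2 ===\nWorld"
    print(split_sections(sample))  # {'Page 1': 'Hello', 'Page 2': 'World'}
    print(looks_truncated(sample))  # False
```

If a result looks truncated, narrowing the request with the range parameter is one way to bring the output back under the limit.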
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| range | No | Filter output to specific sections. Format depends on file type. PDF: page numbers (e.g. '1-3', '1,3,5-7'); PPTX/ODP: slide numbers (e.g. '2-4'); EPUB: chapter numbers (e.g. '1-5'); XLSX/ODS: sheet name or 1-based index, with an optional row range after a colon (e.g. '1', 'Sheet1', '1:1-100', 'Revenue:50-200'); DOCX/ODT/RTF: line numbers (e.g. '1-50', '100-200'). | |
| filepath | Yes | Path to the document file. | |
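To make the range syntax concrete, here is a rough sketch of how the example strings in the table could be interpreted. It illustrates the format only and is not the tool's own parser: `expand_numeric_range` and `split_sheet_range` are hypothetical helper names, and the handling of edge cases (whitespace, invalid input) is an assumption.

```python
from typing import List, Optional, Tuple


def expand_numeric_range(spec: str) -> List[int]:
    """Expand a spec such as '1,3,5-7' into [1, 3, 5, 6, 7]."""
    numbers: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            numbers.extend(range(start, end + 1))  # both ends inclusive
        else:
            numbers.append(int(part))
    return numbers


def split_sheet_range(spec: str) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Split 'Revenue:50-200' into ('Revenue', (50, 200)).

    The sheet part may be a name or a 1-based index; it is returned as
    a string either way. Without a colon, the row range is None.
    """
    sheet, sep, rows = spec.partition(":")
    if not sep:
        return sheet, None
    start, end = (int(x) for x in rows.split("-", 1))
    return sheet, (start, end)


if __name__ == "__main__":
    print(expand_numeric_range("1,3,5-7"))      # [1, 3, 5, 6, 7]
    print(split_sheet_range("Revenue:50-200"))  # ('Revenue', (50, 200))
    print(split_sheet_range("Sheet1"))          # ('Sheet1', None)
```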
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |