raw_read
Read raw source documents with automatic text extraction for PDF, DOCX, XLSX, PPTX, and plain text. Paginate by page range, sheet, or line offset.
Instructions
Read a raw source document's content and metadata. Raw files are immutable, so this tool is read-only. Text and SVG files return their content as a string; document files (PDF, DOCX, XLSX, PPTX) have their text extracted automatically; other binary files (images, etc.) return metadata only.
Pagination by format:
- PDF: use 'pages' for page ranges (e.g. '1-5')
- PPTX: use 'pages' for slide ranges (e.g. '1-10')
- XLSX: use 'sheet' to read a specific sheet; the response always includes 'sheet_names'
- DOCX / text: use 'offset' + 'limit' for line-based pagination (default limit: 200)
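The per-format rules above can be sketched as a small client-side helper that picks the right pagination parameters from the file extension. This is a minimal sketch, not part of the tool: `raw_read_args` is a hypothetical name, and treating every extension other than PDF, PPTX, and XLSX as line-paginated is an assumption.

```python
from pathlib import Path


# Hypothetical helper: parameter names ('pages', 'sheet', 'offset', 'limit')
# come from the raw_read schema; the extension-to-format mapping is an assumption.
def raw_read_args(filename, pages=None, sheet=None, offset=0, limit=200):
    """Build a raw_read argument dict using the pagination style for the file type."""
    ext = Path(filename).suffix.lower()
    args = {"filename": filename}
    if ext in (".pdf", ".pptx"):
        # PDF pages and PPTX slides are both selected with 'pages'
        if pages is not None:
            args["pages"] = pages
    elif ext == ".xlsx":
        # Omitting 'sheet' reads every sheet; sheet_names is always returned
        if sheet is not None:
            args["sheet"] = sheet
    else:
        # Assumption: everything else is treated as DOCX/text (line-based)
        args["offset"] = offset
        args["limit"] = min(limit, 500)  # the schema caps limit at 500
    return args


print(raw_read_args("report.pdf", pages="1-5"))
print(raw_read_args("notes.docx", offset=200))
```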
For large documents, paginate rather than reading all at once.
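For line-based formats, paginating usually means advancing 'offset' by 'limit' until a short page comes back. The sketch below assumes a caller-supplied `fetch(offset, limit)` function (hypothetical) that wraps a raw_read call and returns the lines in that window, since the exact response shape is not specified here; a fake in-memory document stands in for the real tool.

```python
from typing import Callable, Iterator, List


# Assumption: fetch(offset, limit) wraps a raw_read call and returns the
# lines in that window as a list; the real response shape may differ.
def iter_line_pages(fetch: Callable[[int, int], List[str]], limit: int = 200) -> Iterator[List[str]]:
    """Yield successive windows of lines until a short (or empty) page signals the end."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        if page:
            yield page
        if len(page) < limit:
            break  # fewer lines than requested: end of document
        offset += limit


# Usage with a fake 450-line document standing in for raw_read
doc = [f"line {i}" for i in range(450)]
fake_fetch = lambda offset, limit: doc[offset:offset + limit]
print([len(p) for p in iter_line_pages(fake_fetch)])  # [200, 200, 50]
```

Stopping on a short page avoids needing a total line count in the response; the cost is one extra empty request when the document length is an exact multiple of 'limit'.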
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | Filename relative to raw/ (e.g. 'article-yolo.md') | |
| pages | No | Page/slide range (e.g. '1-5', '3', '1-3,7-10'). Applies to PDF and PPTX. Omit to read all. | |
| sheet | No | Sheet name for XLSX files. Omit to read all sheets (response always includes sheet_names list). | |
| offset | No | Line offset for paginating text/DOCX files. | 0 |
| limit | No | Maximum lines to return for text/DOCX pagination (max: 500). | 200 |
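A client may want to validate a 'pages' value before sending it. The sketch below expands the range syntax shown in the schema ('1-5', '3', '1-3,7-10') into page numbers; `parse_page_range` is a hypothetical helper, and treating ranges as 1-based and inclusive is an assumption based on the examples, not confirmed tool behavior.

```python
from typing import List, Set


# Hypothetical validator for the 'pages' syntax; assumes 1-based,
# inclusive, comma-separated ranges as in the schema examples.
def parse_page_range(spec: str) -> List[int]:
    """Expand a spec like '1-3,7-10' into [1, 2, 3, 7, 8, 9, 10]."""
    pages: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in {spec!r}")
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)  # int() raises on '' or junk
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid range {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_range("1-3,7-10"))  # [1, 2, 3, 7, 8, 9, 10]
```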