read_file
Extract text content from DOCX or Google Docs files, with pagination support for large documents. Output includes inline formatting tags by default and is available in multiple formats, such as JSON or plain text.
Instructions
Read document content (DOCX or Google Doc). Output is token-limited (~14k tokens) by default with pagination metadata (has_more, next_offset). Use offset/limit to paginate.
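A minimal sketch of the pagination loop described above, assuming a caller-supplied function that sends the arguments to read_file and returns the parsed response as a dict. The `has_more` and `next_offset` fields come from the instructions. The `content` field name, the `call_read_file` callable, and the `fake_read_file` stand-in are hypothetical and exist only for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical signature: takes the tool arguments, returns the parsed response.
ToolCall = Callable[[Dict[str, Any]], Dict[str, Any]]


def read_all_pages(call_read_file: ToolCall, file_path: str,
                   limit: Optional[int] = None,
                   max_pages: int = 1000) -> List[str]:
    """Collect each page of output by following next_offset until has_more is false."""
    pages: List[str] = []
    args: Dict[str, Any] = {"file_path": file_path}
    if limit is not None:
        args["limit"] = limit
    for _ in range(max_pages):  # guard against a response that never clears has_more
        response = call_read_file(dict(args))
        pages.append(response["content"])  # "content" is an assumed field name
        if not response.get("has_more"):
            break
        args["offset"] = response["next_offset"]
    return pages


def fake_read_file(args: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real client so the sketch runs on its own."""
    paragraphs = ["alpha", "beta", "gamma", "delta", "epsilon"]
    offset = args.get("offset", 1)  # 1-based, as in the schema
    limit = args.get("limit", 2)
    chunk = paragraphs[offset - 1: offset - 1 + limit]
    next_offset = offset + len(chunk)
    has_more = next_offset <= len(paragraphs)
    return {
        "content": "\n".join(chunk),
        "has_more": has_more,
        "next_offset": next_offset if has_more else None,
    }


pages = read_all_pages(fake_read_file, "report.docx", limit=2)
```

With the stand-in above, the loop makes three calls and stops when `has_more` is false. A real response may name its fields differently, so check the actual output before relying on `content`.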
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | No | Path to the DOCX file. | |
| google_doc_id | No | Google Doc ID or URL (alternative to file_path). Extract from URL: docs.google.com/document/d/{ID}/edit | |
| offset | No | 1-based paragraph offset for pagination. Negative values count from end. | |
| limit | No | Max paragraphs to return. When omitted, output is token-limited to ~14k tokens with pagination. | |
| node_ids | No | | |
| format | No | | |
| show_formatting | No | When true (default), shows inline formatting tags (`<b>`, `<i>`, `<u>`, `<highlighting>`, `<a>`). When false, emits plain text with no inline tags. | true |
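
As an illustration of how the parameters in the table fit together, the sketch below pulls the document ID out of a Google Docs URL of the form docs.google.com/document/d/{ID}/edit and assembles an argument dict. The helper names `google_doc_id_from` and `build_read_args` are hypothetical. Requiring exactly one of file_path and google_doc_id is an assumption drawn from the table calling them alternatives, not a documented rule.

```python
import re
from typing import Any, Dict, Optional

# Matches the {ID} segment in docs.google.com/document/d/{ID}/edit
_DOC_URL = re.compile(r"docs\.google\.com/document/d/([^/?#]+)")


def google_doc_id_from(id_or_url: str) -> str:
    """Return the bare document ID, whether given an ID or a full URL."""
    match = _DOC_URL.search(id_or_url)
    return match.group(1) if match else id_or_url.strip()


def build_read_args(file_path: Optional[str] = None,
                    google_doc: Optional[str] = None,
                    offset: Optional[int] = None,
                    limit: Optional[int] = None,
                    show_formatting: Optional[bool] = None) -> Dict[str, Any]:
    """Assemble read_file arguments, leaving out anything not set."""
    # Assumption: the two sources are mutually exclusive alternatives.
    if (file_path is None) == (google_doc is None):
        raise ValueError("provide exactly one of file_path or google_doc")
    args: Dict[str, Any] = {}
    if file_path is not None:
        args["file_path"] = file_path
    else:
        args["google_doc_id"] = google_doc_id_from(google_doc)
    if offset is not None:
        args["offset"] = offset  # 1-based; negative values count from the end
    if limit is not None:
        args["limit"] = limit
    if show_formatting is not None:
        args["show_formatting"] = show_formatting
    return args


example = build_read_args(
    google_doc="https://docs.google.com/document/d/abc123/edit?usp=sharing",
    limit=50,
    show_formatting=False,
)
```

Since the table says google_doc_id accepts either an ID or a URL, the extraction step is optional. It is shown only to make the URL pattern concrete.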