read_docx
Extract text content from DOCX files, preserving tracked changes and comments. Navigate large documents with outline mode (a map of headings) or the paginated full view.
Instructions
Reads a DOCX file and extracts its text content. Use this to ingest documents into your context window. By default (clean_view=False), it returns text with inline CriticMarkup representing Tracked Changes and Comments (e.g., {++inserted++}, {--deleted--}, {==highlighted==}{>>comment<<}). Set clean_view=True ONLY if you want the final, clean text: tracked changes are accepted, and redline markup and comments are omitted.
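To illustrate how the two views relate, here is a minimal Python sketch that approximates the 'Accepted' text from Raw output. The helper `accept_all` is hypothetical and is not part of read_docx. It assumes markup is never nested and handles only the four forms shown above; substitutions and other CriticMarkup forms are not covered. The tool's own clean_view=True output may differ in whitespace and other details.

```python
import re

# Patterns for the four CriticMarkup forms named above (assumed non-nested).
_INSERT = re.compile(r"\{\+\+(.*?)\+\+\}", re.DOTALL)
_DELETE = re.compile(r"\{--(.*?)--\}", re.DOTALL)
_HIGHLIGHT = re.compile(r"\{==(.*?)==\}", re.DOTALL)
_COMMENT = re.compile(r"\{>>(.*?)<<\}", re.DOTALL)


def accept_all(raw: str) -> str:
    """Hypothetical helper: approximate 'Accepted' text from Raw output.

    Insertions are kept, deletions dropped, highlights unwrapped,
    and comments removed.
    """
    text = _COMMENT.sub("", raw)        # comments vanish entirely
    text = _DELETE.sub("", text)        # deleted text vanishes
    text = _INSERT.sub(r"\1", text)     # inserted text stays, markers go
    text = _HIGHLIGHT.sub(r"\1", text)  # highlighted text stays, markers go
    return text


raw = "The term is {--30--}{++60++} days. {==Notice==}{>>Check this<<} applies."
print(accept_all(raw))  # The term is 60 days. Notice applies.
```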
PAGINATION & OUTLINE:
mode='outline' returns a structural map of headings with page numbers, styles, table presence, and referenced footnotes. Body content is omitted. Use this first on large documents to plan targeted reads.
mode='full' (default) returns the document body. Documents over ~19,000 characters are split into pages; use page=N to read a specific page (1-indexed). Documents under the limit are returned in full on page 1.
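Page boundaries differ between clean_view=True and clean_view=False because the two views contain different text (the Raw view includes deleted text and markup), so page N in one view does not necessarily cover the same content as page N in the other.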
The Structural Appendix (defined terms, anchors, diagnostics) is repeated on every page.
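The sketch below illustrates how character-based pagination behaves and why page boundaries shift between views. It assumes the ~19,000-character limit above and a strategy of splitting between paragraphs; the tool's actual splitting rules are not documented here, the sketch ignores the repeated Structural Appendix, and `paginate` and `get_page` are hypothetical names.

```python
PAGE_LIMIT = 19_000  # approximate limit quoted above; the real threshold may differ


def paginate(text: str, limit: int = PAGE_LIMIT) -> list[str]:
    """Split text into pages of at most `limit` characters, breaking
    between paragraphs where possible (an assumed strategy)."""
    if len(text) <= limit:
        return [text]  # short documents come back whole as page 1
    pages, current = [], ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            pages.append(current)
        # A single paragraph longer than the limit is hard-split.
        while len(para) > limit:
            pages.append(para[:limit])
            para = para[limit:]
        current = para
    if current:
        pages.append(current)
    return pages


def get_page(text: str, page: int = 1) -> str:
    """Return the 1-indexed page, mirroring the `page` parameter."""
    pages = paginate(text)
    if not 1 <= page <= len(pages):
        raise ValueError(f"page must be between 1 and {len(pages)}")
    return pages[page - 1]


raw = "\n\n".join(["{++Added clause++} text"] * 3)
clean = "\n\n".join(["Added clause text"] * 3)
print(len(paginate(raw, limit=40)), len(paginate(clean, limit=40)))  # 3 2
```

With a small limit, the Raw text (which carries markup) spills onto a third page while the clean text fits on two, so page 2 holds a different paragraph in each view.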
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Absolute path to the DOCX file. | |
| clean_view | No | If False (default), returns the 'Raw' text with inline CriticMarkup. If True, returns the 'Accepted' text. | False |
| mode | No | 'full' returns body content (paginated for large docs). 'outline' returns a structural heading map with page numbers; body content is omitted. | full |
| page | No | Page number (1-indexed) for mode='full'. Ignored when mode='outline'. | 1 |
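A hypothetical client-side helper can check arguments against the schema above and fill in the documented defaults before a call. `build_read_docx_args` is an illustrative name, not part of the tool. The example paths are POSIX-style, and omitting `page` in outline mode is a design choice based on the schema saying it is ignored there.

```python
import os


def build_read_docx_args(file_path, clean_view=False, mode="full", page=1):
    """Hypothetical helper: validate read_docx arguments and fill defaults."""
    if not os.path.isabs(file_path):
        raise ValueError("file_path must be an absolute path")
    if not isinstance(clean_view, bool):
        raise TypeError("clean_view must be a bool")
    if mode not in ("full", "outline"):
        raise ValueError("mode must be 'full' or 'outline'")
    args = {"file_path": file_path, "clean_view": clean_view, "mode": mode}
    if mode == "full":
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(page, int) or isinstance(page, bool) or page < 1:
            raise ValueError("page must be an integer >= 1")
        args["page"] = page  # page only matters in full mode
    return args


# Outline first, then a targeted read of one page.
plan = [
    build_read_docx_args("/docs/contract.docx", mode="outline"),
    build_read_docx_args("/docs/contract.docx", page=3),
]
```

A typical plan for a large document is one outline call followed by targeted full-mode reads of the pages the outline points to.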