read_doc
Read local files or HTTP(S) documents (PDF, DOCX, text) into Markdown. Uses character-offset pagination for long content. For HTML web pages, use `fetch` instead.
Instructions
Read a local file or http(s) document into Markdown.
Best for:
- Local or remote PDFs and DOCX (parsed locally, no remote API).
- Local text/HTML/Markdown files the user pointed at.
- Paginating through a long document via `start` / `length`.
Not recommended for:
- Arbitrary HTML web pages -> `fetch` does reader-mode cleanup that this
tool does not.
- Pages discovered through search -> `fetch` or `research`.
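A minimal routing sketch for the guidance above. `pick_tool` and the extension set are hypothetical. It assumes a remote URL counts as a document only when its path ends in one of those extensions. Everything else is treated as an HTML page and sent to `fetch`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical: remote extensions assumed to be documents rather than web pages.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def pick_tool(source: str) -> str:
    """Return "read_doc" or "fetch" for a source (heuristic, not the tool's own logic)."""
    parsed = urlparse(source)
    if parsed.scheme not in ("http", "https"):
        # Local paths, including local .html/.md/.txt files, go to read_doc.
        return "read_doc"
    # For URLs, look only at the path's extension (query strings are ignored).
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return "read_doc" if suffix in DOCUMENT_EXTENSIONS else "fetch"


print(pick_tool("~/papers/x.pdf"))                 # read_doc
print(pick_tool("https://example.com/a.pdf"))      # read_doc
print(pick_tool("https://example.com/blog/post"))  # fetch
```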
Returns:
- markdown (default): rendered document text with a small header.
- json: {content, title, format, total_chars, start, returned_chars,
truncated}. Use `total_chars` and `returned_chars` to drive pagination.
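The following is a minimal model of how the json pagination fields plausibly relate to `start` and `length`. It is not the tool's implementation. `slice_page` and `MAX_CONTENT_CHARS` (and its value) are hypothetical. `truncated` is assumed to mean "more text remains after this chunk". `title` and `format` are omitted.

```python
from typing import Optional, TypedDict

# Hypothetical per-call cap; the real limit is not documented here.
MAX_CONTENT_CHARS = 20_000


class Page(TypedDict):
    content: str
    start: int
    returned_chars: int
    total_chars: int
    truncated: bool


def slice_page(text: str, start: int = 0, length: Optional[int] = None) -> Page:
    """Model how start/length could map onto the json pagination fields."""
    # length=None means "read to end", but every call is still capped.
    limit = MAX_CONTENT_CHARS if length is None else min(length, MAX_CONTENT_CHARS)
    chunk = text[start:start + limit]
    return {
        "content": chunk,
        "start": start,
        "returned_chars": len(chunk),
        "total_chars": len(text),
        # Assumed meaning: text remains after the end of this chunk.
        "truncated": start + len(chunk) < len(text),
    }
```

For example, `slice_page("abcdefghij", start=4, length=3)` returns `"efg"` with `returned_chars` 3, `total_chars` 10 and `truncated` true.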
Common mistakes:
- Calling this on a normal article URL — you'll get raw HTML noise; use
`fetch` instead.
- Forgetting to advance `start` when paginating: next call should pass
`start = previous_start + returned_chars`.
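Here is a sketch of a full pagination loop. `read_all` and `call_read_doc` are hypothetical. `call_read_doc` stands in for however your client invokes this tool and is assumed to return the json-format result as a dict. The loop also stops if a call returns zero characters, so a misbehaving source cannot make it spin forever.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical client hook: takes the tool's arguments, returns the parsed json result.
ReadDocFn = Callable[..., Dict[str, Any]]


def read_all(call_read_doc: ReadDocFn, source: str, page_size: Optional[int] = 10_000) -> str:
    """Read a whole document by advancing `start` by `returned_chars` each call."""
    parts: List[str] = []
    start = 0
    while True:
        page = call_read_doc(source=source, start=start, length=page_size, format="json")
        parts.append(page["content"])
        returned = page["returned_chars"]
        start += returned  # next start = previous_start + returned_chars
        # Stop at the end of the document, or if a call made no progress.
        if returned == 0 or start >= page["total_chars"]:
            break
    return "".join(parts)
```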
Args:
source: Local path (e.g. "~/papers/x.pdf") or http(s) URL.
start: Character offset to begin reading from. Default 0.
length: Max characters to return; None = read to end (still capped
by per-call max content size).
format: "markdown" or "json".
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
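| source | Yes | Local path or http(s) URL | |
| start | No | Character offset to begin reading from | 0 |
| length | No | Max characters to return; None = read to end (still capped per call) | |
| format | No | "markdown" or "json" | markdown |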
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Document content in the requested `format` | |