read_doc
Read PDFs, DOCX, and local files into Markdown. Use http(s) URLs or sandboxed local paths, with pagination via start and length.
Instructions
Read an http(s) document (or a sandboxed local file) into Markdown.
Best for:
- Remote PDFs and DOCX from an http(s) URL (parsed locally, no remote API).
- Local PDF/DOCX/text/Markdown files — ONLY when local reads are enabled
(see Security below).
- Paginating through a long document via `start` / `length`.
Not recommended for:
- Arbitrary HTML web pages: use `fetch`, which does the reader-mode cleanup
that this tool does not.
- Pages discovered through search: use `fetch` or `research`.
Security (local files are sandboxed and OFF by default):
- Local-file reads are DISABLED unless the server operator sets the
SEARCH_MCP_DOCUMENT_ROOT env var to a directory. With it unset, a local
path raises a "local file reads are disabled" error — pass an http(s)
URL instead, or ask the operator to enable the sandbox.
- When enabled, `source` must resolve INSIDE that root; relative paths
resolve against the root (not the process CWD) and any `..` traversal
that escapes the root is rejected. `file://` URLs are always rejected.
- Remote http(s) sources are unaffected by this setting.
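The sandbox rules above can be illustrated with a minimal sketch. This is not the server's actual code: `resolve_in_root` is a hypothetical helper, and the error messages are made up for illustration. It only shows how "relative paths resolve against the root" and "`..` traversal that escapes the root is rejected" might be enforced with `pathlib`.

```python
from pathlib import Path


def resolve_in_root(source: str, root: str) -> Path:
    """Resolve `source` inside `root`, or raise ValueError (hypothetical helper)."""
    if source.lower().startswith("file://"):
        raise ValueError("file:// URLs are always rejected")
    root_path = Path(root).resolve()
    # Relative paths are joined to the root, not to the process CWD.
    # Joining an absolute path onto root_path yields that absolute path unchanged,
    # so absolute inputs are still checked against the root below.
    candidate = (root_path / source).resolve()
    # After `..` segments are collapsed, the result must still sit under the root.
    if candidate != root_path and root_path not in candidate.parents:
        raise ValueError(f"path escapes document root: {source}")
    return candidate
```

With a root of `/srv/docs`, a call such as `resolve_in_root("reports/q1.pdf", "/srv/docs")` would stay inside the root, while `"../etc/passwd"` would raise.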
Returns:
- markdown (default): rendered document text with a small header.
- json: {content, title, format, total_chars, start, returned_chars,
truncated}. Use `total_chars` and `returned_chars` to drive pagination.
Common mistakes:
- Calling this on a normal article URL — you'll get raw HTML noise; use
`fetch` instead.
- Forgetting to advance `start` when paginating: the next call should pass
`start = previous_start + returned_chars`.
- Passing a negative `length` (raises an error) or a `start` past the end
(clamped to EOF: you'll get `returned_chars == 0`, `start == total_chars`,
and `truncated == False` — that's the signal you've paged off the end).
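A pagination loop that avoids the mistakes above might look like the following sketch. It assumes the tool can be invoked as a Python callable returning the `json` result as a dict; `iter_pages` and `fake_read_doc` are hypothetical names, and the fake returns only the fields the loop needs so the example can run without a server.

```python
from typing import Callable, Iterator, Optional


def iter_pages(read_doc: Callable[..., dict], source: str,
               page_size: int = 1000) -> Iterator[str]:
    """Yield successive chunks of `source` until the document is exhausted."""
    start = 0
    while True:
        page = read_doc(source=source, start=start, length=page_size, format="json")
        if page["returned_chars"] == 0:
            break  # paged off the end, or the document is empty
        yield page["content"]
        # Advance by what was actually returned, not by what was requested.
        start = page["start"] + page["returned_chars"]
        if start >= page["total_chars"]:
            break


def fake_read_doc(source: str, start: int = 0, length: Optional[int] = None,
                  format: str = "json") -> dict:
    """Stand-in for the real tool (hypothetical); serves 30 characters of text."""
    text = "abcdefghij" * 3
    start = max(0, min(start, len(text)))  # clamp into [0, total_chars]
    end = len(text) if length is None else min(len(text), start + length)
    chunk = text[start:end]
    return {"content": chunk, "total_chars": len(text), "start": start,
            "returned_chars": len(chunk), "truncated": end < len(text)}


chunks = list(iter_pages(fake_read_doc, "https://example.com/report.pdf", page_size=8))
```

Checking `returned_chars == 0` first handles both an empty document and a `start` that was clamped to the end, so the loop cannot spin forever.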
Args:
source: http(s) URL, or a local path UNDER SEARCH_MCP_DOCUMENT_ROOT when
local reads are enabled (disabled by default — see Security).
start: Character offset to begin reading from. Default 0. Clamped into
[0, total_chars]; a negative value is treated as 0.
length: Max characters to return; None = read to end (still capped by
the per-call max content size). Must be >= 0 — a negative length
is rejected with a ValueError.
format: "markdown" (default) or "json".
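The `start` / `length` rules can be summarised as a small window calculation. The sketch below is an illustration under assumptions, not the tool's implementation: `window` is a hypothetical helper, and `MAX_CONTENT_CHARS` is a placeholder because the real per-call cap is not stated here.

```python
from typing import Optional, Tuple

MAX_CONTENT_CHARS = 50_000  # placeholder value; the real per-call cap is not documented here


def window(total_chars: int, start: int = 0, length: Optional[int] = None,
           cap: int = MAX_CONTENT_CHARS) -> Tuple[int, int]:
    """Return (effective_start, returned_chars) for a document of `total_chars`."""
    if length is not None and length < 0:
        raise ValueError("length must be >= 0")
    # Clamp start into [0, total_chars]; negative values become 0.
    start = max(0, min(start, total_chars))
    remaining = total_chars - start
    # None means "read to end"; either way the per-call cap still applies.
    wanted = remaining if length is None else min(length, remaining)
    return start, min(wanted, cap)
```

For a 100-character document, `window(100, -5, 10)` gives `(0, 10)` and `window(100, 500)` gives `(100, 0)`, matching the clamped-to-EOF behaviour described under Common mistakes.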
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
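| source | Yes | http(s) URL, or a local path under SEARCH_MCP_DOCUMENT_ROOT when local reads are enabled | |
| start | No | Character offset to begin reading from; clamped into [0, total_chars] | 0 |
| length | No | Max characters to return; None reads to the end (still capped by the per-call max content size) | |
| format | No | "markdown" or "json" | markdown |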
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |