Glama

read_doc

Read-only, Idempotent

Read PDFs, DOCX, and local files into Markdown. Use http(s) URLs or sandboxed local paths, with pagination via start and length.

Instructions

Read an http(s) document (or a sandboxed local file) into Markdown.

Best for:
- Remote PDFs and DOCX from an http(s) URL (parsed locally, no remote API).
- Local PDF/DOCX/text/Markdown files — ONLY when local reads are enabled
  (see Security below).
- Paginating through a long document via `start` / `length`.

Not recommended for:
- Arbitrary HTML web pages -> `fetch` does reader-mode cleanup that this
  tool does not.
- Pages discovered through search -> `fetch` or `research`.
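The routing advice above can be sketched as a small heuristic. This sketch assumes the caller has only a URL to go on. It treats a path ending in `.pdf` or `.docx` as a direct document link and sends everything else to `fetch`. `choose_tool` and the suffix list are hypothetical; a real agent would also weigh the content type and where the link came from.

```python
from urllib.parse import urlparse

# Hypothetical suffix list: the remote document types this page says read_doc handles best.
DOCUMENT_SUFFIXES = (".pdf", ".docx")


def choose_tool(url: str) -> str:
    """Pick between read_doc and fetch for an http(s) URL (heuristic sketch)."""
    path = urlparse(url).path.lower()  # ignores query strings like ?dl=1
    if path.endswith(DOCUMENT_SUFFIXES):
        return "read_doc"  # direct link to a PDF or DOCX file
    return "fetch"  # ordinary HTML page: fetch does reader-mode cleanup


print(choose_tool("https://example.com/report.pdf"))  # read_doc
print(choose_tool("https://example.com/blog/post"))   # fetch
```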

Security (local files are sandboxed and OFF by default):
- Local-file reads are DISABLED unless the server operator sets the
  SEARCH_MCP_DOCUMENT_ROOT env var to a directory. With it unset, a local
  path raises a "local file reads are disabled" error — pass an http(s)
  URL instead, or ask the operator to enable the sandbox.
- When enabled, `source` must resolve INSIDE that root; relative paths
  resolve against the root (not the process CWD) and any `..` traversal
  that escapes the root is rejected. `file://` URLs are always rejected.
- Remote http(s) sources are unaffected by this setting.
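The sandbox rules for local paths can be sketched in Python, assuming standard `pathlib` semantics. The function name `resolve_local_source` and the use of `ValueError` for every rejection are assumptions for illustration, not the server's actual code. http(s) sources would bypass this check entirely.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_local_source(source: str, root: Optional[str] = None) -> Path:
    """Apply the sandbox rules to a local `source` (hypothetical sketch)."""
    if root is None:
        root = os.environ.get("SEARCH_MCP_DOCUMENT_ROOT")
    if source.lower().startswith("file://"):
        raise ValueError("file:// URLs are always rejected")
    if not root:
        raise ValueError("local file reads are disabled")
    root_path = Path(root).resolve()
    # Relative paths are joined onto the root, not the process CWD.
    # (Joining an absolute path replaces the root, so it is checked below too.)
    candidate = (root_path / source).resolve()
    # resolve() collapses `..` segments, so any escape from the root is visible here.
    if not candidate.is_relative_to(root_path):
        raise ValueError(f"{source!r} resolves outside the document root")
    return candidate
```

Resolving before comparing is the key design choice. Checking the raw string for `..` would miss paths such as `docs/../../x` that only escape after normalization.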

Returns:
- markdown (default): rendered document text with a small header.
- json: {content, title, format, total_chars, start, returned_chars,
  truncated}. Use `total_chars` and `returned_chars` to drive pagination.

Common mistakes:
- Calling this on a normal article URL — you'll get raw HTML noise; use
  `fetch` instead.
- Forgetting to advance `start` when paginating: next call should pass
  `start = previous_start + returned_chars`.
- Passing a negative `length` (raises an error) or a `start` past the end
  (clamped to EOF: you'll get `returned_chars == 0`, `start == total_chars`,
  and `truncated == False` — that's the signal you've paged off the end).
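The pagination rule can be sketched as a loop that advances `start` by `returned_chars` and stops on the end-of-document signal. The `read_doc` callable passed in is hypothetical: any client that forwards these arguments with `format="json"` and returns the parsed object would fit. `fake_read_doc` is a stand-in that serves a fixed string so the loop can run offline.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed client shape: takes read_doc's arguments, returns the parsed JSON result.
ReadDoc = Callable[..., Dict[str, Any]]


def read_all(read_doc: ReadDoc, source: str, length: int = 20_000) -> str:
    """Page through a document until the end-of-document signal appears."""
    chunks: List[str] = []
    start = 0
    while True:
        page = read_doc(source=source, start=start, length=length, format="json")
        chunks.append(page["content"])
        # Advance by what was actually returned, not by the requested length.
        start = page["start"] + page["returned_chars"]
        if page["returned_chars"] == 0 or start >= page["total_chars"]:
            return "".join(chunks)


def fake_read_doc(source: str, start: int = 0, length: Optional[int] = None,
                  format: str = "json") -> Dict[str, Any]:
    # Offline stand-in: serves a fixed string with the documented clamping.
    text = "abcdefghij"
    start = min(max(start, 0), len(text))
    end = len(text) if length is None else min(start + length, len(text))
    return {"content": text[start:end], "start": start, "total_chars": len(text),
            "returned_chars": end - start, "truncated": end < len(text)}


print(read_all(fake_read_doc, "demo.pdf", length=3))  # abcdefghij
```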

Args:
    source: http(s) URL, or a local path UNDER SEARCH_MCP_DOCUMENT_ROOT when
        local reads are enabled (disabled by default — see Security).
    start: Character offset to begin reading from. Default 0. Clamped into
        [0, total_chars]; a negative value is treated as 0.
    length: Max characters to return; None = read to end (still capped by
        the per-call max content size). Must be >= 0 — a negative length
        is rejected with a ValueError.
    format: "markdown" (default) or "json".
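The documented `start`/`length` handling can be modeled on already-parsed text, as in the sketch below. It rests on three assumptions:

- `MAX_CONTENT_CHARS` is a hypothetical stand-in for the unstated per-call cap.
- The cap is applied to explicit lengths as well as `None`.
- `truncated` is read as "text remains after this window".

The real JSON result also carries `title` and `format`, which are omitted here.

```python
from typing import Any, Dict, Optional

MAX_CONTENT_CHARS = 50_000  # hypothetical per-call cap; the real limit is not stated


def read_window(text: str, start: int = 0, length: Optional[int] = None) -> Dict[str, Any]:
    """Model of the documented start/length rules (illustrative sketch)."""
    if length is not None and length < 0:
        raise ValueError("length must be >= 0")
    total = len(text)
    start = min(max(start, 0), total)  # negative -> 0, past the end -> EOF
    budget = MAX_CONTENT_CHARS if length is None else min(length, MAX_CONTENT_CHARS)
    content = text[start:start + budget]
    return {
        "content": content,
        "total_chars": total,
        "start": start,
        "returned_chars": len(content),
        # Assumption: truncated means text remains after this window.
        "truncated": start + len(content) < total,
    }
```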

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| source | Yes | | |
| start | No | | |
| length | No | | |
| format | No | | markdown |

Output Schema

| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses behavioral traits beyond annotations: local parsing, sandboxing, error handling (negative length, start clamping), pagination signals, and output format details. Annotations already declare readOnlyHint and idempotentHint, and the description reinforces and expands on them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections (Best for, Not recommended, Security, Returns, Common mistakes, Args). Each sentence adds value. Front-loaded with main purpose. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, an output schema (not shown here, but described), security concerns, and pagination, the description covers all necessary context: usage, parameters, behavior, return values, and error conditions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully explains all 4 parameters: source with URL/path constraints, start with clamping, length with negative-value rejection, and format with its enum values. It also lists common mistakes involving source, start, and length.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Read an http(s) document (or a sandboxed local file) into Markdown', specifying verb and resource, and distinguishes this tool from siblings like fetch and research by stating when not to use it.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides 'Best for' and 'Not recommended for' sections, clearly contrasting with fetch and research, and includes security context about local file availability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sweetcornna/free-search-mcp'
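The same request can be made in Python using only the standard library. `server_url` and `get_server` are hypothetical helper names. The response is assumed to be JSON, and its fields are not described on this page.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory URL for one server, matching the curl example above."""
    return f"{API_BASE}/{owner}/{name}"


def get_server(owner: str, name: str) -> dict:
    # Plain GET, like curl -X GET; assumes the body is JSON.
    with urllib.request.urlopen(server_url(owner, name), timeout=30) as resp:
        return json.load(resp)
```

Calling `get_server("sweetcornna", "free-search-mcp")` fetches and parses the same record as the curl command.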

If you have feedback or need assistance with the MCP directory API, please join our Discord server