Glama

raw_read

Read raw source documents with automatic text extraction for PDF, DOCX, XLSX, PPTX, and plain text. Paginate by pages, sheet, or line offset.

Instructions

Read a raw source document's content and metadata. Raw files are immutable — this is read-only. Text/SVG files return content as string; document files (PDF, DOCX, XLSX, PPTX) have text extracted automatically; other binary files (images, etc.) return metadata only.

Pagination by format:

  • PDF: use 'pages' for page ranges (e.g. '1-5')

  • PPTX: use 'pages' for slide ranges (e.g. '1-10')

  • XLSX: use 'sheet' to read a specific sheet; response always includes 'sheet_names'

  • DOCX / text: use 'offset' + 'limit' for line-based pagination (default limit: 200)

For large documents, paginate rather than reading all at once.
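The format rules above can be sketched as a small client-side helper. This is an illustration only: `build_raw_read_args` and `line_windows` are hypothetical names, not part of the tool, and the extension sets are assumptions drawn from the formats listed in the description. Treating every other extension as line-paginated is a simplification.

```python
from pathlib import PurePosixPath

# Format groups taken from the description above; both helpers are hypothetical.
PAGED = {".pdf", ".pptx"}   # paginated with 'pages'
SHEETED = {".xlsx"}         # paginated with 'sheet'


def build_raw_read_args(filename, pages=None, sheet=None, offset=0, limit=200):
    """Return a raw_read argument dict holding only the parameters that apply."""
    ext = PurePosixPath(filename).suffix.lower()
    args = {"filename": filename}
    if ext in PAGED:
        if pages is not None:      # omitting 'pages' reads the whole document
            args["pages"] = pages
    elif ext in SHEETED:
        if sheet is not None:      # omitting 'sheet' reads all sheets
            args["sheet"] = sheet
    else:
        # Assumption: anything else is treated as DOCX/text (line-based).
        args["offset"] = offset
        args["limit"] = limit
    return args


def line_windows(total_lines, limit=200):
    """Yield (offset, limit) pairs that together cover total_lines lines."""
    for offset in range(0, total_lines, limit):
        yield offset, limit
```

For example, `build_raw_read_args("report.pdf", pages="1-5")` yields only `filename` and `pages`, while a Markdown file gets `offset` and `limit`. How the caller learns the total line count is not stated in the description, so `line_windows` takes it as an input.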

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | Filename relative to raw/ (e.g. 'article-yolo.md') | |
| pages | No | Page/slide range (e.g. '1-5', '3', '1-3,7-10'). Applies to PDF and PPTX. Omit to read all. | |
| sheet | No | Sheet name for XLSX files. Omit to read all sheets (response always includes sheet_names list). | |
| offset | No | Line offset for paginating text/DOCX files. Default: 0. | |
| limit | No | Max lines to return for text/DOCX pagination. Default: 200, max: 500. | |
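To make the 'pages' syntax and the 'limit' bounds concrete, here is a minimal sketch of client-side checks a caller might run before invoking the tool. `parse_pages` and `clamp_limit` are hypothetical helpers. The sketch assumes page numbers are 1-based, as the examples suggest, and it is not known whether the server clamps or rejects a limit above 500.

```python
def parse_pages(spec):
    """Expand a range string such as '1-3,7-10' into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    # Assumption: pages and slides are numbered from 1.
    if any(p < 1 for p in pages):
        raise ValueError(f"page numbers must be >= 1: {spec!r}")
    return sorted(pages)


def clamp_limit(limit, maximum=500):
    """Keep 'limit' within 1..maximum (500 is the documented maximum)."""
    return max(1, min(limit, maximum))
```

A call such as `parse_pages("1-3,7-10")` returns the individual page numbers, which is useful for checking a request against a known page count before sending it.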
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavior: raw files are immutable and read-only. It explains return formats per file type (content string for text/SVG, extracted text for documents, metadata only for binary) and pagination nuances (e.g., XLSX always returns sheet_names). This is highly transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: a concise opening sentence, followed by a clear section on pagination by format, and a final recommendation. Every sentence adds value, and there is no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Although there is no output schema, the description adequately explains what is returned for each file type and how pagination works, and it covers all common formats. It omits details on error handling (e.g., a missing file) and on the metadata structure returned for binary files, but it is sufficiently complete for a read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant value by linking parameters to specific formats (e.g., 'pages' for PDF/PPTX, 'sheet' for XLSX) and explaining defaults (limit=200) and behavior when omitted. It provides contextual usage beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Read a raw source document's content and metadata', providing a clear verb and resource. It distinguishes the tool from siblings such as raw_ingest, raw_list, and raw_versions by specifying the read operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives detailed pagination guidance per file format (PDF, PPTX, XLSX, DOCX/text), including how to use the 'pages', 'sheet', 'offset', and 'limit' parameters. It also advises pagination for large documents. It does not explicitly compare the tool to its siblings (e.g., when to use raw_read vs wiki_read), although the difference in resource type is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/xinhuagu/agent-wiki'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.