Glama

document-qa-prep

Chunks documents at paragraph boundaries for RAG pipelines. Provides deterministic IDs, token counts, metadata, and overlap support—no LLM required.

Instructions

Prepares a document for question-answering and RAG pipelines. Chunks the input text at paragraph/sentence boundaries, assigns deterministic chunk IDs, estimates token counts, and extracts document metadata (word count, type, headings). Returns ready-to-embed chunks with overlap support. No LLM or external API — pure text processing. Use mid-task when you've fetched a document and need it split before querying a vector store.
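The steps named above (paragraph-boundary splitting, a token estimate, deterministic IDs, overlap) can be sketched in a few lines. This is a minimal illustration and not the tool's actual implementation, which is not published here. The function names, the output field names, the SHA-256 based ID scheme, and the character-level overlap are all assumptions; only the 4-characters-per-token estimate and the default sizes come from the input schema.

```python
import hashlib
import re

CHARS_PER_TOKEN = 4  # the 4-char-per-token estimate stated in the input schema


def estimate_tokens(text: str) -> int:
    """Rough token count: ceiling of len(text) / 4."""
    return -(-len(text) // CHARS_PER_TOKEN)


def chunk_document(text, chunk_size_tokens=512, overlap_tokens=50, metadata=None):
    """Greedily pack paragraphs into chunks of roughly chunk_size_tokens.

    Hypothetical sketch: a paragraph longer than the target size is kept
    whole rather than split at sentence boundaries.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    max_chars = chunk_size_tokens * CHARS_PER_TOKEN
    overlap_chars = overlap_tokens * CHARS_PER_TOKEN

    bodies, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            bodies.append(current)
            # Carry the tail of the finished chunk forward as overlap
            # (cut by characters, an assumption made for brevity).
            tail = current[-overlap_chars:] if overlap_chars else ""
            current = f"{tail}\n\n{para}" if tail else para
        else:
            current = candidate
    if current:
        bodies.append(current)

    chunks = []
    for index, body in enumerate(bodies):
        # Same input always yields the same ID: hash of position plus content.
        digest = hashlib.sha256(f"{index}:{body}".encode("utf-8")).hexdigest()
        chunks.append({
            "id": digest[:16],
            "index": index,
            "text": body,
            "token_estimate": estimate_tokens(body),
            "metadata": dict(metadata or {}),
        })
    return chunks
```

Hashing the chunk position together with its text is one simple way to get IDs that stay stable across runs, so re-processing an unchanged document does not create duplicate vector-store entries.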

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| text | No | Document text to prepare (plain text, Markdown, or lightly-structured prose). Max 500,000 chars. | |
| chunk_size_tokens | No | Target chunk size in tokens (default 512, max 4096). Uses 4-char-per-token estimate. | |
| overlap_tokens | No | Token overlap between consecutive chunks for context continuity (default 50, max 512). | |
| metadata | No | Optional key-value metadata to attach to every chunk (e.g. source URL, document ID). | |
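As an illustration of how a caller might respect these limits, the sketch below assembles an arguments object and checks it against the documented maximums before sending. The helper name `build_arguments` is hypothetical, and the lower bounds and the rule that the overlap must be smaller than the chunk size are assumptions for sanity checking, not constraints stated in the schema.

```python
MAX_TEXT_CHARS = 500_000        # from the schema: "Max 500,000 chars"
MAX_CHUNK_SIZE_TOKENS = 4096    # from the schema: "max 4096"
MAX_OVERLAP_TOKENS = 512        # from the schema: "max 512"


def build_arguments(text, chunk_size_tokens=512, overlap_tokens=50, metadata=None):
    """Return a dict of tool arguments, raising ValueError on out-of-range input."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    # Lower bound of 1 is an assumption; the schema only states the maximum.
    if not 1 <= chunk_size_tokens <= MAX_CHUNK_SIZE_TOKENS:
        raise ValueError("chunk_size_tokens must be between 1 and 4096")
    # Lower bound of 0 is an assumption; the schema only states the maximum.
    if not 0 <= overlap_tokens <= MAX_OVERLAP_TOKENS:
        raise ValueError("overlap_tokens must be between 0 and 512")
    # Assumption: an overlap as large as the chunk itself is never useful.
    if overlap_tokens >= chunk_size_tokens:
        raise ValueError("overlap_tokens must be smaller than chunk_size_tokens")

    arguments = {
        "text": text,
        "chunk_size_tokens": chunk_size_tokens,
        "overlap_tokens": overlap_tokens,
    }
    if metadata:
        arguments["metadata"] = dict(metadata)
    return arguments


# Example: a 1,024-token target is about 4,096 characters per chunk
# under the 4-char-per-token estimate.
example = build_arguments(
    "First paragraph.\n\nSecond paragraph.",
    chunk_size_tokens=1024,
    overlap_tokens=100,
    metadata={"source": "https://example.com/doc"},
)
```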
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description fully discloses key behaviors: chunking boundaries, deterministic IDs, token estimation method, metadata extraction, overlap support, and the fact that it involves no external calls. This exceeds the burden for a non-annotated tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief: its opening sentences deliver the core functionality with specific details, and its closing sentence provides workflow context. Every sentence earns its place with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description states 'Returns ready-to-embed chunks' but lacks detail on the exact return structure (e.g., array of chunk objects with fields). For a tool with 4 parameters and straightforward output, this is a minor gap, but overall it covers input, operation, and usage context well.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds context (e.g., chunking at paragraph/sentence boundaries, token estimation at 4 chars per token) but does not significantly enhance per-parameter semantics beyond the schema's own descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: preparing documents for QA and RAG pipelines. It lists specific actions (chunking, ID assignment, token estimation, metadata extraction) and distinguishes itself by emphasizing pure text processing with no LLM or external API, setting it apart from many sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use guidance: 'Use mid-task when you've fetched a document and need it split before querying a vector store.' It does not list alternatives or when not to use, but the context is clear for an AI agent operating in a pipeline.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/thebrierfox/the-stall'
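The same GET request can be issued from Python's standard library. This is a sketch only: it assumes the endpoint returns JSON, which the page does not state, and the function names are illustrative.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/thebrierfox/the-stall"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_record(url: str = SERVER_URL, timeout: float = 10.0):
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```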

If you have feedback or need assistance with the MCP directory API, please join our Discord server.