Dokmatiq DocGen

Official

by dokmatiq


Python

Remote

extract_text_from_pdf

Extract all text content from a PDF. Input the PDF as a base64-encoded string and receive the extracted text.

Instructions

Extract all text content from a PDF.

Args: pdf_base64: Base64-encoded PDF file.

Returns: Extracted text content.
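As a minimal sketch of how a caller might prepare the `pdf_base64` argument, the helper below Base64-encodes raw PDF bytes into a JSON-serializable arguments object. The function name `build_extract_text_arguments` and the `%PDF-` header check are illustrative assumptions, not part of the tool's documented behavior, and the page does not say whether the server expects any particular Base64 variant (standard Base64 is assumed here).

```python
import base64
import json


def build_extract_text_arguments(pdf_bytes: bytes) -> dict:
    """Build an arguments object for extract_text_from_pdf (hypothetical helper)."""
    # Assumption: a strict client-side sanity check that the data starts with
    # the "%PDF-" header. The tool itself documents no such requirement.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF")
    # Standard Base64, decoded to str so the value can be placed in JSON.
    return {"pdf_base64": base64.b64encode(pdf_bytes).decode("ascii")}


if __name__ == "__main__":
    # Placeholder bytes for illustration only; this is not a complete PDF.
    sample = b"%PDF-1.4\n% placeholder\n"
    print(json.dumps(build_extract_text_arguments(sample)))
```

In practice the bytes would come from reading a file in binary mode, for example `Path("report.pdf").read_bytes()`, where the file name is a placeholder.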

Input Schema


Name	Required	Description	Default
`pdf_base64`	Yes

Output Schema


Name	Required	Description	Default
`result`	Yes
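The output schema lists a single required field, `result`. A small sketch of reading it defensively might look like the following; it assumes the client exposes the tool's structured output as a Python dict and that `result` is a string (the page only says the tool returns "extracted text content"). The helper name `read_extracted_text` is hypothetical.

```python
def read_extracted_text(structured_output: dict) -> str:
    """Return the `result` field, failing loudly if it is missing or not text."""
    if "result" not in structured_output:
        raise KeyError("structured output has no 'result' field")
    text = structured_output["result"]
    # Assumption: the extracted text is delivered as a plain string.
    if not isinstance(text, str):
        raise TypeError(f"'result' should be a string, got {type(text).__name__}")
    return text
```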

Tool Definition Quality

Grade A (3.6/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description does not disclose behavioral traits beyond the basic operation; no mention of performance, file size limits, or support for scanned PDFs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise with a clear Args/Returns structure, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Sufficient for a simple extraction tool with an output schema; missing some behavioral context but comprehensive enough for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaning to the parameter pdf_base64 by specifying it is a Base64-encoded PDF file, which the schema did not provide.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool extracts all text content from a PDF, distinguishing it from sibling tools like merge_pdfs or get_pdf_metadata.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives; lacks context about limitations or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dokmatiq/docgen-sdks'
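For callers working in Python, the same request could be sketched with the standard library as follows. The URL is taken from the curl command above; the `Accept` header and the assumption that the endpoint returns JSON are not stated on this page, and the function names are illustrative.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/dokmatiq/docgen-sdks"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    # Assumption: the API responds with JSON, so we ask for it explicitly.
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server_info(url: str = API_URL, timeout: float = 10.0):
    """Send the request and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes the request easy to inspect or test offline.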

If you have feedback or need assistance with the MCP directory API, please join our Discord server.