PDF Reader MCP Server


extract_pdf_text

Extract text content from PDF documents with optional metadata and formatting preservation for data processing and analysis.


Input Schema


Name	Required	Description	Default
`file_path`	Yes	Path to the PDF file to extract text from
`pages`	No	Page range to extract (e.g., "1-5", "1,3,5", or "all")	all
`preserve_formatting`	No	Whether to preserve text formatting and structure	true
`include_metadata`	No	Whether to include document metadata in the response	false
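
The `pages` parameter accepts three shapes: a range ("1-5"), a comma-separated list ("1,3,5"), or "all". The sketch below shows one way such a string could be expanded into page numbers. `parsePageRange` and its `pageCount` argument are hypothetical names introduced for illustration; the server's actual parsing logic is not shown on this page and may differ.

```typescript
// Hypothetical helper, not taken from the server's source.
// Expands a `pages` string into sorted, de-duplicated 1-based page numbers.
function parsePageRange(spec: string, pageCount: number): number[] {
  const trimmed = spec.trim().toLowerCase();
  if (trimmed === "all") {
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Accept either a single page ("3") or an inclusive range ("1-5").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${part}"`);
    }
    for (let p = start; p <= end; p++) {
      pages.add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

Rejecting malformed or out-of-bounds segments up front, rather than silently clamping them, is a design choice made for this sketch only.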

Implementation Reference

src/tools/extract-text.ts:37-62 (handler)

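Core handler function that validates input, extracts text from the PDF using the PDFProcessor service, and attaches metadata when requested. It returns a structured result on success; on failure it throws an Error whose message is the JSON-serialized MCP error. Note that in this excerpt `pages` is validated but not passed to `extractText`.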

export async function handleExtractText(args: unknown): Promise<ExtractTextResult> {
  try {
    // Validate the raw arguments and apply schema defaults.
    const params = ExtractTextParamsSchema.parse(args);
    const processor = new PDFProcessor();

    // Note: params.pages is validated but not forwarded to the processor here.
    const result = await processor.extractText(
      params.file_path,
      params.preserve_formatting
    );

    const response: ExtractTextResult = {
      text: result.text,
      page_count: result.pageCount,
      processing_time_ms: result.processingTimeMs
    };

    // Metadata is attached only when explicitly requested.
    if (params.include_metadata) {
      response.metadata = result.metadata;
    }

    return response;
  } catch (error) {
    // Recover file_path from the raw args, if present, to give the error context.
    const filePath =
      typeof args === 'object' && args !== null && 'file_path' in args
        ? String(args.file_path)
        : undefined;
    const mcpError = handleError(error, filePath);
    throw new Error(JSON.stringify(mcpError));
  }
}
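
Because the handler rethrows failures as an `Error` whose message is a JSON string, a caller that wants the structured error has to parse that message back. The following is a minimal sketch of that step. `decodeHandlerError` is a hypothetical name, and the fields of the serialized error are not documented on this page, so the result is left as `unknown`.

```typescript
// Hypothetical caller-side helper; the shape produced by handleError is not
// shown on this page, so the decoded value is treated as opaque.
function decodeHandlerError(error: unknown): unknown {
  if (!(error instanceof Error)) {
    return error;
  }
  try {
    return JSON.parse(error.message);
  } catch {
    // The message was not JSON, e.g. the error came from somewhere else.
    return { message: error.message };
  }
}
```

A caller would typically wrap `handleExtractText` in `try`/`catch` and pass the caught value to a helper like this before logging or inspecting it.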

src/tools/extract-text.ts:7-35 (schema)

Tool definition including name, description, and JSON input schema matching the Zod validation schema used in the handler.

export const extractTextTool: Tool = {
  name: 'extract_pdf_text',
  description: 'Extract text content from PDF documents with optional metadata and formatting preservation',
  inputSchema: {
    type: 'object',
    properties: {
      file_path: {
        type: 'string',
        description: 'Path to the PDF file to extract text from'
      },
      pages: {
        type: 'string',
        description: 'Page range to extract (e.g., "1-5", "1,3,5", or "all")',
        default: 'all'
      },
      preserve_formatting: {
        type: 'boolean',
        description: 'Whether to preserve text formatting and structure',
        default: true
      },
      include_metadata: {
        type: 'boolean',
        description: 'Whether to include document metadata in the response',
        default: false
      }
    },
    required: ['file_path']
  }
};

src/types/mcp-types.ts:7-12 (schema)

Zod schema for runtime input validation of extract_pdf_text parameters, parsed in the handler.

export const ExtractTextParamsSchema = z.object({
  file_path: filePathValidation,
  pages: z.string().default('all'),
  preserve_formatting: z.boolean().default(true),
  include_metadata: z.boolean().default(false)
});
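
To make the effect of those `.default(...)` calls concrete, here is a dependency-free approximation of what a caller gets back when only `file_path` is supplied. It is a sketch, not the Zod schema itself: `withDefaults` and the `ExtractTextParams` interface are names assumed for illustration, and the rules inside `filePathValidation` are not shown on this page, so they are not reproduced.

```typescript
// Illustrative mirror of the defaults declared in ExtractTextParamsSchema.
// It performs no path validation and no runtime type checking.
interface ExtractTextParams {
  file_path: string;
  pages: string;
  preserve_formatting: boolean;
  include_metadata: boolean;
}

function withDefaults(
  input: Pick<ExtractTextParams, "file_path"> & Partial<ExtractTextParams>
): ExtractTextParams {
  return {
    file_path: input.file_path,
    pages: input.pages ?? "all",
    preserve_formatting: input.preserve_formatting ?? true,
    include_metadata: input.include_metadata ?? false,
  };
}

// Only the required field is supplied; the rest fall back to defaults.
const minimal = withDefaults({ file_path: "/tmp/example.pdf" });
```

Here `minimal` carries `pages: "all"`, `preserve_formatting: true`, and `include_metadata: false`, matching the defaults in both the Zod schema and the JSON input schema. The path `/tmp/example.pdf` is a placeholder.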

src/index.ts:39-46 (registration)

Registers the extract_pdf_text tool (via extractTextTool) in the MCP server's list of available tools.

this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    extractTextTool,
    extractMetadataTool,
    extractPagesTool,
    validatePDFTool,
  ],
}));

src/index.ts:53-61 (registration)

Switch case in CallToolRequest handler that dispatches to the extract_pdf_text implementation by calling handleExtractText.

case 'extract_pdf_text':
  return {
    content: [
      {
        type: 'text',
        text: JSON.stringify(await handleExtractText(args), null, 2),
      },
    ],
  };
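
Since the dispatcher serializes the handler's result into a single text content item, a client has to parse that text to get the fields back. A minimal sketch of the client side follows. `decodeExtractTextResponse` and `ToolResponse` are hypothetical names; the field names are taken from the handler excerpt earlier on this page, and the metadata shape is left as `unknown` because it is not documented here.

```typescript
// Field names follow the handler's response object; metadata shape is unknown.
interface ExtractTextResult {
  text: string;
  page_count: number;
  processing_time_ms: number;
  metadata?: unknown;
}

// Simplified view of the tool-call response built in the switch case.
interface ToolResponse {
  content: Array<{ type: string; text: string }>;
}

function decodeExtractTextResponse(response: ToolResponse): ExtractTextResult {
  const first = response.content.find((item) => item.type === "text");
  if (!first) {
    throw new Error("Response contains no text content item");
  }
  // The server pretty-prints with two-space indentation; JSON.parse ignores it.
  return JSON.parse(first.text) as ExtractTextResult;
}
```

The cast at the end performs no runtime validation, so a stricter client may want to check the parsed fields before using them.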

Tool Definition Quality

C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions optional metadata and formatting preservation, but lacks details on permissions needed, file size limits, rate limits, error handling, or output format. For a tool that processes files and has no annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('Extract text content from PDF documents') and adds key optional features. Every word earns its place with no redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and a tool that performs file processing with multiple parameters, the description is incomplete. It doesn't address behavioral aspects like error conditions, performance expectations, or output structure, leaving the agent with insufficient context for reliable use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all four parameters. The description adds minimal value beyond the schema by hinting at optional metadata and formatting preservation, but doesn't provide additional context like examples of preserved formatting or metadata types. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Extract text content from PDF documents' with additional features like optional metadata and formatting preservation. It specifies the verb ('extract') and resource ('text content from PDF documents'), but doesn't explicitly differentiate from sibling tools like 'extract_pdf_metadata' or 'extract_pdf_pages' beyond the core focus on text extraction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus its siblings (extract_pdf_metadata, extract_pdf_pages, validate_pdf). It mentions optional features like metadata and formatting preservation, but doesn't clarify scenarios where this tool is preferred over alternatives or any prerequisites for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pablontiv/pdf-reader-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.