by pablontiv

extract_pdf_pages

Extract content from specific pages or page ranges of PDF documents to retrieve text or structured data for focused analysis.

Instructions

Extract content from specific pages or page ranges of PDF documents

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| file_path | Yes | Path to the PDF file to extract pages from | |
| page_range | Yes | Page range to extract (e.g., "1-3", "2,4,6", or "all") | |
| output_format | No | Output format: "text" for plain text, "structured" for formatted text | text |
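The `page_range` parameter accepts three shapes: a dash range, a comma list, or the keyword "all". As a rough illustration of how such a string could be expanded into page numbers, here is a minimal sketch. The `parsePageRange` helper and its `totalPages` argument are hypothetical and are not taken from the server's source; the real `TextExtractor` may treat duplicates, whitespace, or out-of-range pages differently.

```typescript
// Hypothetical helper, not part of pdf-reader-mcp.
// Expands a page_range string into sorted, de-duplicated 1-based page numbers.
function parsePageRange(range: string, totalPages: number): number[] {
  const trimmed = range.trim().toLowerCase();
  if (trimmed === "all") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Accept either a single page ("4") or an inclusive range ("1-3").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range out of bounds: "${part}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

// Example arguments matching the input schema (the path is a placeholder).
const exampleArgs = {
  file_path: "/path/to/report.pdf",
  page_range: "1-3",
  output_format: "structured",
};
```

Rejecting malformed or out-of-bounds segments up front is one possible design choice; a server could equally clamp ranges to the document length.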

Implementation Reference

  • The core handler function for 'extract_pdf_pages' tool. It validates input using ExtractPagesParamsSchema, instantiates TextExtractor, calls extractFromPages, and handles errors.
    export async function handleExtractPages(args: unknown): Promise<ExtractPagesResult> {
      try {
        const params = ExtractPagesParamsSchema.parse(args);
        const extractor = new TextExtractor();
        return await extractor.extractFromPages(
          params.file_path,
          params.page_range,
          params.output_format
        );
      } catch (error) {
        const mcpError = handleError(
          error,
          typeof args === 'object' && args !== null && 'file_path' in args
            ? String(args.file_path)
            : undefined
        );
        throw new Error(JSON.stringify(mcpError));
      }
    }
  • Tool definition object for 'extract_pdf_pages' including the input schema for validation.
    export const extractPagesTool: Tool = {
      name: 'extract_pdf_pages',
      description: 'Extract content from specific pages or page ranges of PDF documents',
      inputSchema: {
        type: 'object',
        properties: {
          file_path: {
            type: 'string',
            description: 'Path to the PDF file to extract pages from'
          },
          page_range: {
            type: 'string',
            description: 'Page range to extract (e.g., "1-3", "2,4,6", or "all")'
          },
          output_format: {
            type: 'string',
            enum: ['text', 'structured'],
            description: 'Output format: "text" for plain text, "structured" for formatted text',
            default: 'text'
          }
        },
        required: ['file_path', 'page_range']
      }
    };
  • src/index.ts:73-81 (registration)
    Registration in the switch statement that dispatches tool calls to the handleExtractPages function.
    case 'extract_pdf_pages':
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(await handleExtractPages(args), null, 2),
          },
        ],
      };
  • src/index.ts:41-45 (registration)
    Registration of the extractPagesTool in the listTools response.
    extractTextTool,
    extractMetadataTool,
    extractPagesTool,
    validatePDFTool,
    ],
  • src/index.ts:15-15 (registration)
    Import of the tool definition and handler for 'extract_pdf_pages'.
    import { extractPagesTool, handleExtractPages } from './tools/extract-pages.js';
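
The references above describe a validate-then-dispatch flow: arguments are checked against the input schema, passed to an extractor, and the result is returned as JSON text. The following dependency-free sketch mirrors that flow under stated assumptions. `validateExtractPagesArgs`, `stubExtract`, and `callTool` are hypothetical names; the actual server validates with `ExtractPagesParamsSchema` and extracts with `TextExtractor`, neither of which is reproduced here.

```typescript
// Sketch only: these types and functions are assumptions, not the project's code.
type OutputFormat = "text" | "structured";

interface ExtractPagesParams {
  file_path: string;
  page_range: string;
  output_format: OutputFormat;
}

// Mirrors the input schema: two required strings and one optional enum
// that defaults to "text".
function validateExtractPagesArgs(args: unknown): ExtractPagesParams {
  if (typeof args !== "object" || args === null) {
    throw new Error("Arguments must be an object");
  }
  const record = args as Record<string, unknown>;
  const { file_path, page_range } = record;
  if (typeof file_path !== "string") {
    throw new Error("file_path is required and must be a string");
  }
  if (typeof page_range !== "string") {
    throw new Error("page_range is required and must be a string");
  }
  const raw = record.output_format;
  let output_format: OutputFormat;
  if (raw === undefined) {
    output_format = "text";
  } else if (raw === "text" || raw === "structured") {
    output_format = raw;
  } else {
    throw new Error('output_format must be "text" or "structured"');
  }
  return { file_path, page_range, output_format };
}

// Stub standing in for the real extractor; it reads no PDF and only echoes input.
async function stubExtract(params: ExtractPagesParams) {
  return { file: params.file_path, pages: params.page_range, format: params.output_format };
}

// Dispatches by tool name and wraps the result as pretty-printed JSON text.
async function callTool(name: string, args: unknown) {
  switch (name) {
    case "extract_pdf_pages": {
      const result = await stubExtract(validateExtractPagesArgs(args));
      return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
    }
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```

Calling `callTool("extract_pdf_pages", { file_path: "a.pdf", page_range: "all" })` would resolve to a single text content item whose JSON records the default `"text"` format.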


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pablontiv/pdf-reader-mcp'
