extract_pdf_pages

Extract content from specific pages or page ranges of PDF documents to retrieve text or structured data for focused analysis.

Instructions

Extract content from specific pages or page ranges of PDF documents

Input Schema

Name	Required	Description	Default
`file_path`	Yes	Path to the PDF file to extract pages from
`page_range`	Yes	Page range to extract (e.g., "1-3", "2,4,6", or "all")
`output_format`	No	Output format: "text" for plain text, "structured" for formatted text	text
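
The `page_range` formats listed above ("1-3", "2,4,6", "all") can be illustrated with a small parser. This is only a sketch: the server's real parsing lives inside `TextExtractor` and is not shown here, so the function name `parsePageRange`, the `totalPages` parameter, the 1-based numbering, and the error messages are all assumptions made for illustration.

```typescript
// Hypothetical helper, not taken from the server's source.
// Expands a page_range string into a sorted list of unique page numbers,
// assuming pages are numbered from 1.
function parsePageRange(spec: string, totalPages: number): number[] {
  const trimmed = spec.trim().toLowerCase();
  if (trimmed === 'all') {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(',')) {
    // Accept either a single page ("4") or an inclusive range ("1-3").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range out of bounds: "${part.trim()}"`);
    }
    for (let p = start; p <= end; p++) {
      pages.add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

Under these assumptions, mixed input such as "1-2,5" would expand to pages 1, 2 and 5, while a reversed range such as "5-2" is rejected instead of being silently reordered.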

Implementation Reference

src/tools/extract-pages.ts:32-46 (handler)
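The core handler function for the 'extract_pdf_pages' tool. It validates the input with ExtractPagesParamsSchema, instantiates TextExtractor, and calls extractFromPages. On failure it passes the error (and the file path, when one was supplied) to handleError and rethrows the result as an Error whose message is the JSON-serialized error object.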
export async function handleExtractPages(args: unknown): Promise<ExtractPagesResult> {
  try {
    const params = ExtractPagesParamsSchema.parse(args);
    const extractor = new TextExtractor();
    return await extractor.extractFromPages(
      params.file_path,
      params.page_range,
      params.output_format
    );
  } catch (error) {
    const mcpError = handleError(
      error,
      typeof args === 'object' && args !== null && 'file_path' in args
        ? String(args.file_path)
        : undefined
    );
    throw new Error(JSON.stringify(mcpError));
  }
}
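
Because the handler rethrows failures as `new Error(JSON.stringify(mcpError))`, a caller that wants the structured error back has to parse `error.message`. The following is a minimal sketch of that step; `decodeToolError` and `DecodedToolError` are hypothetical names, and since the fields returned by `handleError` are not shown in the source, the decoded value is typed as an open record.

```typescript
// Hypothetical error shape: the real fields produced by handleError
// are not shown in the source, so nothing specific is assumed here.
interface DecodedToolError {
  [key: string]: unknown;
}

// Tries to recover the JSON payload that the handler packs into Error.message.
// Returns undefined when the value is not an Error or the message is not a JSON object.
function decodeToolError(error: unknown): DecodedToolError | undefined {
  if (!(error instanceof Error)) {
    return undefined;
  }
  try {
    const parsed: unknown = JSON.parse(error.message);
    const isPlainObject =
      typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed);
    return isPlainObject ? (parsed as DecodedToolError) : undefined;
  } catch {
    return undefined; // the message was ordinary text, not JSON
  }
}
```

A caller could wrap `handleExtractPages(...)` in `try`/`catch` and pass the caught value to `decodeToolError`, falling back to the raw message when it returns `undefined`.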
src/tools/extract-pages.ts:7-30 (schema)
Tool definition object for 'extract_pdf_pages' including the input schema for validation.
export const extractPagesTool: Tool = {
  name: 'extract_pdf_pages',
  description: 'Extract content from specific pages or page ranges of PDF documents',
  inputSchema: {
    type: 'object',
    properties: {
      file_path: {
        type: 'string',
        description: 'Path to the PDF file to extract pages from'
      },
      page_range: {
        type: 'string',
        description: 'Page range to extract (e.g., "1-3", "2,4,6", or "all")'
      },
      output_format: {
        type: 'string',
        enum: ['text', 'structured'],
        description: 'Output format: "text" for plain text, "structured" for formatted text',
        default: 'text'
      }
    },
    required: ['file_path', 'page_range']
  }
};
src/index.ts:73-81 (registration)
The case in the switch statement that dispatches 'extract_pdf_pages' calls to handleExtractPages and returns the result as pretty-printed JSON inside a single text content item.
case 'extract_pdf_pages':
  return {
    content: [
      {
        type: 'text',
        text: JSON.stringify(await handleExtractPages(args), null, 2),
      },
    ],
  };
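
Since the dispatcher serializes the handler's result with `JSON.stringify` into one text item, a consumer gets the structured result back by parsing `content[0].text`. The sketch below shows that round trip. `unpackToolResult`, `TextContent` and `ToolResponse` are hypothetical names declared locally, not types imported from an SDK, and the shape of `ExtractPagesResult` is not shown in the source, so the result type is left generic.

```typescript
// Local, illustrative types that mirror the response literal in the dispatcher.
interface TextContent {
  type: 'text';
  text: string;
}

interface ToolResponse {
  content: TextContent[];
}

// Parses the JSON payload carried by the first text content item.
// T is whatever shape the caller expects; nothing validates it at runtime.
function unpackToolResult<T = unknown>(response: ToolResponse): T {
  const first = response.content[0];
  if (!first || first.type !== 'text') {
    throw new Error('Expected a text content item');
  }
  return JSON.parse(first.text) as T;
}
```

The two-space indentation applied by the dispatcher only affects readability of the raw text; `JSON.parse` ignores it.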
src/index.ts:41-45 (registration)
Registration of extractPagesTool in the listTools response.
    extractTextTool,
    extractMetadataTool,
    extractPagesTool,
    validatePDFTool,
  ],
src/index.ts:15-15 (registration)
Import of the tool definition and handler for 'extract_pdf_pages'.
import { extractPagesTool, handleExtractPages } from './tools/extract-pages.js';

PDF Reader MCP Server
