extract_pdf_pages
Extract specific pages or page ranges from PDF documents as plain text or structured text using the PDF Reader MCP Server. Specify the file path and a page range to limit extraction to the pages you need.
Instructions
Extract content from specific pages or page ranges of PDF documents
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Path to the PDF file to extract pages from | |
| output_format | No | Output format: "text" for plain text, "structured" for formatted text | text |
| page_range | Yes | Page range to extract (e.g., "1-3", "2,4,6", or "all") | |
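
The `page_range` formats listed in the table ("1-3", "2,4,6", "all") could be interpreted roughly as follows. This is a minimal sketch, not the server's actual parser: the function name `parsePageRange`, the `totalPages` parameter, and the choice to reject out-of-bounds pages are all assumptions made for illustration.

```typescript
// Hypothetical helper, not taken from the PDF Reader MCP Server source.
// Converts a page_range string into a sorted list of 1-based page numbers.
function parsePageRange(range: string, totalPages: number): number[] {
  const trimmed = range.trim().toLowerCase();

  // "all" selects every page in the document.
  if (trimmed === "all") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Each segment is either a single page ("4") or an inclusive range ("1-3").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;

    // Assumption: pages outside 1..totalPages are an error rather than being clamped.
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range out of bounds: "${part.trim()}"`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }

  // Duplicates are removed by the Set; return pages in ascending order.
  return Array.from(pages).sort((a, b) => a - b);
}
```

Under these assumptions, `parsePageRange("1-3", 10)` yields `[1, 2, 3]` and `parsePageRange("2,4,6", 10)` yields `[2, 4, 6]`. The real server may handle overlapping or out-of-bounds ranges differently.
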
Implementation Reference
- src/tools/extract-pages.ts:32-46 (handler): The main execution logic for the 'extract_pdf_pages' tool. Validates input parameters using Zod's ExtractPagesParamsSchema and delegates extraction to the TextExtractor service.
  ```ts
  export async function handleExtractPages(args: unknown): Promise<ExtractPagesResult> {
    try {
      const params = ExtractPagesParamsSchema.parse(args);
      const extractor = new TextExtractor();
      return await extractor.extractFromPages(
        params.file_path,
        params.page_range,
        params.output_format
      );
    } catch (error) {
      const mcpError = handleError(
        error,
        typeof args === 'object' && args !== null && 'file_path' in args
          ? String(args.file_path)
          : undefined
      );
      throw new Error(JSON.stringify(mcpError));
    }
  }
  ```
- src/tools/extract-pages.ts:10-29 (schema): MCP tool input schema defining the parameters for extract_pdf_pages: file_path (required), page_range (required), output_format (optional).
  ```ts
  inputSchema: {
    type: 'object',
    properties: {
      file_path: {
        type: 'string',
        description: 'Path to the PDF file to extract pages from'
      },
      page_range: {
        type: 'string',
        description: 'Page range to extract (e.g., "1-3", "2,4,6", or "all")'
      },
      output_format: {
        type: 'string',
        enum: ['text', 'structured'],
        description: 'Output format: "text" for plain text, "structured" for formatted text',
        default: 'text'
      }
    },
    required: ['file_path', 'page_range']
  }
  ```
- src/types/mcp-types.ts:18-22 (schema): Zod validation schema used internally in the handler to parse and validate tool arguments.
  ```ts
  export const ExtractPagesParamsSchema = z.object({
    file_path: filePathValidation,
    page_range: z.string().min(1, "Page range is required"),
    output_format: z.enum(["text", "structured"]).default("text")
  });
  ```
- src/index.ts:73-81 (registration): Registration in the tool call dispatcher (switch statement) that invokes the handleExtractPages function.
  ```ts
  case 'extract_pdf_pages':
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(await handleExtractPages(args), null, 2),
        },
      ],
    };
  ```
- src/index.ts:41-45 (registration): Tool registration in the listTools response, including extractPagesTool.
  ```ts
      extractTextTool,
      extractMetadataTool,
      extractPagesTool,
      validatePDFTool,
    ],
  ```
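
The entries above show two conventions a caller has to undo: the dispatcher returns the result as pretty-printed JSON inside a `text` content item, and the handler reports failures by throwing an `Error` whose message is a JSON string. The sketch below illustrates one way to handle both on the calling side. The helper names `parseToolResult` and `decodeToolError` are hypothetical. The fields of `ExtractPagesResult` and of the serialized error are not shown in the source, so both are treated as opaque here. How a thrown error actually reaches a remote client depends on the MCP transport and is not covered.

```typescript
// Shape of the dispatcher's response, as returned by the 'extract_pdf_pages' case.
interface ToolCallResult {
  content: Array<{ type: string; text?: string }>;
}

// Hypothetical helper: parse the JSON text item from a successful tool call.
// T stands in for ExtractPagesResult, whose fields are not shown in the source.
function parseToolResult<T>(result: ToolCallResult): T {
  const item = result.content.find(
    (c) => c.type === "text" && typeof c.text === "string"
  );
  if (!item || item.text === undefined) {
    throw new Error("Tool result contains no text content");
  }
  return JSON.parse(item.text) as T;
}

// Hypothetical helper: recover the object that handleExtractPages serialized
// into Error.message. Falls back to the raw message if it is not valid JSON.
function decodeToolError(error: unknown): unknown {
  if (!(error instanceof Error)) {
    return error;
  }
  try {
    return JSON.parse(error.message);
  } catch {
    return { message: error.message };
  }
}
```

Code that calls `handleExtractPages` directly could wrap the call in `try`/`catch` and pass the caught value to `decodeToolError` to get a structured object back instead of a string.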