extract_pdf_pages
Extract content from specific pages or page ranges of PDF documents to retrieve text or structured data for focused analysis.
Instructions
Extract content from specific pages or page ranges of PDF documents
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Path to the PDF file to extract pages from | |
| page_range | Yes | Page range to extract (e.g., "1-3", "2,4,6", or "all") | |
| output_format | No | Output format: "text" for plain text, "structured" for formatted text | text |
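The three `page_range` forms listed above ("1-3", "2,4,6", "all") could be expanded into page numbers roughly as follows. This is a minimal sketch, not the tool's actual parser: `parsePageRange`, its `totalPages` parameter, and the example file path are hypothetical names introduced for illustration, and the real `TextExtractor` may handle edge cases differently.

```typescript
// Hypothetical helper: expands a page_range string into sorted, unique,
// 1-based page numbers. Not taken from the tool's source.
function parsePageRange(range: string, totalPages: number): number[] {
  const trimmed = range.trim().toLowerCase();
  if (trimmed === "all") {
    return Array.from({ length: totalPages }, (_, i) => i + 1);
  }

  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Each segment is either a single page ("4") or a span ("1-3").
    const match = /^(\d+)(?:-(\d+))?$/.exec(part.trim());
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > totalPages) {
      throw new Error(`Page range out of bounds: "${part}"`);
    }
    for (let p = start; p <= end; p++) {
      pages.add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}

// Example arguments for the tool (the file path is a placeholder).
const exampleArgs = {
  file_path: "./report.pdf",
  page_range: "2,4,6",
  output_format: "text",
};

console.log(parsePageRange(exampleArgs.page_range, 10));
```

Rejecting malformed or out-of-bounds segments up front is one possible design choice; silently clamping to the document length would be another.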
Implementation Reference
- src/tools/extract-pages.ts:32-46 (handler): The core handler function for the 'extract_pdf_pages' tool. It validates input using ExtractPagesParamsSchema, instantiates TextExtractor, calls extractFromPages, and handles errors.

  ```typescript
  export async function handleExtractPages(args: unknown): Promise<ExtractPagesResult> {
    try {
      const params = ExtractPagesParamsSchema.parse(args);
      const extractor = new TextExtractor();
      return await extractor.extractFromPages(
        params.file_path,
        params.page_range,
        params.output_format
      );
    } catch (error) {
      const mcpError = handleError(error, typeof args === 'object' && args !== null && 'file_path' in args ? String(args.file_path) : undefined);
      throw new Error(JSON.stringify(mcpError));
    }
  }
  ```
- src/tools/extract-pages.ts:7-30 (schema): Tool definition object for 'extract_pdf_pages', including the input schema used for validation.

  ```typescript
  export const extractPagesTool: Tool = {
    name: 'extract_pdf_pages',
    description: 'Extract content from specific pages or page ranges of PDF documents',
    inputSchema: {
      type: 'object',
      properties: {
        file_path: {
          type: 'string',
          description: 'Path to the PDF file to extract pages from'
        },
        page_range: {
          type: 'string',
          description: 'Page range to extract (e.g., "1-3", "2,4,6", or "all")'
        },
        output_format: {
          type: 'string',
          enum: ['text', 'structured'],
          description: 'Output format: "text" for plain text, "structured" for formatted text',
          default: 'text'
        }
      },
      required: ['file_path', 'page_range']
    }
  };
  ```
- src/index.ts:73-81 (registration): Registration in the switch statement that dispatches tool calls to the handleExtractPages function.

  ```typescript
  case 'extract_pdf_pages':
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify(await handleExtractPages(args), null, 2),
        },
      ],
    };
  ```
- src/index.ts:41-45 (registration): Registration of the extractPagesTool in the listTools response.

  ```typescript
      extractTextTool,
      extractMetadataTool,
      extractPagesTool,
      validatePDFTool,
    ],
  ```
- src/index.ts:15-15 (registration): Import of the tool definition and handler for 'extract_pdf_pages'.

  ```typescript
  import { extractPagesTool, handleExtractPages } from './tools/extract-pages.js';
  ```
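The handler references `ExtractPagesParamsSchema`, whose definition is not shown on this page. As a rough illustration of the validation step it performs, the sketch below mirrors the documented input schema without any library. `parseExtractPagesParams` and `isOutputFormat` are hypothetical names, and the real schema may be stricter or report errors differently.

```typescript
type OutputFormat = "text" | "structured";

interface ExtractPagesParams {
  file_path: string;
  page_range: string;
  output_format: OutputFormat;
}

// Type guard for the two documented output formats.
function isOutputFormat(value: unknown): value is OutputFormat {
  return value === "text" || value === "structured";
}

// Hypothetical stand-in for ExtractPagesParamsSchema.parse: checks the two
// required string fields and applies the documented default of "text".
function parseExtractPagesParams(args: unknown): ExtractPagesParams {
  if (typeof args !== "object" || args === null) {
    throw new Error("Arguments must be an object");
  }
  const record = args as Record<string, unknown>;

  const filePath = record.file_path;
  if (typeof filePath !== "string") {
    throw new Error("file_path is required and must be a string");
  }

  const pageRange = record.page_range;
  if (typeof pageRange !== "string") {
    throw new Error("page_range is required and must be a string");
  }

  const format = record.output_format === undefined ? "text" : record.output_format;
  if (!isOutputFormat(format)) {
    throw new Error('output_format must be "text" or "structured"');
  }

  return { file_path: filePath, page_range: pageRange, output_format: format };
}

// The path below is a placeholder; output_format is omitted to show the default.
console.log(parseExtractPagesParams({ file_path: "./report.pdf", page_range: "all" }));
```

Validating before constructing the extractor, as the handler does, means malformed arguments fail inside the same `try` block and are reported through the same error path as extraction failures.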