get_pdf_images
Extract PDF pages as images to analyze visual elements such as charts, diagrams, tables, and handwritten content that text extraction cannot capture.
Instructions
Extract specific pages or page ranges from a PDF as images for visual analysis. This is essential for charts, diagrams, tables, figures, mathematical equations, handwritten content, and any other visual element that text extraction cannot capture; use it when you need to see the actual layout or formatting. The page_range parameter uses Python-style slice syntax with 1-based page numbers: '5' (single page), '5:10' (pages 5 to 10), '7:' (page 7 to the end), ':5' (start to page 5). Comma-separated combinations such as '1,3:5,7' are also accepted. Images are returned as base64-encoded data in MCP image format, preceded by a JSON text summary. Provide exactly one of absolute_path (any location) or relative_path (files under the ~/pdf-agent/ directory).
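To make the page-range syntax concrete, here is a minimal sketch of how such a string could be expanded into page numbers. The function name `parsePageRangeSketch` is hypothetical, and so is its handling of edge cases (out-of-range pages are dropped, segments that select nothing contribute nothing); the server's own `parsePageRange()` helper is not shown in this reference and may behave differently.

```typescript
// Hypothetical stand-in for the server's parsePageRange(); not the real implementation.
// Pages are 1-based and both ends of a range are treated as inclusive,
// following the examples in the tool description ('5:10' = pages 5 to 10).
function parsePageRangeSketch(spec: string, totalPages: number): number[] {
  const selected: boolean[] = []; // selected[p] is true when page p was requested

  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim();
    if (part === "") continue;

    let start: number;
    let end: number;

    // "a:b", "a:", ":b" and the dash form "a-b" are ranges; a bare number is one page.
    const range = /^(\d*)\s*[:-]\s*(\d*)$/.exec(part);
    if (range) {
      start = range[1] === "" ? 1 : Number(range[1]);
      end = range[2] === "" ? totalPages : Number(range[2]);
    } else if (/^\d+$/.test(part)) {
      start = Number(part);
      end = start;
    } else {
      throw new Error(`Invalid page range segment: '${part}'`);
    }

    if (start < 1) {
      throw new Error(`Page numbers start at 1: '${part}'`);
    }

    // Assumption: pages beyond the end of the document are silently ignored.
    for (let page = start; page <= Math.min(end, totalPages); page++) {
      selected[page] = true;
    }
  }

  // Emit each requested page once, in ascending order.
  const pages: number[] = [];
  for (let page = 1; page <= totalPages; page++) {
    if (selected[page]) pages.push(page);
  }
  return pages;
}

console.log(parsePageRangeSketch("1,3:5,7", 10)); // [ 1, 3, 4, 5, 7 ]
```

Returning a sorted, de-duplicated list is a design choice of this sketch; it keeps the later per-page lookup simple when segments overlap.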
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| absolute_path | No | Absolute path to the PDF file (e.g., '/Users/john/documents/report.pdf'). Provide exactly one of absolute_path or relative_path. | |
| relative_path | No | Path relative to the ~/pdf-agent/ directory (e.g., 'reports/annual.pdf'). Provide exactly one of absolute_path or relative_path. | |
| use_pdf_home | No | Resolve relative_path against the PDF agent home directory (~/pdf-agent/). | true |
| page_range | No | Page range in enhanced Python-style format: '5' (page 5), '5:10' (pages 5-10), '7:' (page 7 to end), ':5' (start to page 5). Also supports comma-separated combinations: '1,3:5,7' (pages 1, 3-5, and 7), '1-3,7,10:' (pages 1-3, 7, and 10 to end). Default: '1:' (all pages) | 1: |
| format | No | Image format: 'jpeg' (smaller file size) or 'png' (higher quality). Default: 'jpeg' | jpeg |
| quality | No | JPEG quality (1-100); only applies when format is 'jpeg'. Higher values give better quality but larger images. | 85 |
| max_width | No | Maximum image width in pixels (100-3000). Images will be resized proportionally if larger. Optional. | |
| max_height | No | Maximum image height in pixels (100-3000). Images will be resized proportionally if larger. Optional. | |
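As an illustration of the constraints in the table, the following sketch checks an argument object before it is sent to the tool. The `GetPdfImagesArgs` interface and the `checkGetPdfImagesArgs` function are hypothetical client-side helpers written for this example; the server itself validates input with the Zod schema listed under Implementation Reference.

```typescript
// Hypothetical client-side mirror of the documented constraints; not the server's code.
interface GetPdfImagesArgs {
  absolute_path?: string;
  relative_path?: string;
  use_pdf_home?: boolean;   // default: true
  page_range?: string;      // default: "1:"
  format?: "jpeg" | "png";  // default: "jpeg"
  quality?: number;         // 1-100, default: 85
  max_width?: number;       // 100-3000
  max_height?: number;      // 100-3000
}

// True when a value is present and falls outside [min, max].
function outOfRange(value: number | undefined, min: number, max: number): boolean {
  return value !== undefined && (value < min || value > max);
}

function checkGetPdfImagesArgs(args: GetPdfImagesArgs): string[] {
  const problems: string[] = [];

  // Exactly one of the two path parameters may be set.
  if (Boolean(args.absolute_path) === Boolean(args.relative_path)) {
    problems.push("Exactly one of 'absolute_path' or 'relative_path' must be provided");
  }
  if (outOfRange(args.quality, 1, 100)) {
    problems.push("quality must be between 1 and 100");
  }
  if (outOfRange(args.max_width, 100, 3000)) {
    problems.push("max_width must be between 100 and 3000");
  }
  if (outOfRange(args.max_height, 100, 3000)) {
    problems.push("max_height must be between 100 and 3000");
  }
  return problems;
}

// Example: pages 1 and 3-5 of a file under ~/pdf-agent/, as PNG, at most 1600 px wide.
const exampleArgs: GetPdfImagesArgs = {
  relative_path: "reports/annual.pdf",
  page_range: "1,3:5",
  format: "png",
  max_width: 1600,
};
console.log(checkGetPdfImagesArgs(exampleArgs)); // []
```

Since quality only applies to JPEG output, it is simply omitted in the PNG example above.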
Implementation Reference
- src/index.ts:127-141 (schema) Zod schema for get_pdf_images input validation. It defines the parameters absolute_path/relative_path, use_pdf_home, page_range, format (png/jpeg), quality, max_width, and max_height.

      const GetPdfImagesSchema = z.object({
        absolute_path: z.string().optional(),
        relative_path: z.string().optional(),
        use_pdf_home: z.boolean().default(true),
        page_range: z.string().default("1:"),
        format: z.enum(["png", "jpeg"]).default("jpeg"),
        quality: z.coerce.number().min(1).max(100).default(85),
        max_width: z.coerce.number().min(100).max(3000).optional(),
        max_height: z.coerce.number().min(100).max(3000).optional(),
      }).refine(
        (data) => (data.absolute_path && !data.relative_path) || (!data.absolute_path && data.relative_path),
        {
          message: "Exactly one of 'absolute_path' or 'relative_path' must be provided",
        }
      );

- src/index.ts:1522-1573 (registration) Tool registration with the name 'get_pdf_images', description, and input schema definition within the ListToolsRequestSchema handler.

      {
        name: "get_pdf_images",
        description: "Extract specific pages or page ranges from a PDF as images for visual analysis. Essential for understanding charts, diagrams, tables, figures, mathematical equations, handwritten content, or any visual elements that text extraction cannot capture. Use when you need to see the actual layout, formatting, or visual content. Supports Python-style slicing: '5' (single page), '5:10' (range), '7:' (from page 7 to end), ':5' (from start to page 5). Returns images as base64-encoded data in MCP image format. Use either absolute_path for any location or relative_path for files in ~/pdf-agent/ directory.",
        inputSchema: {
          type: "object",
          properties: {
            absolute_path: {
              type: "string",
              description: "Absolute path to the PDF file (e.g., '/Users/john/documents/report.pdf')",
            },
            relative_path: {
              type: "string",
              description: "Path relative to ~/pdf-agent/ directory (e.g., 'reports/annual.pdf')",
            },
            use_pdf_home: {
              type: "boolean",
              description: "Use PDF agent home directory for relative paths (default: true)",
              default: true,
            },
            page_range: {
              type: "string",
              description: "Page range in enhanced Python-style format: '5' (page 5), '5:10' (pages 5-10), '7:' (page 7 to end), ':5' (start to page 5). Also supports comma-separated combinations: '1,3:5,7' (pages 1, 3-5, and 7), '1-3,7,10:' (pages 1-3, 7, and 10 to end). Default: '1:' (all pages)",
              default: "1:",
            },
            format: {
              type: "string",
              description: "Image format: 'jpeg' (smaller file size) or 'png' (higher quality). Default: 'jpeg'",
              enum: ["jpeg", "png"],
              default: "jpeg",
            },
            quality: {
              type: "number",
              description: "JPEG quality (1-100) - only applies to JPEG format. Higher = better quality but larger size. Default: 85",
              minimum: 1,
              maximum: 100,
              default: 85,
            },
            max_width: {
              type: "number",
              description: "Maximum image width in pixels (100-3000). Images will be resized proportionally if larger. Optional.",
              minimum: 100,
              maximum: 3000,
            },
            max_height: {
              type: "number",
              description: "Maximum image height in pixels (100-3000). Images will be resized proportionally if larger. Optional.",
              minimum: 100,
              maximum: 3000,
            },
          },
        },
      },

- src/index.ts:2055-2215 (handler) Main tool handler for get_pdf_images in the CallToolRequestSchema switch statement. It parses the arguments, resolves the file path, reads the PDF, parses the page range, calls extractPdfImages(), and returns mixed content (text summary + base64 images).

      case "get_pdf_images": {
        const { absolute_path, relative_path, use_pdf_home, page_range, format, quality, max_width, max_height } = GetPdfImagesSchema.parse(args);

        try {
          // Resolve the final path based on parameters (same logic as other tools)
          let resolvedPath: string;

          if (use_pdf_home && relative_path) {
            // Use relative path from PDF agent home directory
            const pdfAgentHome = await ensurePdfAgentHome();
            resolvedPath = join(pdfAgentHome, relative_path);
          } else if (absolute_path) {
            // Use absolute path directly
            if (!isAbsolute(absolute_path)) {
              return {
                content: [
                  {
                    type: "text",
                    text: JSON.stringify({
                      error: `Path '${absolute_path}' is not absolute. Use relative_path parameter for relative paths or provide a full absolute path.`
                    }),
                  },
                ],
              };
            }
            resolvedPath = absolute_path;
          } else {
            return {
              content: [
                {
                  type: "text",
                  text: JSON.stringify({
                    error: `Must provide either 'absolute_path' or 'relative_path'. Examples: {"absolute_path": "/Users/john/document.pdf"} or {"relative_path": "reports/annual.pdf"}`
                  }),
                },
              ],
            };
          }

          if (!(await fileExists(resolvedPath))) {
            const pathType = relative_path ? 'relative path' : 'absolute path';
            const homeInfo = relative_path ? ` (resolved from ~/pdf-agent/ to ${resolvedPath})` : '';
            return {
              content: [
                {
                  type: "text",
                  text: JSON.stringify({
                    error: `PDF file not found at ${pathType}: ${relative_path || absolute_path}${homeInfo}. Please check the file path and ensure the file exists.`
                  }),
                },
              ],
            };
          }

          // Read the PDF file to get total pages
          const pdfBuffer = await safeReadFile(resolvedPath);

          // Get PDF document to determine total pages
          let pdfDoc: PDFDocument;
          try {
            pdfDoc = await PDFDocument.load(pdfBuffer);
          } catch (error) {
            if (error instanceof Error && error.message.includes('encrypted')) {
              pdfDoc = await PDFDocument.load(pdfBuffer, { ignoreEncryption: true });
            } else {
              throw error;
            }
          }

          const totalPages = pdfDoc.getPageCount();

          // Parse page range
          const pageNumbers = parsePageRange(page_range, totalPages);

          log('info', `Extracting images from ${pageNumbers.length} pages in ${format} format`, {
            pages: pageNumbers,
            format,
            quality,
            maxDimensions: { maxWidth: max_width, maxHeight: max_height }
          });

          // Extract images
          const imageResults = await extractPdfImages(resolvedPath, pageNumbers, {
            format,
            quality,
            maxWidth: max_width,
            maxHeight: max_height
          });

          // Prepare MCP response with mixed content (text summary + images)
          const content: any[] = [];

          // Add summary as text
          const summary = {
            file_path: resolvedPath,
            total_pages: totalPages,
            extracted_pages: pageNumbers.length,
            page_range: page_range,
            format: format,
            quality: quality,
            max_dimensions: {
              width: max_width || "original",
              height: max_height || "original"
            },
            summary: {
              successful_extractions: imageResults.filter(r => r.image !== null).length,
              failed_extractions: imageResults.filter(r => r.error).length,
              total_size_mb: imageResults
                .filter(r => r.metadata?.processed?.size)
                .reduce((sum, r) => sum + (r.metadata.processed.size / (1024 * 1024)), 0)
                .toFixed(2)
            }
          };

          content.push({
            type: "text",
            text: JSON.stringify(summary, null, 2)
          });

          // Add each successfully extracted image
          for (const result of imageResults) {
            if (result.image) {
              content.push(result.image);
            } else if (result.error) {
              content.push({
                type: "text",
                text: JSON.stringify({ page: result.page, error: result.error })
              });
            }
          }

          return { content: content };
        } catch (e) {
          const providedPath = relative_path || absolute_path || 'unknown';
          const pathType = relative_path ? 'relative path' : 'absolute path';
          return {
            content: [
              {
                type: "text",
                text: JSON.stringify({
                  error: `Error extracting images from PDF at ${pathType} '${providedPath}': ${e}. Please ensure the file is a valid PDF and check the page range format.`
                }),
              },
            ],
          };
        }
      }

- src/index.ts:558-640 (helper) Core helper function extractPdfImages() that converts PDF pages to PNG images using pdf-to-png-converter, then processes each page through imageToBase64() for format conversion and optimization.

      async function extractPdfImages(
        pdfPath: string,
        pageNumbers: number[],
        options: {
          format: 'png' | 'jpeg';
          quality?: number;
          maxWidth?: number;
          maxHeight?: number;
        }
      ): Promise<Array<{ page: number; image: any; metadata: any; error?: string }>> {
        const results: Array<{ page: number; image: any; metadata: any; error?: string }> = [];

        try {
          log('info', `Extracting images from PDF pages: ${pageNumbers.join(', ')}`);

          // Convert PDF to PNG images (extract all pages first, then filter)
          const pngPages = await pdfToPng(pdfPath, {
            disableFontFace: false,
            useSystemFonts: false,
            viewportScale: 2.0, // High quality scaling
            pagesToProcess: pageNumbers.length <= 10 ? pageNumbers : undefined // Only specify if reasonable count
          });

          log('info', `Successfully converted ${pngPages.length} pages to PNG`);

          // Process each requested page
          for (const pageNum of pageNumbers) {
            try {
              // Find the corresponding page in the results
              const pageData = pngPages.find(p => p.pageNumber === pageNum);

              if (pageData && pageData.content) {
                log('info', `Processing image for page ${pageNum}`);

                // Convert to base64 with optimization
                const processed = await imageToBase64(pageData.content, options);

                // Check size limit (1MB for token efficiency)
                const sizeInMB = processed.metadata.processed.size / (1024 * 1024);
                if (sizeInMB > 1) {
                  log('warn', `Image for page ${pageNum} is ${sizeInMB.toFixed(2)}MB, consider reducing quality or size`);
                }

                results.push({
                  page: pageNum,
                  image: {
                    type: "image",
                    data: processed.data,
                    mimeType: processed.mimeType
                  },
                  metadata: {
                    ...processed.metadata,
                    originalDimensions: {
                      width: pageData.width,
                      height: pageData.height
                    }
                  }
                });
              } else {
                results.push({
                  page: pageNum,
                  image: null,
                  metadata: null,
                  error: `Page ${pageNum} not found in PDF conversion results`
                });
              }
            } catch (error) {
              log('warn', `Image processing failed for page ${pageNum}`, { error });
              results.push({
                page: pageNum,
                image: null,
                metadata: null,
                error: `Image processing failed: ${error}`
              });
            }
          }
        } catch (error) {
          log('error', 'PDF to PNG conversion failed', { error });
          throw new Error(`PDF image extraction failed: ${error}`);
        }

        return results;
      }

- src/index.ts:496-553 (helper) Helper function imageToBase64() that converts an image buffer to base64, with options for format (png/jpeg), quality, and maximum dimensions, using the sharp library.

      async function imageToBase64(
        imageBuffer: Buffer,
        options: {
          format: 'png' | 'jpeg';
          quality?: number;
          maxWidth?: number;
          maxHeight?: number;
        }
      ): Promise<{ data: string; mimeType: string; metadata: any }> {
        try {
          let processor = sharp(imageBuffer);

          // Get original metadata
          const originalMetadata = await processor.metadata();

          // Apply resizing if specified
          if (options.maxWidth || options.maxHeight) {
            processor = processor.resize(options.maxWidth, options.maxHeight, {
              fit: 'inside',
              withoutEnlargement: true
            });
          }

          // Apply format and quality
          if (options.format === 'jpeg') {
            processor = processor.jpeg({ quality: options.quality || 85 });
          } else {
            processor = processor.png({ compressionLevel: 6 });
          }

          const processedBuffer = await processor.toBuffer();
          const processedMetadata = await sharp(processedBuffer).metadata();

          const base64 = processedBuffer.toString('base64');
          const mimeType = options.format === 'jpeg' ? 'image/jpeg' : 'image/png';

          return {
            data: base64,
            mimeType,
            metadata: {
              original: {
                width: originalMetadata.width,
                height: originalMetadata.height,
                size: imageBuffer.length
              },
              processed: {
                width: processedMetadata.width,
                height: processedMetadata.height,
                size: processedBuffer.length,
                format: options.format,
                quality: options.quality
              }
            }
          };
        } catch (error) {
          throw new Error(`Image processing failed: ${error}`);
        }
      }
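
The resize step in imageToBase64() uses fit: 'inside' together with withoutEnlargement: true, so a page is only ever scaled down, and its aspect ratio is preserved. The sketch below estimates the resulting dimensions in plain TypeScript without calling sharp. The names `PixelSize` and `fitInside` are hypothetical, and the use of Math.round is an assumption; sharp's own rounding may differ by a pixel in some cases.

```typescript
// Hypothetical helper that approximates the output size of a
// resize(maxWidth, maxHeight, { fit: 'inside', withoutEnlargement: true }) call.
interface PixelSize {
  width: number;
  height: number;
}

function fitInside(original: PixelSize, maxWidth?: number, maxHeight?: number): PixelSize {
  // A missing limit places no constraint on that axis.
  const scaleW = maxWidth ? maxWidth / original.width : Infinity;
  const scaleH = maxHeight ? maxHeight / original.height : Infinity;

  // Use the tighter of the two limits, and never scale above 1 (no enlargement).
  const scale = Math.min(scaleW, scaleH, 1);

  return {
    // Assumption: dimensions are rounded to the nearest pixel.
    width: Math.round(original.width * scale),
    height: Math.round(original.height * scale),
  };
}

// A 1224x1584 render limited to 612 px wide is halved on both axes.
console.log(fitInside({ width: 1224, height: 1584 }, 612)); // { width: 612, height: 792 }
```

Estimating the size up front can help when choosing max_width or max_height to stay under the 1 MB per-image warning threshold logged by extractPdfImages().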