Skip to main content
Glama

get_pdf_images

Extract PDF pages as images to analyze visual elements such as charts, diagrams, tables, and handwritten content that text extraction cannot capture.

Instructions

Extract specific pages or page ranges from a PDF as images for visual analysis. Essential for understanding charts, diagrams, tables, figures, mathematical equations, handwritten content, or any visual elements that text extraction cannot capture. Use when you need to see the actual layout, formatting, or visual content. Supports Python-style slicing: '5' (single page), '5:10' (range), '7:' (from page 7 to end), ':5' (from start to page 5). Returns images as base64-encoded data in MCP image format. Use either absolute_path for any location or relative_path for files in ~/pdf-agent/ directory.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
absolute_pathNoAbsolute path to the PDF file (e.g., '/Users/john/documents/report.pdf')
relative_pathNoPath relative to ~/pdf-agent/ directory (e.g., 'reports/annual.pdf')
use_pdf_homeNoUse PDF agent home directory for relative paths (default: true)
page_rangeNoPage range in enhanced Python-style format: '5' (page 5), '5:10' (pages 5-10), '7:' (page 7 to end), ':5' (start to page 5). Also supports comma-separated combinations: '1,3:5,7' (pages 1, 3-5, and 7), '1-3,7,10:' (pages 1-3, 7, and 10 to end). Default: '1:' (all pages)1:
formatNoImage format: 'jpeg' (smaller file size) or 'png' (higher quality). Default: 'jpeg'jpeg
qualityNoJPEG quality (1-100) - only applies to JPEG format. Higher = better quality but larger size. Default: 85
max_widthNoMaximum image width in pixels (100-3000). Images will be resized proportionally if larger. Optional.
max_heightNoMaximum image height in pixels (100-3000). Images will be resized proportionally if larger. Optional.

Implementation Reference

  • Zod schema for get_pdf_images input validation, defining parameters: absolute_path/relative_path, use_pdf_home, page_range, format (png/jpeg), quality, max_width, max_height.
    const GetPdfImagesSchema = z.object({
      absolute_path: z.string().optional(),
      relative_path: z.string().optional(),
      use_pdf_home: z.boolean().default(true),
      page_range: z.string().default("1:"),
      format: z.enum(["png", "jpeg"]).default("jpeg"),
      quality: z.coerce.number().min(1).max(100).default(85),
      max_width: z.coerce.number().min(100).max(3000).optional(),
      max_height: z.coerce.number().min(100).max(3000).optional(),
    }).refine(
      (data) => (data.absolute_path && !data.relative_path) || (!data.absolute_path && data.relative_path),
      {
        message: "Exactly one of 'absolute_path' or 'relative_path' must be provided",
      }
    );
  • src/index.ts:1522-1573 (registration)
    Tool registration with name 'get_pdf_images', description, and input schema definition within the ListToolsRequestSchema handler.
    {
      name: "get_pdf_images",
      description: "Extract specific pages or page ranges from a PDF as images for visual analysis. Essential for understanding charts, diagrams, tables, figures, mathematical equations, handwritten content, or any visual elements that text extraction cannot capture. Use when you need to see the actual layout, formatting, or visual content. Supports Python-style slicing: '5' (single page), '5:10' (range), '7:' (from page 7 to end), ':5' (from start to page 5). Returns images as base64-encoded data in MCP image format. Use either absolute_path for any location or relative_path for files in ~/pdf-agent/ directory.",
      inputSchema: {
        type: "object",
        properties: {
          absolute_path: {
            type: "string",
            description: "Absolute path to the PDF file (e.g., '/Users/john/documents/report.pdf')",
          },
          relative_path: {
            type: "string",
            description: "Path relative to ~/pdf-agent/ directory (e.g., 'reports/annual.pdf')",
          },
          use_pdf_home: {
            type: "boolean",
            description: "Use PDF agent home directory for relative paths (default: true)",
            default: true,
          },
          page_range: {
            type: "string",
            description: "Page range in enhanced Python-style format: '5' (page 5), '5:10' (pages 5-10), '7:' (page 7 to end), ':5' (start to page 5). Also supports comma-separated combinations: '1,3:5,7' (pages 1, 3-5, and 7), '1-3,7,10:' (pages 1-3, 7, and 10 to end). Default: '1:' (all pages)",
            default: "1:",
          },
          format: {
            type: "string",
            description: "Image format: 'jpeg' (smaller file size) or 'png' (higher quality). Default: 'jpeg'",
            enum: ["jpeg", "png"],
            default: "jpeg",
          },
          quality: {
            type: "number",
            description: "JPEG quality (1-100) - only applies to JPEG format. Higher = better quality but larger size. Default: 85",
            minimum: 1,
            maximum: 100,
            default: 85,
          },
          max_width: {
            type: "number",
            description: "Maximum image width in pixels (100-3000). Images will be resized proportionally if larger. Optional.",
            minimum: 100,
            maximum: 3000,
          },
          max_height: {
            type: "number", 
            description: "Maximum image height in pixels (100-3000). Images will be resized proportionally if larger. Optional.",
            minimum: 100,
            maximum: 3000,
          },
        },
      },
    },
  • Main tool handler for get_pdf_images in the CallToolRequestSchema switch statement. Parses args, resolves file path, reads PDF, parses page range, calls extractPdfImages(), and returns mixed content (text summary + base64 images).
    case "get_pdf_images": {
      const { 
        absolute_path, 
        relative_path, 
        use_pdf_home, 
        page_range, 
        format, 
        quality, 
        max_width, 
        max_height 
      } = GetPdfImagesSchema.parse(args);
      
      try {
        // Resolve the final path based on parameters (same logic as other tools)
        let resolvedPath: string;
        
        if (use_pdf_home && relative_path) {
          // Use relative path from PDF agent home directory
          const pdfAgentHome = await ensurePdfAgentHome();
          resolvedPath = join(pdfAgentHome, relative_path);
        } else if (absolute_path) {
          // Use absolute path directly
          if (!isAbsolute(absolute_path)) {
            return {
              content: [
                {
                  type: "text",
                  text: JSON.stringify({ 
                    error: `Path '${absolute_path}' is not absolute. Use relative_path parameter for relative paths or provide a full absolute path.` 
                  }),
                },
              ],
            };
          }
          resolvedPath = absolute_path;
        } else {
          return {
            content: [
              {
                type: "text",
                text: JSON.stringify({ 
                  error: `Must provide either 'absolute_path' or 'relative_path'. Examples: {"absolute_path": "/Users/john/document.pdf"} or {"relative_path": "reports/annual.pdf"}` 
                }),
              },
            ],
          };
        }
        
        if (!(await fileExists(resolvedPath))) {
          const pathType = relative_path ? 'relative path' : 'absolute path';
          const homeInfo = relative_path ? ` (resolved from ~/pdf-agent/ to ${resolvedPath})` : '';
          return {
            content: [
              {
                type: "text",
                text: JSON.stringify({ 
                  error: `PDF file not found at ${pathType}: ${relative_path || absolute_path}${homeInfo}. Please check the file path and ensure the file exists.` 
                }),
              },
            ],
          };
        }
    
        // Read the PDF file to get total pages
        const pdfBuffer = await safeReadFile(resolvedPath);
        
        // Get PDF document to determine total pages
        let pdfDoc: PDFDocument;
        try {
          pdfDoc = await PDFDocument.load(pdfBuffer);
        } catch (error) {
          if (error instanceof Error && error.message.includes('encrypted')) {
            pdfDoc = await PDFDocument.load(pdfBuffer, { ignoreEncryption: true });
          } else {
            throw error;
          }
        }
        
        const totalPages = pdfDoc.getPageCount();
        
        // Parse page range
        const pageNumbers = parsePageRange(page_range, totalPages);
        
        log('info', `Extracting images from ${pageNumbers.length} pages in ${format} format`, {
          pages: pageNumbers,
          format,
          quality,
          maxDimensions: { maxWidth: max_width, maxHeight: max_height }
        });
        
        // Extract images
        const imageResults = await extractPdfImages(resolvedPath, pageNumbers, {
          format,
          quality,
          maxWidth: max_width,
          maxHeight: max_height
        });
        
        // Prepare MCP response with mixed content (text summary + images)
        const content: any[] = [];
        
        // Add summary as text
        const summary = {
          file_path: resolvedPath,
          total_pages: totalPages,
          extracted_pages: pageNumbers.length,
          page_range: page_range,
          format: format,
          quality: quality,
          max_dimensions: {
            width: max_width || "original",
            height: max_height || "original"
          },
          summary: {
            successful_extractions: imageResults.filter(r => r.image !== null).length,
            failed_extractions: imageResults.filter(r => r.error).length,
            total_size_mb: imageResults
              .filter(r => r.metadata?.processed?.size)
              .reduce((sum, r) => sum + (r.metadata.processed.size / (1024 * 1024)), 0)
              .toFixed(2)
          }
        };
        
        content.push({
          type: "text",
          text: JSON.stringify(summary, null, 2)
        });
        
        // Add each successfully extracted image
        for (const result of imageResults) {
          if (result.image) {
            content.push(result.image);
          } else if (result.error) {
            content.push({
              type: "text",
              text: JSON.stringify({
                page: result.page,
                error: result.error
              })
            });
          }
        }
        
        return {
          content: content
        };
      } catch (e) {
        const providedPath = relative_path || absolute_path || 'unknown';
        const pathType = relative_path ? 'relative path' : 'absolute path';
        return {
          content: [
            {
              type: "text",
              text: JSON.stringify({ 
                error: `Error extracting images from PDF at ${pathType} '${providedPath}': ${e}. Please ensure the file is a valid PDF and check the page range format.` 
              }),
            },
          ],
        };
      }
    }
  • Core helper function extractPdfImages() that converts PDF pages to PNG images using pdf-to-png-converter, then processes each page through imageToBase64() for format conversion and optimization.
    async function extractPdfImages(
      pdfPath: string,
      pageNumbers: number[],
      options: {
        format: 'png' | 'jpeg';
        quality?: number;
        maxWidth?: number;
        maxHeight?: number;
      }
    ): Promise<Array<{ page: number; image: any; metadata: any; error?: string }>> {
      const results: Array<{ page: number; image: any; metadata: any; error?: string }> = [];
      
      try {
        log('info', `Extracting images from PDF pages: ${pageNumbers.join(', ')}`);
        
        // Convert PDF to PNG images (extract all pages first, then filter)
        const pngPages = await pdfToPng(pdfPath, {
          disableFontFace: false,
          useSystemFonts: false,
          viewportScale: 2.0, // High quality scaling
          pagesToProcess: pageNumbers.length <= 10 ? pageNumbers : undefined // Only specify if reasonable count
        });
        
        log('info', `Successfully converted ${pngPages.length} pages to PNG`);
        
        // Process each requested page
        for (const pageNum of pageNumbers) {
          try {
            // Find the corresponding page in the results
            const pageData = pngPages.find(p => p.pageNumber === pageNum);
            
            if (pageData && pageData.content) {
              log('info', `Processing image for page ${pageNum}`);
              
              // Convert to base64 with optimization
              const processed = await imageToBase64(pageData.content, options);
              
              // Check size limit (1MB for token efficiency)
              const sizeInMB = processed.metadata.processed.size / (1024 * 1024);
              if (sizeInMB > 1) {
                log('warn', `Image for page ${pageNum} is ${sizeInMB.toFixed(2)}MB, consider reducing quality or size`);
              }
              
              results.push({
                page: pageNum,
                image: {
                  type: "image",
                  data: processed.data,
                  mimeType: processed.mimeType
                },
                metadata: {
                  ...processed.metadata,
                  originalDimensions: {
                    width: pageData.width,
                    height: pageData.height
                  }
                }
              });
            } else {
              results.push({
                page: pageNum,
                image: null,
                metadata: null,
                error: `Page ${pageNum} not found in PDF conversion results`
              });
            }
          } catch (error) {
            log('warn', `Image processing failed for page ${pageNum}`, { error });
            results.push({
              page: pageNum,
              image: null,
              metadata: null,
              error: `Image processing failed: ${error}`
            });
          }
        }
      } catch (error) {
        log('error', 'PDF to PNG conversion failed', { error });
        throw new Error(`PDF image extraction failed: ${error}`);
      }
      
      return results;
    }
  • Helper function imageToBase64() that converts an image buffer to base64 with options for format (png/jpeg), quality, and max dimensions using the sharp library.
    async function imageToBase64(
      imageBuffer: Buffer, 
      options: {
        format: 'png' | 'jpeg';
        quality?: number;
        maxWidth?: number;
        maxHeight?: number;
      }
    ): Promise<{ data: string; mimeType: string; metadata: any }> {
      try {
        let processor = sharp(imageBuffer);
        
        // Get original metadata
        const originalMetadata = await processor.metadata();
        
        // Apply resizing if specified
        if (options.maxWidth || options.maxHeight) {
          processor = processor.resize(options.maxWidth, options.maxHeight, {
            fit: 'inside',
            withoutEnlargement: true
          });
        }
        
        // Apply format and quality
        if (options.format === 'jpeg') {
          processor = processor.jpeg({ quality: options.quality || 85 });
        } else {
          processor = processor.png({ compressionLevel: 6 });
        }
        
        const processedBuffer = await processor.toBuffer();
        const processedMetadata = await sharp(processedBuffer).metadata();
        
        const base64 = processedBuffer.toString('base64');
        const mimeType = options.format === 'jpeg' ? 'image/jpeg' : 'image/png';
        
        return {
          data: base64,
          mimeType,
          metadata: {
            original: {
              width: originalMetadata.width,
              height: originalMetadata.height,
              size: imageBuffer.length
            },
            processed: {
              width: processedMetadata.width,
              height: processedMetadata.height,
              size: processedBuffer.length,
              format: options.format,
              quality: options.quality
            }
          }
        };
      } catch (error) {
        throw new Error(`Image processing failed: ${error}`);
      }
    }
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses base64 return format, slicing syntax, and path handling. Does not mention potential side effects or performance but is otherwise transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat long but front-loaded with purpose. Each sentence adds value, covering purpose, usage, syntax, and return format. Could be slightly more concise but overall well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description explains return format. It covers all 8 parameters with default values and syntax. Provides sufficient context for the agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the page_range syntax in detail (including comma-separated combinations) and clarifying the path options, which is beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool extracts specific pages or page ranges from a PDF as images, for visual analysis of charts, diagrams, etc. It distinguishes from text extraction tools and explicitly mentions what it captures that text cannot.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use (for visual content), and implies when not to use (use text extraction for text). It gives explicit slicing syntax and path options. Could be more explicit about alternatives but is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/vlad-ds/pdf-agent-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server