pdf_extract_images

Extract all images from PDF documents to save them as individual image files for reuse in other applications or projects.

Instructions

Extract all images from a PDF.

Input Schema

Name        Required  Description  Default
pdf_path    Yes
output_dir  No
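The schema lists no default for output_dir, but the handler shown in the Implementation Reference derives one when the argument is omitted. The sketch below isolates that naming logic so it can be read on its own. The helper names resolve_output_dir and image_file_name are hypothetical and do not appear in the server's code; only the expressions inside them are taken from the handler.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(pdf_path: str, output_dir: Optional[str] = None) -> str:
    """Return output_dir if given, else '<pdf stem>_images' next to the PDF."""
    if output_dir:
        return output_dir
    pdf_file = Path(pdf_path)
    return str(pdf_file.parent / f"{pdf_file.stem}_images")


def image_file_name(page_num: int, img_index: int) -> str:
    """Build the 1-based file name from 0-based page and image indices."""
    return f"page_{page_num + 1}_img_{img_index + 1}.png"


if __name__ == "__main__":
    # Directory used when output_dir is omitted (hypothetical example path)
    print(resolve_output_dir("reports/q3.pdf"))
    # Name given to the first image on the first page
    print(image_file_name(0, 0))
```

Under these assumptions, a PDF named q3.pdf would get a sibling directory named q3_images, and the first image on the first page would be saved as page_1_img_1.png.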

Implementation Reference

  • The primary handler function for the 'pdf_extract_images' tool. Decorated with @mcp.tool() for automatic registration in FastMCP. Iterates over every page of the input PDF with PyMuPDF, saves each grayscale or RGB image as a PNG file in the specified or auto-generated directory, and returns a success message listing the image paths. As written, images with four or more color components (for example CMYK) are skipped without a message.
    @mcp.tool()
    async def pdf_extract_images(
        pdf_path: str,
        output_dir: Optional[str] = None
    ) -> str:
        """Extract all images from a PDF."""
        if not os.path.exists(pdf_path):
            return f"Error: PDF file not found: {pdf_path}"

        if not validate_pdf_file(pdf_path):
            return f"Error: Invalid PDF file: {pdf_path}"

        try:
            # Open PDF document
            doc = fitz.open(pdf_path)

            # Determine output directory
            if not output_dir:
                pdf_file = Path(pdf_path)
                output_dir = str(pdf_file.parent / f"{pdf_file.stem}_images")

            # Create output directory if it doesn't exist
            os.makedirs(output_dir, exist_ok=True)

            extracted_images = []

            # Extract images from each page
            for page_num in range(len(doc)):
                page = doc[page_num]
                image_list = page.get_images()

                for img_index, img in enumerate(image_list):
                    # Get image data
                    xref = img[0]
                    pix = fitz.Pixmap(doc, xref)

                    # Skip if image is too small or invalid
                    if pix.n - pix.alpha < 4:  # GRAY or RGB
                        img_name = f"page_{page_num + 1}_img_{img_index + 1}.png"
                        img_path = os.path.join(output_dir, img_name)
                        pix.save(img_path)
                        extracted_images.append(img_path)

                    pix = None  # Free memory

            doc.close()

            if not extracted_images:
                return "No images found in the PDF."

            return f"Successfully extracted {len(extracted_images)} images to: {output_dir}\nImages: {', '.join(extracted_images)}"

        except Exception as e:
            return f"Error extracting images from PDF: {str(e)}"
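  • Supporting utility function called by the pdf_extract_images handler (and others) to check that the input file can be opened before processing. The check only confirms that fitz.open succeeds, so it may also accept other document types that PyMuPDF can open.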
    def validate_pdf_file(pdf_path: str) -> bool:
        """Validate that the file is a valid PDF."""
        try:
            doc = fitz.open(pdf_path)
            doc.close()
            return True
        except Exception:
            return False
  • The @mcp.tool() decorator registers the pdf_extract_images function as an MCP tool on the FastMCP instance 'mcp'.
    @mcp.tool()
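The handler's check pix.n - pix.alpha < 4 compares the number of color components, excluding alpha, against 4, so pixmaps in a four-component color space such as CMYK are not saved. The sketch below is not part of the listed server code. It shows one common way to keep such images, by converting them to RGB before saving. The helper names needs_rgb_conversion and save_pixmap_as_png are hypothetical, and PyMuPDF is assumed to be installed under the module name fitz, as in the handler.

```python
def needs_rgb_conversion(n: int, alpha: int) -> bool:
    """True when a pixmap has 4 or more color components (e.g. CMYK).

    n is the total number of components per pixel, including alpha;
    alpha is 1 if an alpha channel is present, else 0.
    """
    return n - alpha >= 4


def save_pixmap_as_png(pix, img_path: str) -> None:
    """Save a PyMuPDF pixmap as PNG, converting CMYK-like data to RGB first."""
    import fitz  # PyMuPDF; imported lazily so the helper above has no dependency

    if needs_rgb_conversion(pix.n, pix.alpha):
        # Build a new RGB pixmap from the original one
        pix = fitz.Pixmap(fitz.csRGB, pix)
    pix.save(img_path)


if __name__ == "__main__":
    print(needs_rgb_conversion(3, 0))  # RGB
    print(needs_rgb_conversion(4, 1))  # RGB with alpha
    print(needs_rgb_conversion(4, 0))  # CMYK
```

In the handler, a call such as save_pixmap_as_png(pix, img_path) could take the place of the conditional pix.save(img_path), at the cost of changing which images end up in the returned list.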

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/andr3medeiros/pdf-manipulation-mcp-server'
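The same request can be made from Python with the standard library. This is a minimal sketch: the URL is the one from the curl command above, while the Accept header, the timeout, and the assumption that the endpoint returns a JSON body are illustrative and not confirmed by this page. The function names build_request and fetch_server_info are hypothetical.

```python
import json
from urllib.request import Request, urlopen

SERVER_URL = (
    "https://glama.ai/api/mcp/v1/servers/"
    "andr3medeiros/pdf-manipulation-mcp-server"
)


def build_request(url: str = SERVER_URL) -> Request:
    """Create the GET request without sending it."""
    return Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0):
    """Send the request and parse the body, assuming it is JSON."""
    with urlopen(build_request(url), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    info = fetch_server_info()
    print(json.dumps(info, indent=2))
```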

If you have feedback or need assistance with the MCP directory API, please join our Discord server.