Glama
andr3medeiros

PDF Manipulation MCP Server

pdf_extract_images

Extract all images from PDF files for reuse in other documents or projects. Specify the PDF path and optional output directory to save images.

Instructions

Extract all images from a PDF.

Input Schema

Name        Required  Description  Default
pdf_path    Yes
output_dir  No
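
To make the schema concrete, here is a minimal sketch of how a client might assemble a call to this tool. It assumes the standard MCP JSON-RPC `tools/call` request shape; the helper name `build_tool_call` and the file paths are hypothetical and do not come from this server's code.

```python
import json
from typing import Optional


def build_tool_call(
    pdf_path: str,
    output_dir: Optional[str] = None,
    request_id: int = 1,
) -> dict:
    """Build a JSON-RPC 'tools/call' request for pdf_extract_images.

    build_tool_call is a hypothetical helper used only for illustration.
    """
    arguments = {"pdf_path": pdf_path}  # required by the schema
    if output_dir is not None:
        arguments["output_dir"] = output_dir  # optional by the schema
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "pdf_extract_images", "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical path; only the required argument is supplied.
    print(json.dumps(build_tool_call("/tmp/report.pdf"), indent=2))
```

When output_dir is left out, the handler picks a directory itself, as described in the implementation reference.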

Implementation Reference

  • The core handler function, decorated with @mcp.tool(), which both defines the tool logic and registers it with the FastMCP server. It walks every page of the input PDF with PyMuPDF, saves each GRAY or RGB image as a PNG file in the specified or auto-generated output directory, and returns a status message listing the saved paths.
    @mcp.tool()
    async def pdf_extract_images(
        pdf_path: str,
        output_dir: Optional[str] = None
    ) -> str:
        """Extract all images from a PDF."""
        if not os.path.exists(pdf_path):
            return f"Error: PDF file not found: {pdf_path}"

        if not validate_pdf_file(pdf_path):
            return f"Error: Invalid PDF file: {pdf_path}"

        try:
            # Open PDF document
            doc = fitz.open(pdf_path)

            # Determine output directory
            if not output_dir:
                pdf_file = Path(pdf_path)
                output_dir = str(pdf_file.parent / f"{pdf_file.stem}_images")

            # Create output directory if it doesn't exist
            os.makedirs(output_dir, exist_ok=True)

            extracted_images = []

            # Extract images from each page
            for page_num in range(len(doc)):
                page = doc[page_num]
                image_list = page.get_images()

                for img_index, img in enumerate(image_list):
                    # Get image data
                    xref = img[0]
                    pix = fitz.Pixmap(doc, xref)

                    # Save only GRAY or RGB pixmaps; others (e.g. CMYK) are skipped
                    if pix.n - pix.alpha < 4:  # GRAY or RGB
                        img_name = f"page_{page_num + 1}_img_{img_index + 1}.png"
                        img_path = os.path.join(output_dir, img_name)
                        pix.save(img_path)
                        extracted_images.append(img_path)

                    pix = None  # Free memory

            doc.close()

            if not extracted_images:
                return "No images found in the PDF."

            return f"Successfully extracted {len(extracted_images)} images to: {output_dir}\nImages: {', '.join(extracted_images)}"

        except Exception as e:
            return f"Error extracting images from PDF: {str(e)}"
  • Helper function used by pdf_extract_images to check, before processing, that PyMuPDF can open the input file. Any exception raised while opening is treated as an invalid PDF.
    def validate_pdf_file(pdf_path: str) -> bool:
        """Validate that the file is a valid PDF."""
        try:
            doc = fitz.open(pdf_path)
            doc.close()
            return True
        except Exception:
            return False
  • The @mcp.tool() decorator on pdf_extract_images (the handler listed in the first item) registers the function as an MCP tool with the FastMCP server instance.
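
The path and filter rules inside the handler can be isolated as small pure functions, which makes its behaviour easier to predict without opening a PDF. The sketch below mirrors the expressions in the handler, but the function names (resolve_output_dir, image_filename, is_saved_directly) are hypothetical and are not part of the server's code.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(pdf_path: str, output_dir: Optional[str] = None) -> str:
    """Mirror the handler's default: '<pdf stem>_images' next to the PDF."""
    if output_dir:
        return output_dir
    pdf_file = Path(pdf_path)
    return str(pdf_file.parent / f"{pdf_file.stem}_images")


def image_filename(page_num: int, img_index: int) -> str:
    """Zero-based indices in, one-based file names out, as in the handler."""
    return f"page_{page_num + 1}_img_{img_index + 1}.png"


def is_saved_directly(n: int, alpha: int) -> bool:
    """The handler's filter on a pixmap's component count.

    n is the number of components per pixel including alpha, and alpha is
    1 when an alpha channel is present, so n - alpha < 4 means GRAY or RGB.
    """
    return n - alpha < 4


if __name__ == "__main__":
    print(resolve_output_dir("/data/report.pdf"))  # hypothetical path
    print(image_filename(0, 0))
    print(is_saved_directly(4, 0))  # a CMYK pixmap without alpha
```

A CMYK pixmap (four components, no alpha) fails the check, so the handler skips it without reporting it. PyMuPDF's documentation describes converting such a pixmap to RGB with fitz.Pixmap(fitz.csRGB, pix) before saving; the handler shown here does not take that step.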

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/andr3medeiros/pdf-manipulation-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.