pdf_extract_images

Instructions

Extract all images from a PDF.

Input Schema

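Name	Required	Description	Default
`pdf_path`	Yes	Path to the input PDF file.	
`output_dir`	No	Directory where the extracted images are saved.	`<PDF stem>_images` next to the input PDF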
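
When `output_dir` is omitted, the handler shown under Implementation Reference derives a directory from the PDF's own location. The sketch below reproduces that fallback on its own so the resulting path can be predicted before calling the tool. The function name `resolve_output_dir` and the example path are illustrative and are not part of the server.

```python
from pathlib import Path
from typing import Optional


def resolve_output_dir(pdf_path: str, output_dir: Optional[str] = None) -> str:
    """Mirror the handler's fallback: `<pdf stem>_images` beside the PDF."""
    if output_dir:
        # An explicit directory always wins.
        return output_dir
    pdf_file = Path(pdf_path)
    return str(pdf_file.parent / f"{pdf_file.stem}_images")


# Hypothetical arguments a client might send; the path is made up.
arguments = {"pdf_path": "reports/q3.pdf"}
print(resolve_output_dir(**arguments))
```

The printed path uses the platform's own separator, so it will differ between POSIX systems and Windows.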

Implementation Reference

pdf_manipulation_mcp_server/pdf_server.py:211-264 (handler)
The core handler function, decorated with @mcp.tool(), which both defines the tool logic and registers it with the FastMCP server. Using PyMuPDF, it walks every page of the input PDF, saves each grayscale or RGB image as a PNG file in the given output directory (or an auto-generated one), and returns a status string listing the saved paths. Images with four or more color components, such as CMYK, are skipped without being reported.
@mcp.tool()
async def pdf_extract_images(
    pdf_path: str,
    output_dir: Optional[str] = None
) -> str:
    """Extract all images from a PDF."""
    if not os.path.exists(pdf_path):
        return f"Error: PDF file not found: {pdf_path}"

    if not validate_pdf_file(pdf_path):
        return f"Error: Invalid PDF file: {pdf_path}"

    try:
        # Open PDF document
        doc = fitz.open(pdf_path)

        # Determine output directory
        if not output_dir:
            pdf_file = Path(pdf_path)
            output_dir = str(pdf_file.parent / f"{pdf_file.stem}_images")

        # Create output directory if it doesn't exist
        os.makedirs(output_dir, exist_ok=True)

        extracted_images = []

        # Extract images from each page
        for page_num in range(len(doc)):
            page = doc[page_num]
            image_list = page.get_images()

            for img_index, img in enumerate(image_list):
                # Get image data
                xref = img[0]
                pix = fitz.Pixmap(doc, xref)

                # Skip if image is too small or invalid
                if pix.n - pix.alpha < 4:  # GRAY or RGB
                    img_name = f"page_{page_num + 1}_img_{img_index + 1}.png"
                    img_path = os.path.join(output_dir, img_name)
                    pix.save(img_path)
                    extracted_images.append(img_path)

                pix = None  # Free memory

        doc.close()

        if not extracted_images:
            return "No images found in the PDF."

        return f"Successfully extracted {len(extracted_images)} images to: {output_dir}\nImages: {', '.join(extracted_images)}"

    except Exception as e:
        return f"Error extracting images from PDF: {str(e)}"
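
The `pix.n - pix.alpha < 4` test in the handler means that images with four or more color components (typically CMYK) are never written. A common PyMuPDF pattern is to convert such pixmaps to RGB before saving. The sketch below shows that variant. It is not what the server does, and the helper names `needs_rgb_conversion` and `save_pixmap_as_png` are hypothetical.

```python
def needs_rgb_conversion(n: int, alpha: int) -> bool:
    """True when a pixmap has 4+ color components (e.g. CMYK).

    `n` counts all components including alpha, so alpha is subtracted first.
    PNG cannot store CMYK data, so such pixmaps must be converted.
    """
    return n - alpha >= 4


def save_pixmap_as_png(pix, img_path: str) -> None:
    """Save a PyMuPDF pixmap as PNG, converting to RGB first if required."""
    # PyMuPDF is imported lazily so the pure helper above works without it.
    import fitz

    if needs_rgb_conversion(pix.n, pix.alpha):
        # Build a new RGB pixmap from the CMYK one.
        pix = fitz.Pixmap(fitz.csRGB, pix)
    pix.save(img_path)
```

In the handler, this would replace the body of the `if` branch so that every image is saved and none is dropped.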
pdf_manipulation_mcp_server/pdf_server.py:28-35 (helper)
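Helper function that pdf_extract_images calls before processing. It returns True if PyMuPDF can open the file and False if opening raises an exception. Because PyMuPDF also opens other document formats, this is a check that the file is readable rather than a strict PDF check.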
def validate_pdf_file(pdf_path: str) -> bool:
    """Validate that the file is a valid PDF."""
    try:
        doc = fitz.open(pdf_path)
        doc.close()
        return True
    except Exception:
        return False
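
If a stricter check is wanted, one option is to look for the `%PDF-` marker that begins a PDF file. The sketch below uses only the standard library. The name `looks_like_pdf` is hypothetical, and the check is deliberately strict: it rejects files with stray bytes before the header, which some PDF readers tolerate.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the file starts with the PDF header marker `%PDF-`."""
    try:
        with open(path, "rb") as f:
            return f.read(5) == b"%PDF-"
    except OSError:
        # Missing or unreadable files are treated as "not a PDF".
        return False
```

A check like this could run alongside `validate_pdf_file`, with the header test rejecting non-PDF formats and the PyMuPDF open catching files that are corrupt.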
pdf_manipulation_mcp_server/pdf_server.py:211-264 (registration)
The @mcp.tool() decorator registers the function as an MCP tool with the FastMCP server instance.
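
The registered tool returns a plain string rather than structured data, so a caller that needs the individual file paths has to parse the success message. The sketch below is a minimal, assumed client-side helper (`parse_extract_result` is not part of the server) that relies on the exact message format produced by the handler.

```python
from typing import List, Tuple


def parse_extract_result(result: str) -> Tuple[str, List[str]]:
    """Split the handler's success message into (output_dir, image_paths).

    Returns ("", []) for the "No images found" and "Error ..." messages.
    """
    if not result.startswith("Successfully extracted "):
        return "", []
    # The message has two lines: a header, then "Images: a, b, c".
    header, _, images_line = result.partition("\nImages: ")
    output_dir = header.split(" images to: ", 1)[1]
    paths = images_line.split(", ") if images_line else []
    return output_dir, paths


message = (
    "Successfully extracted 2 images to: out\n"
    "Images: out/page_1_img_1.png, out/page_1_img_2.png"
)
print(parse_extract_result(message))
```

Paths that themselves contain `", "` would be split incorrectly, which is one reason a structured return value would be more robust than this string format.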