Glama

extract-pages

Extract specific pages from a PDF file to create a new document containing only selected content.

Instructions

Extract specific pages from a PDF file

Input Schema

Name         Required  Description                                          Default
input_path   Yes       Input PDF file path
output_path  Yes       Output path for new PDF
pages        Yes       List of page numbers to extract (1-based indexing)
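
As an illustration of the schema above, the arguments for a call might be shaped as follows. The file names are hypothetical; only the three keys and the 1-based page convention come from the schema.

```python
import json

# Hypothetical file names; the keys match the input schema.
arguments = {
    "input_path": "report.pdf",
    "output_path": "report_excerpt.pdf",
    "pages": [1, 3, 5],  # 1-based: the first, third, and fifth pages
}

# Serialize the arguments the way a JSON-based client would send them.
payload = json.dumps(arguments)
print(payload)
```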

Implementation Reference

  • The main handler logic for the 'extract-pages' tool. It reads the input PDF using PyPDF2.PdfReader, extracts the specified pages (converting from 1-based to 0-based indexing), writes them to a new PDF using PdfWriter, and returns a success or error message.
    elif name == "extract-pages":
        input_path = arguments.get("input_path")
        output_path = arguments.get("output_path")
        pages = arguments.get("pages", [])

        if not input_path or not output_path or not pages:
            raise ValueError("Missing required arguments")

        try:
            reader = PyPDF2.PdfReader(input_path)
            writer = PyPDF2.PdfWriter()

            # Convert 1-based page numbers to 0-based indices
            for page_num in pages:
                if 1 <= page_num <= len(reader.pages):
                    writer.add_page(reader.pages[page_num - 1])
                else:
                    return [types.TextContent(
                        type="text",
                        text=f"Error: Page number {page_num} is out of range"
                    )]

            # Write the extracted pages to the output file
            with open(output_path, 'wb') as output_file:
                writer.write(output_file)

            return [types.TextContent(
                type="text",
                text=f"Successfully extracted {len(pages)} pages to {output_path}"
            )]

        except Exception as e:
            return [types.TextContent(
                type="text",
                text=f"Error extracting pages: {str(e)}"
            )]
  • The input schema definition for the 'extract-pages' tool, specifying the required parameters: input_path (string), output_path (string), and pages (array of integers). This is returned by the list_tools handler.
    types.Tool(
        name="extract-pages",
        description="Extract specific pages from a PDF file",
        inputSchema={
            "type": "object",
            "properties": {
                "input_path": {
                    "type": "string",
                    "description": "Input PDF file path"
                },
                "output_path": {
                    "type": "string",
                    "description": "Output path for new PDF"
                },
                "pages": {
                    "type": "array",
                    "items": {"type": "integer"},
                    "description": "List of page numbers to extract (1-based indexing)"
                }
            },
            "required": ["input_path", "output_path", "pages"]
        }
    ),
  • The tool is registered by being included in the list returned by the handle_list_tools function, decorated with @server.list_tools(). This makes it discoverable by MCP clients.
    @server.list_tools()
    async def handle_list_tools() -> list[types.Tool]:
        """List available PDF manipulation tools."""
        return [
            types.Tool(
                name="merge-pdfs",
                description="Merge multiple PDF files into a single PDF",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "input_paths": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "List of input PDF file paths"
                        },
                        "output_path": {
                            "type": "string",
                            "description": "Output path for merged PDF"
                        }
                    },
                    "required": ["input_paths", "output_path"]
                }
            ),
            types.Tool(
                name="extract-pages",
                description="Extract specific pages from a PDF file",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "input_path": {
                            "type": "string",
                            "description": "Input PDF file path"
                        },
                        "output_path": {
                            "type": "string",
                            "description": "Output path for new PDF"
                        },
                        "pages": {
                            "type": "array",
                            "items": {"type": "integer"},
                            "description": "List of page numbers to extract (1-based indexing)"
                        }
                    },
                    "required": ["input_path", "output_path", "pages"]
                }
            ),
            types.Tool(
                name="search-pdfs",
                description="Search for PDF files in a directory with optional pattern matching",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "base_path": {
                            "type": "string",
                            "description": "Base directory to search in"
                        },
                        "pattern": {
                            "type": "string",
                            "description": "Pattern to match against filenames (e.g., 'report*.pdf')"
                        },
                        "recursive": {
                            "type": "boolean",
                            "description": "Whether to search in subdirectories",
                            "default": True
                        }
                    },
                    "required": ["base_path"]
                }
            ),
            types.Tool(
                name="merge-pdfs-ordered",
                description="Merge PDFs in a specific order based on patterns or exact names",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "base_path": {
                            "type": "string",
                            "description": "Base directory containing PDFs"
                        },
                        "patterns": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "List of patterns or names in desired order"
                        },
                        "output_path": {
                            "type": "string",
                            "description": "Output path for merged PDF"
                        },
                        "fuzzy_matching": {
                            "type": "boolean",
                            "description": "Use fuzzy matching for filenames",
                            "default": True
                        }
                    },
                    "required": ["base_path", "patterns", "output_path"]
                }
            ),
            types.Tool(
                name="find-related-pdfs",
                description="Find a PDF and then search for related PDFs based on its content, including common substring patterns",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "base_path": {
                            "type": "string",
                            "description": "Base directory to search in"
                        },
                        "target_filename": {
                            "type": "string",
                            "description": "Name of the initial PDF to analyze"
                        },
                        "pattern_matching_only": {
                            "type": "boolean",
                            "description": "Only search for repeating substring patterns",
                            "default": False
                        },
                        "min_pattern_occurrences": {
                            "type": "integer",
                            "description": "Minimum times a pattern must appear to be considered significant",
                            "default": 2
                        }
                    },
                    "required": ["base_path", "target_filename"]
                }
            )
        ]
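
The page-number validation and the 1-based to 0-based conversion described in the handler can be sketched without any PDF library. This is a minimal illustration, not the server's code: `resolve_page_indices` is a hypothetical helper, and unlike the handler (which returns an error message for the first out-of-range page) it raises `ValueError`.

```python
def resolve_page_indices(pages, page_count):
    """Convert 1-based page numbers to 0-based indices.

    Hypothetical helper: raises ValueError for an empty list, a
    non-integer entry, or a page number outside 1..page_count.
    """
    if not pages:
        raise ValueError("Missing required argument: pages")

    indices = []
    for page_num in pages:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(page_num, bool) or not isinstance(page_num, int):
            raise ValueError(f"Page number {page_num!r} is not an integer")
        if not 1 <= page_num <= page_count:
            raise ValueError(f"Page number {page_num} is out of range")
        indices.append(page_num - 1)
    return indices


# For a 5-page document, pages 1, 3, and 5 map to indices 0, 2, and 4.
print(resolve_page_indices([1, 3, 5], 5))
```

Order and duplicates are preserved, which mirrors the handler's loop: pages are added to the output in the order they are listed.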

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/hanweg/mcp-pdf-tools'
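
The same request can be made from Python's standard library. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the endpoint returns a JSON body, which this page does not state.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_request(owner, name):
    """Build a GET request for one server entry (hypothetical helper)."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server_info(owner, name, timeout=10.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(owner, name), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_request("hanweg", "mcp-pdf-tools")
    print(request.get_method(), request.full_url)
    # Uncomment to perform the network call:
    # print(fetch_server_info("hanweg", "mcp-pdf-tools"))
```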

If you have feedback or need assistance with the MCP directory API, please join our Discord server.