Glama

extract-pages

Extract specific pages from a PDF file to create a new document containing only selected content.

Instructions

Extract specific pages from a PDF file

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| input_path | Yes | Input PDF file path | |
| output_path | Yes | Output path for new PDF | |
| pages | Yes | List of page numbers to extract (1-based indexing) | |
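
As a rough illustration of the schema above, the following sketch builds an example arguments payload and checks it by hand. The helper name `validate_extract_args` and the file paths are hypothetical and not part of the server. The `>= 1` check is an assumption drawn from the 1-based indexing note, and the non-empty check mirrors the handler's "Missing required arguments" guard.

```python
def validate_extract_args(arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    # input_path and output_path are required strings in the schema.
    for key in ("input_path", "output_path"):
        value = arguments.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} must be a non-empty string")

    # pages is a required array of integers (1-based page numbers).
    pages = arguments.get("pages")
    if not isinstance(pages, list) or not pages:
        problems.append("pages must be a non-empty list")
    elif not all(
        isinstance(p, int) and not isinstance(p, bool) and p >= 1 for p in pages
    ):
        problems.append("pages must contain integers >= 1")

    return problems


# Hypothetical payload: extract pages 1, 3 and 5 of report.pdf.
example_arguments = {
    "input_path": "report.pdf",
    "output_path": "report_excerpt.pdf",
    "pages": [1, 3, 5],
}

print(validate_extract_args(example_arguments))  # prints []
```

A check like this only covers the shape of the payload. Whether each page number actually exists in the PDF can only be decided once the file has been opened.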

Implementation Reference

  • The main handler logic for the 'extract-pages' tool. It reads the input PDF using PyPDF2.PdfReader, extracts the specified pages (converting from 1-based to 0-based indexing), writes them to a new PDF using PdfWriter, and returns a success or error message.
    elif name == "extract-pages":
        input_path = arguments.get("input_path")
        output_path = arguments.get("output_path")
        pages = arguments.get("pages", [])
        
        if not input_path or not output_path or not pages:
            raise ValueError("Missing required arguments")
    
        try:
            reader = PyPDF2.PdfReader(input_path)
            writer = PyPDF2.PdfWriter()
    
            # Convert 1-based page numbers to 0-based indices
            for page_num in pages:
                if 1 <= page_num <= len(reader.pages):
                    writer.add_page(reader.pages[page_num - 1])
                else:
                    return [types.TextContent(
                        type="text",
                        text=f"Error: Page number {page_num} is out of range"
                    )]
    
            # Write the extracted pages to the output file
            with open(output_path, 'wb') as output_file:
                writer.write(output_file)
    
            return [types.TextContent(
                type="text",
                text=f"Successfully extracted {len(pages)} pages to {output_path}"
            )]
            
        except Exception as e:
            return [types.TextContent(
                type="text",
                text=f"Error extracting pages: {str(e)}"
            )]
  • The input schema definition for the 'extract-pages' tool, specifying the required parameters: input_path (string), output_path (string), and pages (array of integers). This is returned by the list_tools handler.
    types.Tool(
        name="extract-pages",
        description="Extract specific pages from a PDF file",
        inputSchema={
            "type": "object",
            "properties": {
                "input_path": {
                    "type": "string",
                    "description": "Input PDF file path"
                },
                "output_path": {
                    "type": "string",
                    "description": "Output path for new PDF"
                },
                "pages": {
                    "type": "array",
                    "items": {"type": "integer"},
                    "description": "List of page numbers to extract (1-based indexing)"
                }
            },
            "required": ["input_path", "output_path", "pages"]
        }
    ),
  • The tool is registered by including it in the list returned by handle_list_tools, which is decorated with @server.list_tools(). This makes it discoverable by MCP clients.
    @server.list_tools()
    async def handle_list_tools() -> list[types.Tool]:
        """List available PDF manipulation tools."""
        return [
            types.Tool(
                name="merge-pdfs",
                description="Merge multiple PDF files into a single PDF",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "input_paths": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "List of input PDF file paths"
                        },
                        "output_path": {
                            "type": "string",
                            "description": "Output path for merged PDF"
                        }
                    },
                    "required": ["input_paths", "output_path"]
                }
            ),
            types.Tool(
                name="extract-pages",
                description="Extract specific pages from a PDF file",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "input_path": {
                            "type": "string",
                            "description": "Input PDF file path"
                        },
                        "output_path": {
                            "type": "string",
                            "description": "Output path for new PDF"
                        },
                        "pages": {
                            "type": "array",
                            "items": {"type": "integer"},
                            "description": "List of page numbers to extract (1-based indexing)"
                        }
                    },
                    "required": ["input_path", "output_path", "pages"]
                }
            ),
            types.Tool(
                name="search-pdfs",
                description="Search for PDF files in a directory with optional pattern matching",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "base_path": {
                            "type": "string",
                            "description": "Base directory to search in"
                        },
                        "pattern": {
                            "type": "string",
                            "description": "Pattern to match against filenames (e.g., 'report*.pdf')"
                        },
                        "recursive": {
                            "type": "boolean",
                            "description": "Whether to search in subdirectories",
                            "default": True
                        }
                    },
                    "required": ["base_path"]
                }
            ),
            types.Tool(
                name="merge-pdfs-ordered",
                description="Merge PDFs in a specific order based on patterns or exact names",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "base_path": {
                            "type": "string",
                            "description": "Base directory containing PDFs"
                        },
                        "patterns": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "List of patterns or names in desired order"
                        },
                        "output_path": {
                            "type": "string",
                            "description": "Output path for merged PDF"
                        },
                        "fuzzy_matching": {
                            "type": "boolean",
                            "description": "Use fuzzy matching for filenames",
                            "default": True
                        }
                    },
                    "required": ["base_path", "patterns", "output_path"]
                }
            ),
            types.Tool(
                name="find-related-pdfs",
                description="Find a PDF and then search for related PDFs based on its content, including common substring patterns",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "base_path": {
                            "type": "string",
                            "description": "Base directory to search in"
                        },
                        "target_filename": {
                            "type": "string",
                            "description": "Name of the initial PDF to analyze"
                        },
                        "pattern_matching_only": {
                            "type": "boolean",
                            "description": "Only search for repeating substring patterns",
                            "default": False
                        },
                        "min_pattern_occurrences": {
                            "type": "integer",
                            "description": "Minimum times a pattern must appear to be considered significant",
                            "default": 2
                        }
                    },
                    "required": ["base_path", "target_filename"]
                }
            )
        ]
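
The 1-based to 0-based conversion performed in the extract-pages handler can be isolated as a small pure function, which makes the range check easy to test without a PDF. This is a minimal sketch, not the server's code: the name `to_zero_based` is hypothetical, and it raises `ValueError` where the handler returns an error message as text content.

```python
def to_zero_based(pages: list[int], total_pages: int) -> list[int]:
    """Convert 1-based page numbers to 0-based indices, rejecting out-of-range values."""
    indices = []
    for page_num in pages:
        # Same bounds as the handler: valid pages are 1..total_pages inclusive.
        if not 1 <= page_num <= total_pages:
            raise ValueError(f"Page number {page_num} is out of range")
        indices.append(page_num - 1)
    return indices


# For a 5-page document, pages 1, 3 and 5 map to indices 0, 2 and 4.
print(to_zero_based([1, 3, 5], total_pages=5))  # prints [0, 2, 4]
```

As in the handler's loop, the order of the requested pages is preserved and duplicates are allowed, so a request for pages [2, 2, 1] would produce a document with page 2 twice followed by page 1.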

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/hanweg/mcp-pdf-tools'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.