pietermyb

mcp-pdf-reader

get-pdf-page-text

Extract text from a specific page in a PDF document using its ID and page number for targeted content retrieval.

Instructions

Get the text content of a specific page in a PDF

Input Schema

Name         Required  Description                          Default
pdf_id       Yes       ID of the PDF to get page text from
page_number  Yes       Page number (0-based index)
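
As an illustration of the inputs described above, the sketch below builds an arguments dictionary and checks it against the documented types (`pdf_id` is a string and `page_number` an integer, per the JSON schema in the Implementation Reference). `check_arguments` is a hypothetical helper introduced here and `"example-pdf-id"` is a placeholder value; neither comes from the server itself.

```python
from typing import List


def check_arguments(arguments: dict) -> List[str]:
    """Hypothetical helper: return a list of problems with the tool arguments."""
    errors = []
    if not isinstance(arguments.get("pdf_id"), str):
        errors.append("pdf_id must be a string")
    page_number = arguments.get("page_number")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(page_number, int) or isinstance(page_number, bool):
        errors.append("page_number must be an integer")
    return errors


# Placeholder ID; a real ID would identify a PDF the server has already opened.
arguments = {"pdf_id": "example-pdf-id", "page_number": 0}
problems = check_arguments(arguments)

# The first page is page 0, so "page 1" in a viewer corresponds to page_number=0.
missing_page = check_arguments({"pdf_id": "example-pdf-id"})
```

Note that the index is 0-based, so a document with N pages accepts `page_number` values from 0 to N-1.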

Implementation Reference

  • Implements the core logic for the 'get-pdf-page-text' tool: validates inputs, retrieves the PDF reader, extracts text from the specified page using PyPDF2, and returns the text content.
    elif name == "get-pdf-page-text":
        pdf_id = arguments.get("pdf_id")
        if not pdf_id or pdf_id not in pdfs:
            raise ValueError("Invalid PDF ID")
    
        page_number = arguments.get("page_number")
        if page_number is None:
            raise ValueError("Missing page number")
    
        reader = pdfs[pdf_id]
    
        # Check if page number is valid
        if page_number < 0 or page_number >= len(reader.pages):
            raise ValueError(f"Invalid page number. PDF has {len(reader.pages)} pages (0-{len(reader.pages)-1})")
    
        # Extract text from the specified page
        page = reader.pages[page_number]
        page_text = page.extract_text()
    
        if not page_text:
            page_text = f"No extractable text found on page {page_number}"
    
        return [
            types.TextContent(
                type="text",
                text=f"Text from page {page_number} of '{os.path.basename(pdf_paths[pdf_id])}':\n\n{page_text}",
            )
        ]
  • Registers the 'get-pdf-page-text' tool with MCP server, providing name, description, and JSON schema for input validation (pdf_id and page_number required).
    types.Tool(
        name="get-pdf-page-text",
        description="Get the text content of a specific page in a PDF",
        inputSchema={
            "type": "object",
            "properties": {
                "pdf_id": {"type": "string", "description": "ID of the PDF to get page text from"},
                "page_number": {"type": "integer", "description": "Page number (0-based index)"},
            },
            "required": ["pdf_id", "page_number"],
        },
    ),
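
The handler logic above can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the server's actual code: `get_pdf_page_text`, `FakePage`, and `FakeReader` are hypothetical names introduced here, and the fake classes stand in for a PyPDF2 reader so the example runs without a PDF file or the MCP SDK. It returns a plain string rather than a `types.TextContent` list.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakePage:
    """Hypothetical stand-in for a PyPDF2 page object."""
    text: str

    def extract_text(self) -> str:
        return self.text


@dataclass
class FakeReader:
    """Hypothetical stand-in for a PyPDF2 reader; only `.pages` is used."""
    pages: List[FakePage] = field(default_factory=list)


def get_pdf_page_text(pdfs: Dict[str, FakeReader], arguments: dict) -> str:
    """Apply the same checks, in the same order, as the handler above."""
    pdf_id = arguments.get("pdf_id")
    if not pdf_id or pdf_id not in pdfs:
        raise ValueError("Invalid PDF ID")

    page_number = arguments.get("page_number")
    if page_number is None:
        raise ValueError("Missing page number")

    reader = pdfs[pdf_id]
    page_count = len(reader.pages)
    if page_number < 0 or page_number >= page_count:
        raise ValueError(
            f"Invalid page number. PDF has {page_count} pages (0-{page_count - 1})"
        )

    page_text = reader.pages[page_number].extract_text()
    # An empty result usually means the page has no text layer (e.g. a scan).
    return page_text or f"No extractable text found on page {page_number}"


pdfs = {"doc-1": FakeReader(pages=[FakePage("Hello"), FakePage("")])}
first = get_pdf_page_text(pdfs, {"pdf_id": "doc-1", "page_number": 0})
empty = get_pdf_page_text(pdfs, {"pdf_id": "doc-1", "page_number": 1})
```

Checking `page_number is None` rather than `not page_number` matters here: page 0 is a valid index, and a truthiness test would reject it.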

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/pietermyb/mcp-pdf-reader'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.