get-pdf-page-text
Extract the text of a single page from a PDF document, identified by the PDF's ID and a 0-based page number.
Instructions
Get the text content of a specific page in a PDF
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| pdf_id | Yes | ID of the PDF to get page text from | |
| page_number | Yes | Page number (0-based index) | |
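
As an illustration of the schema above, the arguments for a call could be checked before they are sent. This is only a minimal sketch: `check_arguments` is a hypothetical helper, not part of the server, and the `pdf_id` value is a made-up placeholder.

```python
# Hypothetical helper (not part of the server): checks a payload against
# the two required fields listed in the table above.
REQUIRED_FIELDS = {"pdf_id": str, "page_number": int}


def check_arguments(arguments: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in arguments:
            problems.append(f"missing required field: {name}")
            continue
        value = arguments[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"{name} must be of type {expected_type.__name__}")
    return problems


# Placeholder ID (assumption); page_number 0 refers to the first page.
example = {"pdf_id": "example-id", "page_number": 0}
print(check_arguments(example))
print(check_arguments({"pdf_id": "example-id"}))
```

The first call reports no problems, while the second reports the missing `page_number` field. The server itself may validate input differently; this only mirrors the required fields and types shown in the table.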
Implementation Reference
- src/pdf_reader_mcp/server.py:521-548 (handler): Implements the core logic for the 'get-pdf-page-text' tool: validates inputs, retrieves the PDF reader, extracts text from the specified page using PyPDF2, and returns the text content.

      elif name == "get-pdf-page-text":
          pdf_id = arguments.get("pdf_id")
          if not pdf_id or pdf_id not in pdfs:
              raise ValueError("Invalid PDF ID")

          page_number = arguments.get("page_number")
          if page_number is None:
              raise ValueError("Missing page number")

          reader = pdfs[pdf_id]

          # Check if page number is valid
          if page_number < 0 or page_number >= len(reader.pages):
              raise ValueError(f"Invalid page number. PDF has {len(reader.pages)} pages (0-{len(reader.pages)-1})")

          # Extract text from the specified page
          page = reader.pages[page_number]
          page_text = page.extract_text()

          if not page_text:
              page_text = f"No extractable text found on page {page_number}"

          return [
              types.TextContent(
                  type="text",
                  text=f"Text from page {page_number} of '{os.path.basename(pdf_paths[pdf_id])}':\n\n{page_text}",
              )
          ]

- src/pdf_reader_mcp/server.py:395-406 (registration): Registers the 'get-pdf-page-text' tool with the MCP server, providing the name, description, and JSON schema for input validation (pdf_id and page_number required).

      types.Tool(
          name="get-pdf-page-text",
          description="Get the text content of a specific page in a PDF",
          inputSchema={
              "type": "object",
              "properties": {
                  "pdf_id": {"type": "string", "description": "ID of the PDF to get page text from"},
                  "page_number": {"type": "integer", "description": "Page number (0-based index)"},
              },
              "required": ["pdf_id", "page_number"],
          },
      ),
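
The handler's bounds check and empty-page fallback can be exercised without a real PDF. The sketch below makes several assumptions: `FakePage` and `get_page_text` are hypothetical names that mimic only the `extract_text()` call and the `len()` behaviour used in the handler excerpt above. They are not part of the server or of PyPDF2.

```python
class FakePage:
    """Hypothetical stand-in for a PDF page object; only extract_text() is mimicked."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


def get_page_text(pages, page_number):
    """Mirror the handler's checks on a plain list of page-like objects."""
    if page_number is None:
        raise ValueError("Missing page number")
    # 0-based index: valid values run from 0 to len(pages) - 1.
    if page_number < 0 or page_number >= len(pages):
        raise ValueError(
            f"Invalid page number. PDF has {len(pages)} pages (0-{len(pages) - 1})"
        )
    page_text = pages[page_number].extract_text()
    # An empty or None result is replaced with a placeholder message.
    if not page_text:
        page_text = f"No extractable text found on page {page_number}"
    return page_text


pages = [FakePage("First page"), FakePage("")]
print(get_page_text(pages, 0))
print(get_page_text(pages, 1))
try:
    get_page_text(pages, 2)
except ValueError as err:
    print(err)
```

With two pages, index 2 is out of range, so the last call raises the same style of `ValueError` as the handler. A page with no extractable text (for example, a scanned image) yields the placeholder message rather than an empty string.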