read_pdf_pages
Extract text content from specific pages of PDF files using local paths or URLs, with built-in caching for efficient document processing.
Instructions
Read content from PDF file for specified page range.
Supports both local file paths and URLs. For URLs, the PDF will be downloaded
to a temporary directory and cached for future use.
Note: Avoid reading too many pages at once (recommended: <50 pages) to prevent errors.
Args:
pdf_file_path: Path to the PDF file or URL to PDF
start_page: Starting page number (default: 1)
end_page: Ending page number (default: 1)
Returns:
Extracted text content from the specified pages
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| pdf_file_path | Yes | Path to the PDF file or URL to PDF | |
| start_page | No | Starting page number | 1 |
| end_page | No | Ending page number | 1 |
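The note above recommends reading fewer than 50 pages per call. One way a caller might stay under that limit is to split a long range into smaller batches and issue one `read_pdf_pages` call per batch. The sketch below is illustrative only: `page_batches` and its `batch_size` parameter are hypothetical names that are not part of the tool, and it assumes page numbers are 1-based with an inclusive `end_page`, as the defaults suggest.

```python
from typing import Iterator, Tuple


def page_batches(start_page: int, end_page: int, batch_size: int = 25) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) page ranges that cover start_page..end_page.

    Hypothetical helper, not part of the tool: each yielded pair could be
    passed as start_page/end_page to one read_pdf_pages call.
    """
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    current = start_page
    while current <= end_page:
        # An inclusive range of batch_size pages ends at current + batch_size - 1.
        last = min(current + batch_size - 1, end_page)
        yield current, last
        current = last + 1


if __name__ == "__main__":
    for first, last in page_batches(1, 120, batch_size=40):
        print(f"read pages {first}-{last}")
```

Because the ranges are inclusive, a batch from `first` to `last` contains `last - first + 1` pages, so a `batch_size` below 50 keeps every call within the recommendation.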
Implementation Reference
- pdf_tools_mcp/server.py:380-430 (handler): The `read_pdf_pages` tool implementation in `pdf_tools_mcp/server.py`. It handles input validation, path resolution (supporting local files and URLs), and delegates text extraction to the `extract_text_from_pdf` helper function.
@mcp.tool()
async def read_pdf_pages(pdf_file_path: str, start_page: int = 1, end_page: int = 1) -> str:
    """Read content from PDF file for specified page range.

    Supports both local file paths and URLs. For URLs, the PDF will be downloaded
    to a temporary directory and cached for future use.

    Note: Avoid reading too many pages at once (recommended: <50 pages) to prevent errors.

    Args:
        pdf_file_path: Path to the PDF file or URL to PDF
        start_page: Starting page number (default: 1)
        end_page: Ending page number (default: 1)

    Returns:
        Extracted text content from the specified pages
    """
    try:
        # Resolve path (download if URL, validate if local path)
        actual_path = resolve_path(pdf_file_path)

        # Validate local path if not URL
        if not is_url(pdf_file_path):
            is_valid, error_msg = validate_path(pdf_file_path)
            if not is_valid:
                return error_msg
    except Exception as e:
        return f"Error resolving path: {str(e)}"

    # Warning for large page ranges
    if end_page - start_page > 50:
        warning = "Warning: Reading more than 50 pages at once may cause performance issues or errors.\n"
    else:
        warning = ""

    try:
        # Read PDF file
        with open(actual_path, 'rb') as file:
            pdf_content = file.read()

        # Extract text using the original function
        result = extract_text_from_pdf(pdf_content, start_page, end_page)
        return warning + result if warning else result
    except FileNotFoundError:
        return f"Error: File not found '{actual_path}'"
    except PermissionError:
        return f"Error: No permission to read file '{actual_path}'"
    except Exception as e:
        return f"Error reading PDF file: {str(e)}"
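The handler calls `is_url` and `resolve_path`, but their bodies are not shown in this reference. As a rough illustration of how URL detection and a temporary-directory cache location could work, here is a minimal sketch. Everything in it is an assumption: this `is_url` is a stand-in rather than the server's actual helper, and `cache_path_for` and the `pdf_cache` directory name are hypothetical. It only computes where a download would be stored and performs no network access.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def is_url(path_or_url: str) -> bool:
    """Stand-in check (assumption): treat http(s) strings with a host as URLs."""
    parsed = urlparse(path_or_url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def cache_path_for(url: str, cache_dir: Optional[Path] = None) -> Path:
    """Hypothetical helper: map a URL to a stable file path in a temp directory.

    Hashing the URL gives the same file name on every call, so a later
    request for the same URL could reuse an already downloaded file.
    """
    base = cache_dir if cache_dir is not None else Path(tempfile.gettempdir()) / "pdf_cache"
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return base / f"{digest}.pdf"


if __name__ == "__main__":
    source = "https://example.com/report.pdf"
    if is_url(source):
        target = cache_path_for(source)
        # A real resolver would download only when the cached file is missing.
        print("cached copy exists" if target.exists() else f"would download to {target}")
```

Keying the cache on a hash of the URL avoids problems with characters that are not valid in file names. The trade-off is that a changed remote file would not be noticed without an extra freshness check.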