Glama

read_pdf_pages

Extract text content from specific pages of PDF files using local paths or URLs, with built-in caching for efficient document processing.

Instructions

Read the content of a PDF file for the specified page range.

Supports both local file paths and URLs. For URLs, the PDF will be downloaded
to a temporary directory and cached for future use.
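The download-and-cache behaviour can be pictured with the following minimal sketch. It is not the server's actual `resolve_path`: the cache directory, the SHA-256 file naming, and the helper names `looks_like_url`, `cache_file_for`, and `resolve_pdf_source` are all hypothetical, and only `http`/`https` sources are treated as URLs.

```python
import hashlib
import os
import tempfile
import urllib.request
from typing import Callable
from urllib.parse import urlparse

# Hypothetical cache location; the real server's directory is not documented.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "pdf_cache_sketch")


def looks_like_url(path: str) -> bool:
    """Treat only http(s) sources as remote; everything else is a local path."""
    return urlparse(path).scheme in ("http", "https")


def cache_file_for(url: str, cache_dir: str = CACHE_DIR) -> str:
    """Map a URL to a stable file name so repeated requests hit the same file."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, f"{digest}.pdf")


def resolve_pdf_source(
    source: str,
    cache_dir: str = CACHE_DIR,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> str:
    """Return a local file path for `source`, downloading URLs at most once."""
    if not looks_like_url(source):
        return os.path.abspath(source)
    target = cache_file_for(source, cache_dir)
    if not os.path.exists(target):
        os.makedirs(cache_dir, exist_ok=True)
        partial = target + ".part"
        fetch(source, partial)       # download under a temporary name first
        os.replace(partial, target)  # then move it into place in one step
    return target
```

Downloading to a `.part` file and renaming it with `os.replace` means an interrupted download never leaves a truncated file that a later call would mistake for a cache hit. The `fetch` parameter lets the download step be swapped out, for example in tests.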

Note: Avoid reading many pages in a single call (recommended: fewer than 50 pages) to prevent errors.

Args:
    pdf_file_path: Local path or URL of the PDF file
    start_page: Starting page number (default: 1)
    end_page: Ending page number (default: 1)
    
Returns:
    Extracted text content from the specified pages
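Because large ranges should be split across several calls, a client could pre-compute the chunks. This sketch assumes `end_page` is inclusive (the defaults `start_page=1, end_page=1` read a single page) and uses a hypothetical `page_chunks` helper with a limit of 49 pages per call to stay under the recommended bound.

```python
from typing import Iterator, Tuple

MAX_PAGES_PER_CALL = 49  # hypothetical limit, chosen to stay under "<50 pages"


def page_chunks(start_page: int, end_page: int,
                max_pages: int = MAX_PAGES_PER_CALL) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) pairs that together cover start_page..end_page."""
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    if max_pages < 1:
        raise ValueError("max_pages must be positive")
    current = start_page
    while current <= end_page:
        # Each chunk holds at most max_pages pages and never passes end_page.
        last = min(current + max_pages - 1, end_page)
        yield current, last
        current = last + 1
```

For example, `list(page_chunks(1, 120))` yields `[(1, 49), (50, 98), (99, 120)]`, and each pair becomes the `start_page`/`end_page` arguments of one `read_pdf_pages` call.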

Input Schema

Name           Required  Description                              Default
pdf_file_path  Yes       Local path or URL of the PDF file        (none)
start_page     No        Starting page number                     1
end_page       No        Ending page number                       1
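MCP clients usually build tool calls for you, but it can help to see what an invocation against this schema looks like on the wire. The sketch below assembles a JSON-RPC `tools/call` request as described in the MCP specification; `build_read_pdf_pages_call` is a hypothetical helper, and the transport (stdio or HTTP) is left out.

```python
import json
from typing import Optional


def build_read_pdf_pages_call(request_id: int, pdf_file_path: str,
                              start_page: Optional[int] = None,
                              end_page: Optional[int] = None) -> str:
    """Serialize an MCP tools/call request for the read_pdf_pages tool."""
    arguments: dict = {"pdf_file_path": pdf_file_path}  # the only required field
    # Optional fields are omitted when not given, so the server defaults (1) apply.
    if start_page is not None:
        arguments["start_page"] = start_page
    if end_page is not None:
        arguments["end_page"] = end_page
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_pdf_pages", "arguments": arguments},
    }
    return json.dumps(request)
```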

Implementation Reference

  • The `read_pdf_pages` tool implementation in `pdf_tools_mcp/server.py`. It resolves the path (downloading URLs, validating local files), warns when the requested range exceeds 50 pages, and delegates text extraction to the `extract_text_from_pdf` helper function.
    @mcp.tool()
    async def read_pdf_pages(pdf_file_path: str, start_page: int = 1, end_page: int = 1) -> str:
        """Read content from PDF file for specified page range.
        
        Supports both local file paths and URLs. For URLs, the PDF will be downloaded
        to a temporary directory and cached for future use.
        
        Note: Avoid reading too many pages at once (recommended: <50 pages) to prevent errors.
    
        Args:
            pdf_file_path: Path to the PDF file or URL to PDF
            start_page: Starting page number (default: 1)
            end_page: Ending page number (default: 1)
            
        Returns:
            Extracted text content from the specified pages
        """
        try:
            # Resolve path (download if URL, validate if local path)
            actual_path = resolve_path(pdf_file_path)
            
            # Validate local path if not URL
            if not is_url(pdf_file_path):
                is_valid, error_msg = validate_path(pdf_file_path)
                if not is_valid:
                    return error_msg
        
        except Exception as e:
            return f"Error resolving path: {str(e)}"
        
        # Warn when the inclusive range start_page..end_page spans more than 50 pages
        if end_page - start_page + 1 > 50:
            warning = "Warning: Reading more than 50 pages at once may cause performance issues or errors.\n"
        else:
            warning = ""
        
        try:
            # Read PDF file
            with open(actual_path, 'rb') as file:
                pdf_content = file.read()
            
            # Extract text for the requested page range with the helper
            result = extract_text_from_pdf(pdf_content, start_page, end_page)
            # warning is "" for small ranges, so it can be prepended unconditionally
            return warning + result
            
        except FileNotFoundError:
            return f"Error: File not found '{actual_path}'"
        except PermissionError:
            return f"Error: No permission to read file '{actual_path}'"
        except Exception as e:
            return f"Error reading PDF file: {str(e)}"
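As listed above, the function checks the path but passes `start_page` and `end_page` to `extract_text_from_pdf` without bounds checks; whether that helper validates them is not shown here. A guard along the following lines could run before extraction. Both helpers are hypothetical, `total_pages` is assumed to come from whichever PDF library `extract_text_from_pdf` uses, and errors are returned as strings to match the tool's existing convention.

```python
from typing import List, Optional


def check_page_range(start_page: int, end_page: int, total_pages: int) -> Optional[str]:
    """Return an error message for an invalid 1-based inclusive range, else None."""
    if start_page < 1:
        return f"Error: start_page must be >= 1 (got {start_page})"
    if end_page < start_page:
        return f"Error: end_page ({end_page}) is before start_page ({start_page})"
    if start_page > total_pages:
        return f"Error: start_page {start_page} exceeds page count {total_pages}"
    return None


def zero_based_indices(start_page: int, end_page: int, total_pages: int) -> List[int]:
    """Convert a validated 1-based inclusive range into 0-based page indices,
    clamping end_page to the last page of the document."""
    return list(range(start_page - 1, min(end_page, total_pages)))
```

Clamping `end_page` to the last page, rather than rejecting it, is one design choice; a stricter tool could return an error instead.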


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/lockon-n/pdf-tools-mcp'
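The same request can be made from Python with only the standard library. This is a sketch: `server_info_request` and `fetch_server_info` are hypothetical names, and the response is assumed to be JSON whose fields are not documented on this page.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers/"


def server_info_request(server_id: str) -> urllib.request.Request:
    """Build the same GET request as the curl command above."""
    return urllib.request.Request(API_BASE + server_id, method="GET")


def fetch_server_info(server_id: str) -> object:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(server_info_request(server_id), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_server_info("lockon-n/pdf-tools-mcp")` performs the network request; building the request separately keeps it easy to inspect without sending anything.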
