Glama

apaper_read_pdf_file

Extract text content from PDF files, supporting both local documents and online sources with customizable page ranges for academic research.

Instructions

Read and extract text content from a PDF file (local or online)

Args:
  pdf_source: Path to local PDF file or URL to online PDF
  start_page: Starting page number (1-indexed, inclusive). Defaults to 1.
  end_page: Ending page number (1-indexed, inclusive). Defaults to last page.

Input Schema

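Name        Required  Description                                       Default
pdf_source  Yes       Path to a local PDF file or URL of an online PDF
start_page  No        Starting page number (1-indexed, inclusive)       1
end_page    No        Ending page number (1-indexed, inclusive)         last page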

Implementation Reference

  • MCP tool handler 'read_pdf_file' (likely namespaced as 'apaper_read_pdf_file') that reads PDF text from local file or URL, converting string page params to int and calling the pdf_reader utility.
    @mcp.tool()
    def read_pdf_file(
        pdf_source: str,
        start_page: int | str | None = None,
        end_page: int | str | None = None,
    ) -> str:
        """
        Read and extract text content from a PDF file (local or online)
        
        Args:
            pdf_source: Path to local PDF file or URL to online PDF
            start_page: Starting page number (1-indexed, inclusive). Defaults to 1.
            end_page: Ending page number (1-indexed, inclusive). Defaults to last page.
        """
        try:
            # Convert string parameters to integers if needed
            start_page_int = None
            end_page_int = None
            
            if start_page is not None:
                start_page_int = int(start_page)
            
            if end_page is not None:
                end_page_int = int(end_page)
            
            result = read_pdf(pdf_source, start_page=start_page_int, end_page=end_page_int)
            return result
        except ValueError:
            return "Error: Invalid page number format. Please provide valid integers for start_page and end_page."
        except Exception as e:
            return f"Error reading PDF from {pdf_source}: {str(e)}"
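  • Core utility function 'read_pdf' that extracts PDF text from local files or URLs. It dispatches on the URL scheme to private helpers (_read_pdf_from_url, _read_pdf_from_file), which presumably handle page-range normalization, and wraps any failure in a generic Exception. Called by the MCP handler.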
    from pathlib import Path
    from urllib.parse import urlparse

    def read_pdf(pdf_source: str | Path, start_page: int | None = None, end_page: int | None = None) -> str:
        """
        Extract text content from a PDF file (local or online).
    
        Args:
            pdf_source: Path to local PDF file or URL to online PDF
            start_page: Starting page number (1-indexed, inclusive). Defaults to 1.
            end_page: Ending page number (1-indexed, inclusive). Defaults to last page.
    
        Returns:
            str: Extracted text content from the PDF
    
        Raises:
            Exception: Wraps any failure (e.g. FileNotFoundError for a missing local file, or
                ValueError for an invalid URL, unprocessable PDF, or invalid page range);
                the original exception is chained as __cause__.
        """
        try:
            if isinstance(pdf_source, str | Path):
                pdf_source_str = str(pdf_source)
    
                # Check if it's a URL
                parsed = urlparse(pdf_source_str)
                if parsed.scheme in ("http", "https"):
                    # Handle online PDF
                    return _read_pdf_from_url(pdf_source_str, start_page, end_page)
                else:
                    # Handle local file
                    return _read_pdf_from_file(Path(pdf_source_str), start_page, end_page)
            else:
                raise ValueError("pdf_source must be a string or Path object")
    
        except Exception as e:
            raise Exception(f"Failed to read PDF from {pdf_source}: {e!s}") from e
  • FastMCP server initialization with namespace 'apaper', which likely prefixes tool names (e.g., 'apaper_read_pdf_file'). All @mcp.tool() decorators register tools here.
    mcp = FastMCP("apaper")
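  • Helper function that validates 1-indexed page-range inputs and converts them to 0-indexed values. It is not called directly in the read_pdf body above, so it is presumably invoked by read_pdf's private helpers.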
    def _normalize_page_range(start_page: int | None, end_page: int | None, total_pages: int) -> tuple[int, int]:
        """
        Normalize and validate page range parameters.
        
        Args:
            start_page: Starting page number (1-indexed, inclusive) or None
            end_page: Ending page number (1-indexed, inclusive) or None
            total_pages: Total number of pages in the PDF
            
        Returns:
            tuple[int, int]: (start_index, end_index) as 0-indexed values
            
        Raises:
            ValueError: If page range is invalid
        """
        # Default values
        if start_page is None:
            start_page = 1
        if end_page is None:
            end_page = total_pages
            
        # Validate page numbers
        if start_page < 1:
            raise ValueError(f"start_page must be >= 1, got {start_page}")
        if end_page < 1:
            raise ValueError(f"end_page must be >= 1, got {end_page}")
        if start_page > end_page:
            raise ValueError(f"start_page ({start_page}) must be <= end_page ({end_page})")
        if start_page > total_pages:
            raise ValueError(f"start_page ({start_page}) exceeds total pages ({total_pages})")
            
        # Clamp end_page to total_pages
        if end_page > total_pages:
            end_page = total_pages
            
        # Convert to 0-indexed
        return start_page - 1, end_page - 1


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/isomoes/all-in-mcp'
