Glama

apaper_read_pdf_file

Extract text content from PDF files, supporting both local documents and online sources with customizable page ranges for academic research.

Instructions

Read and extract text content from a PDF file (local or online).

Args:
  pdf_source: Path to a local PDF file or URL of an online PDF.
  start_page: Starting page number (1-indexed, inclusive). Defaults to 1.
  end_page: Ending page number (1-indexed, inclusive). Defaults to the last page.
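The page arguments use a 1-indexed, inclusive convention, which differs from Python's 0-indexed, end-exclusive slices. The following minimal sketch shows the conversion. It mirrors the rules the implementation applies: default bounds, rejecting page numbers below 1 or a start past the end, and clamping an oversized end_page to the last page. `page_slice` is a hypothetical name used only for illustration.

```python
from __future__ import annotations


def page_slice(start_page: int | None, end_page: int | None, total_pages: int) -> slice:
    """Map 1-indexed inclusive page bounds onto a 0-indexed Python slice (hypothetical helper)."""
    start = 1 if start_page is None else start_page      # default: first page
    end = total_pages if end_page is None else end_page  # default: last page
    if start < 1 or end < 1:
        raise ValueError("page numbers must be >= 1")
    if start > end:
        raise ValueError(f"start_page ({start}) must be <= end_page ({end})")
    if start > total_pages:
        raise ValueError(f"start_page ({start}) exceeds total pages ({total_pages})")
    # Page n lives at index n - 1; the inclusive end_page becomes the exclusive slice stop.
    return slice(start - 1, min(end, total_pages))


if __name__ == "__main__":
    pages = ["page 1", "page 2", "page 3", "page 4"]
    print(pages[page_slice(2, 3, len(pages))])        # ['page 2', 'page 3']
    print(pages[page_slice(None, None, len(pages))])  # all four pages
    print(pages[page_slice(3, 99, len(pages))])       # end clamped: ['page 3', 'page 4']
```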

Input Schema

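Name        Required  Description                                       Default
pdf_source  Yes       Path to a local PDF file or URL of an online PDF
start_page  No        Starting page number (1-indexed, inclusive)       1
end_page    No        Ending page number (1-indexed, inclusive)         last page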
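The following sketch shows the JSON-RPC 2.0 `tools/call` request an MCP client might send to invoke this tool. The tool name is an assumption that depends on how the client namespaces tools. The handler is registered as `read_pdf_file`, while this page lists it as `apaper_read_pdf_file`. The helper `build_read_pdf_request` and the example URL are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def build_read_pdf_request(
    request_id: int,
    pdf_source: str,
    start_page: int | None = None,
    end_page: int | None = None,
    tool_name: str = "read_pdf_file",  # assumption: may be "apaper_read_pdf_file" depending on the client
) -> dict[str, Any]:
    """Build an MCP tools/call request for the PDF reader tool (hypothetical helper)."""
    arguments: dict[str, Any] = {"pdf_source": pdf_source}
    # Optional page bounds are omitted when not given, so the server
    # falls back to its defaults (page 1 and the last page).
    if start_page is not None:
        arguments["start_page"] = start_page
    if end_page is not None:
        arguments["end_page"] = end_page
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_read_pdf_request(1, "https://example.com/paper.pdf", start_page=1, end_page=3)
    print(json.dumps(request, indent=2))
```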

Implementation Reference

  • MCP tool handler 'read_pdf_file', listed in this directory as 'apaper_read_pdf_file'. It reads PDF text from a local file or URL, converts string page parameters to integers, and calls the read_pdf function from the pdf_reader utility. Errors are returned as strings rather than raised.
    # Assumes `mcp` (the FastMCP instance) and `read_pdf` are in scope.
    @mcp.tool()
    def read_pdf_file(
        pdf_source: str,
        start_page: int | str | None = None,
        end_page: int | str | None = None,
    ) -> str:
        """
        Read and extract text content from a PDF file (local or online).

        Args:
            pdf_source: Path to local PDF file or URL to online PDF
            start_page: Starting page number (1-indexed, inclusive). Defaults to 1.
            end_page: Ending page number (1-indexed, inclusive). Defaults to last page.
        """
        try:
            # Page numbers may arrive as strings (e.g. "3"); convert them to int.
            start_page_int = int(start_page) if start_page is not None else None
            end_page_int = int(end_page) if end_page is not None else None
            return read_pdf(pdf_source, start_page=start_page_int, end_page=end_page_int)
        except ValueError:
            # Only int() can raise ValueError here: read_pdf re-raises all of its
            # errors as a plain Exception, which is handled below.
            return "Error: Invalid page number format. Please provide valid integers for start_page and end_page."
        except Exception as e:
            return f"Error reading PDF from {pdf_source}: {e!s}"
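  • Core utility function 'read_pdf' that extracts PDF text from local files and URLs. It treats http/https sources as URLs and everything else as a local path. It then dispatches to the private helpers '_read_pdf_from_url' and '_read_pdf_from_file' (not shown) and re-raises any failure as an Exception that names the source. Called by the MCP handler.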
    from pathlib import Path
    from urllib.parse import urlparse

    # _read_pdf_from_url and _read_pdf_from_file are private helpers not shown here.


    def read_pdf(pdf_source: str | Path, start_page: int | None = None, end_page: int | None = None) -> str:
        """
        Extract text content from a PDF file (local or online).

        Args:
            pdf_source: Path to local PDF file or URL to online PDF
            start_page: Starting page number (1-indexed, inclusive). Defaults to 1.
            end_page: Ending page number (1-indexed, inclusive). Defaults to last page.

        Returns:
            str: Extracted text content from the PDF

        Raises:
            Exception: Wraps every underlying error (e.g. FileNotFoundError for a
                missing local file, ValueError for an invalid URL, unreadable PDF,
                or invalid page range); the original error is chained as __cause__.
        """
        try:
            if isinstance(pdf_source, str | Path):
                pdf_source_str = str(pdf_source)
                # Check if it's a URL
                parsed = urlparse(pdf_source_str)
                if parsed.scheme in ("http", "https"):
                    # Handle online PDF
                    return _read_pdf_from_url(pdf_source_str, start_page, end_page)
                # Handle local file
                return _read_pdf_from_file(Path(pdf_source_str), start_page, end_page)
            raise ValueError("pdf_source must be a string or Path object")
        except Exception as e:
            raise Exception(f"Failed to read PDF from {pdf_source}: {e!s}") from e
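  • FastMCP server instance named 'apaper'. Every @mcp.tool() decorator registers its function as a tool on this instance. FastMCP uses the function name as the tool name, so the 'apaper_' prefix in 'apaper_read_pdf_file' is most likely added by the client or directory, which namespaces tools by server name.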
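    from mcp.server.fastmcp import FastMCP  # assumed import path (official MCP Python SDK); not shown in the source

    mcp = FastMCP("apaper")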
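  • Helper that fills in default page bounds, validates them, and clamps an oversized end_page to the last page. It then converts the 1-indexed inclusive range to 0-indexed inclusive indices. read_pdf does not call it directly, so it is presumably used by the private helpers '_read_pdf_from_url' and '_read_pdf_from_file'.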
    def _normalize_page_range(start_page: int | None, end_page: int | None, total_pages: int) -> tuple[int, int]:
        """
        Normalize and validate page range parameters.

        Args:
            start_page: Starting page number (1-indexed, inclusive) or None
            end_page: Ending page number (1-indexed, inclusive) or None
            total_pages: Total number of pages in the PDF

        Returns:
            tuple[int, int]: (start_index, end_index) as 0-indexed values

        Raises:
            ValueError: If page range is invalid
        """
        # Default values
        if start_page is None:
            start_page = 1
        if end_page is None:
            end_page = total_pages

        # Validate page numbers
        if start_page < 1:
            raise ValueError(f"start_page must be >= 1, got {start_page}")
        if end_page < 1:
            raise ValueError(f"end_page must be >= 1, got {end_page}")
        if start_page > end_page:
            raise ValueError(f"start_page ({start_page}) must be <= end_page ({end_page})")
        if start_page > total_pages:
            raise ValueError(f"start_page ({start_page}) exceeds total pages ({total_pages})")

        # Clamp end_page to total_pages
        if end_page > total_pages:
            end_page = total_pages

        # Convert to 0-indexed
        return start_page - 1, end_page - 1
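The handler and read_pdf together form a short call path: coerce page strings, classify the source, read, and turn failures into text. The sketch below reproduces that path with a stub in place of the real reader. `classify_source`, `stub_read_pdf`, and `handle_read_pdf` are hypothetical names, and the stub only checks whether a local file exists instead of parsing any PDF.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse


def classify_source(pdf_source: str | Path) -> str:
    """Return "url" for http(s) sources and "file" otherwise, as read_pdf's dispatch does."""
    if not isinstance(pdf_source, (str, Path)):
        raise ValueError("pdf_source must be a string or Path object")
    # urlparse lowercases the scheme, so "HTTPS://..." also counts as a URL.
    return "url" if urlparse(str(pdf_source)).scheme in ("http", "https") else "file"


def stub_read_pdf(pdf_source: str, start_page: int | None = None, end_page: int | None = None) -> str:
    """Hypothetical stand-in for read_pdf: reports the branch it would take."""
    try:
        kind = classify_source(pdf_source)
        if kind == "file" and not Path(pdf_source).exists():
            raise FileNotFoundError(f"No such file: {pdf_source}")
        return f"{kind}:{pdf_source} pages {start_page}..{end_page}"
    except Exception as e:
        # Like read_pdf, wrap every failure in a plain Exception.
        raise Exception(f"Failed to read PDF from {pdf_source}: {e!s}") from e


def handle_read_pdf(pdf_source: str, start_page: int | str | None = None, end_page: int | str | None = None) -> str:
    """Mirror of the read_pdf_file handler, calling the stub instead of read_pdf."""
    try:
        start = int(start_page) if start_page is not None else None
        end = int(end_page) if end_page is not None else None
        return stub_read_pdf(pdf_source, start_page=start, end_page=end)
    except ValueError:
        # Reached only by int(): the reader wraps its own errors in Exception.
        return "Error: Invalid page number format. Please provide valid integers for start_page and end_page."
    except Exception as e:
        return f"Error reading PDF from {pdf_source}: {e!s}"


if __name__ == "__main__":
    print(handle_read_pdf("https://example.com/paper.pdf", "2", 5))
    print(handle_read_pdf("https://example.com/paper.pdf", "two"))
    print(handle_read_pdf("no/such/file.pdf"))
```

Two consequences of the original design show up here. Non-http schemes such as `ftp://` fall through to the local-file branch, and so does a Windows path like `C:\papers\a.pdf`, whose drive letter parses as the scheme `c`. Because both layers add a prefix, a failed read produces a message that names the source twice ("Error reading PDF from X: Failed to read PDF from X: ...").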


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jiahaoxiang2000/all-in-mcp'
