extract_text

Extract text from specific PDF pages or from an entire document. Set a start and end page (0-indexed) for targeted extraction, or omit both to retrieve all text. Returns a string for a single page, or a dictionary keyed by page number for multiple pages.

Instructions

Extract text from PDF pages.

Args:
    pdf_path: Path to the PDF file.
    start_page: Page number to start extraction (0-indexed). If None, starts from the first page.
    end_page: Page number to end extraction (0-indexed, inclusive). If None, ends at start_page if specified, otherwise extracts all pages.

Returns:
    If extracting a single page: string containing the page text.
    If extracting multiple pages: dictionary mapping page numbers to page text.
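The defaulting rules above are easy to misread, so here is a minimal sketch of how the documented semantics resolve to a concrete page range. The helper name `resolve_page_range` is hypothetical and not part of the tool; the swap of reversed bounds mirrors the handler shown under Implementation Reference.

```python
from typing import Optional, Tuple


def resolve_page_range(
    total_pages: int,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> Tuple[int, int]:
    """Return the inclusive, 0-indexed (start, end) range described in the docstring."""
    # Reject any explicitly given page that falls outside the document.
    for label, value in (("Start", start_page), ("End", end_page)):
        if value is not None and not 0 <= value < total_pages:
            raise ValueError(
                f"{label} page {value} is out of range (0-{total_pages - 1})"
            )

    # Neither bound given: extract every page.
    if start_page is None and end_page is None:
        return 0, total_pages - 1

    # Only end_page given: start from the first page.
    if start_page is None:
        start_page = 0

    # Only start_page given: extract that single page.
    if end_page is None:
        end_page = start_page

    # Reversed bounds are swapped rather than rejected.
    return min(start_page, end_page), max(start_page, end_page)
```

When the two values in the returned tuple are equal, the tool returns a single string; otherwise it returns a dictionary with one entry per page in the range.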

Input Schema

Name        Required  Description  Default
end_page    No
pdf_path    Yes
start_page  No
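As an illustration of how these parameters travel over the wire, the sketch below builds an MCP `tools/call` JSON-RPC request for this tool. The helper name `build_extract_text_call`, the file name `form.pdf`, and the request id are assumptions made for the example; only `pdf_path` is required, so optional arguments are omitted when they are None.

```python
import json
from typing import Any, Dict, Optional


def build_extract_text_call(
    pdf_path: str,
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request that invokes the extract_text tool."""
    # pdf_path is the only required argument in the input schema.
    arguments: Dict[str, Any] = {"pdf_path": pdf_path}
    if start_page is not None:
        arguments["start_page"] = start_page
    if end_page is not None:
        arguments["end_page"] = end_page

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "extract_text", "arguments": arguments},
    }


# Hypothetical example: request pages 0 through 2 of form.pdf.
print(json.dumps(build_extract_text_call("form.pdf", start_page=0, end_page=2), indent=2))
```

In practice an MCP client library would normally build and send this message for you; the sketch only shows the shape of the arguments.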

Implementation Reference

  • The main handler function for the 'extract_text' tool. It is decorated with @mcp.tool(), which registers it as an MCP tool in FastMCP. The function extracts text from a PDF file for the specified page range using PyMuPDF (fitz). It returns a string for a single page or a dict of page texts for multiple pages.
    from typing import Dict, Optional, Union

    import fitz  # PyMuPDF

    # `mcp` is the FastMCP server instance, created elsewhere in the module.


    @mcp.tool()
    def extract_text(
        pdf_path: str,
        start_page: Optional[int] = None,
        end_page: Optional[int] = None,
    ) -> Union[str, Dict[int, str]]:
        """
        Extract text from PDF pages

        Args:
            pdf_path: Path to the PDF file
            start_page: Page number to start extraction (0-indexed). If None, starts from first page.
            end_page: Page number to end extraction (0-indexed, inclusive). If None, ends at start_page if specified, otherwise extracts all pages.

        Returns:
            If extracting a single page: string containing the page text
            If extracting multiple pages: dictionary mapping page numbers to page text
        """
        try:
            # The context manager closes the document even if extraction fails.
            with fitz.open(pdf_path) as doc:
                total_pages = len(doc)

                # Validate page parameters
                if start_page is not None and (start_page < 0 or start_page >= total_pages):
                    raise ValueError(f"Start page {start_page} is out of range (0-{total_pages-1})")
                if end_page is not None and (end_page < 0 or end_page >= total_pages):
                    raise ValueError(f"End page {end_page} is out of range (0-{total_pages-1})")

                # Set defaults. end_page is resolved first: once start_page has
                # been defaulted to 0 it can no longer signal "not specified",
                # and a call with no page arguments would return only page 0.
                if end_page is None:
                    end_page = start_page if start_page is not None else total_pages - 1
                if start_page is None:
                    start_page = 0

                # Ensure start_page <= end_page
                if start_page > end_page:
                    start_page, end_page = end_page, start_page

                # Single page extraction returns a plain string
                if start_page == end_page:
                    return doc[start_page].get_text()

                # Multiple page extraction returns a dict keyed by page number
                return {
                    page_num: doc[page_num].get_text()
                    for page_num in range(start_page, end_page + 1)
                }
        except Exception as e:
            raise Exception(f"Error extracting text: {str(e)}") from e
  • The @mcp.tool() decorator registers the extract_text function as an MCP tool.
    @mcp.tool()
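Because the handler returns either a string or a dictionary, callers usually have to branch on the result type. The sketch below shows one way to normalize both shapes into a single dictionary. The helper name `pages_as_dict` is hypothetical, and the conversion of keys with `int()` assumes the result may have passed through JSON, where object keys are always strings.

```python
from typing import Dict, Union


def pages_as_dict(
    result: Union[str, Dict[Union[int, str], str]],
    start_page: int = 0,
) -> Dict[int, str]:
    """Normalize an extract_text result to a {page_number: text} dictionary."""
    # Single-page result: the tool returned the page text directly, so the
    # caller has to supply the page number that was requested.
    if isinstance(result, str):
        return {start_page: result}

    # Multi-page result: coerce keys back to integers in case they were
    # serialized as JSON strings on the way to the client.
    return {int(page): text for page, text in result.items()}
```

For example, `pages_as_dict("hello", start_page=3)` gives `{3: "hello"}`, so downstream code can iterate over pages without checking the type first.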

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Wildebeest/mcp_pdf_forms'
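The same request can be made from Python with only the standard library. This is a sketch under the assumption that the endpoint returns a JSON body; the function names and the `Accept` header are illustrative and not taken from the API documentation.

```python
import json
import urllib.request
from typing import Any

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/Wildebeest/mcp_pdf_forms"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the GET request equivalent to the curl command."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0) -> Any:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone is enough to inspect the URL and headers without sending anything.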

If you have feedback or need assistance with the MCP directory API, please join our Discord server.