extract_text

Extract text from PDF pages by specifying a file path and optional page range to retrieve content for analysis or processing.

Instructions

Extract text from PDF pages.

Args:
  pdf_path: Path to the PDF file.
  start_page: Page number to start extraction (0-indexed). If None, starts from the first page.
  end_page: Page number to end extraction (0-indexed, inclusive). If None, ends at start_page if specified, otherwise extracts all pages.

Returns:
  If extracting a single page: a string containing the page text.
  If extracting multiple pages: a dictionary mapping page numbers to page text.
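The defaulting rules for `start_page` and `end_page` interact, so it can help to see which inclusive page range each combination selects. Below is a minimal sketch of those rules as a standalone function. `resolve_page_range` is a hypothetical helper, not part of the tool. Its range check and its swapping of a reversed range follow the handler shown under Implementation Reference.

```python
from typing import Optional, Tuple


def resolve_page_range(total_pages: int,
                       start_page: Optional[int] = None,
                       end_page: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical helper: return the inclusive, 0-indexed (first, last) page range."""
    # Both bounds must refer to existing pages.
    for name, value in (("start_page", start_page), ("end_page", end_page)):
        if value is not None and not 0 <= value < total_pages:
            raise ValueError(f"{name} {value} is out of range (0-{total_pages - 1})")
    if end_page is None:
        # Only start_page given: just that page. Neither given: every page.
        end_page = start_page if start_page is not None else total_pages - 1
    if start_page is None:
        # No start_page: begin at the first page.
        start_page = 0
    # A reversed range is swapped rather than rejected.
    return (min(start_page, end_page), max(start_page, end_page))


print(resolve_page_range(10))                # (0, 9)
print(resolve_page_range(10, start_page=3))  # (3, 3)
print(resolve_page_range(10, end_page=3))    # (0, 3)
```

A range with one page corresponds to the string return value. Any wider range corresponds to the dictionary.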

Input Schema


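Name	Required	Description	Default
`pdf_path`	Yes	Path to the PDF file (string)	
`start_page`	No	Page number to start extraction (integer, 0-indexed); if omitted, starts from the first page	None
`end_page`	No	Page number to end extraction (integer, 0-indexed, inclusive); if omitted, ends at `start_page` when it is given, otherwise at the last page	None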
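The table shows that only `pdf_path` is required. The sketch below builds the arguments object a client would send for this tool and leaves out optional fields that are not set. `build_extract_text_arguments` and the file name `report.pdf` are hypothetical and only illustrate the schema.

```python
import json
from typing import Any, Dict, Optional


def build_extract_text_arguments(pdf_path: str,
                                 start_page: Optional[int] = None,
                                 end_page: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical helper: build extract_text arguments, omitting unset optional fields."""
    if not pdf_path:
        raise ValueError("pdf_path is required")
    arguments: Dict[str, Any] = {"pdf_path": pdf_path}
    # Optional parameters are sent only when the caller sets them.
    if start_page is not None:
        arguments["start_page"] = start_page
    if end_page is not None:
        arguments["end_page"] = end_page
    return arguments


print(json.dumps(build_extract_text_arguments("report.pdf", start_page=0, end_page=2)))
# {"pdf_path": "report.pdf", "start_page": 0, "end_page": 2}
```

With the MCP Python SDK, a dictionary like this would be passed as the `arguments` of `ClientSession.call_tool("extract_text", ...)`. Other clients may wrap it differently.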

Implementation Reference

mcp_pdf_forms/server.py:247-303 (handler)
The handler function for the `extract_text` MCP tool. It uses PyMuPDF (`fitz`) to open the PDF and extract text from the requested page range. It returns a string for a single page or a dict of page texts for multiple pages, and it is registered via the `@mcp.tool()` decorator.
```python
from typing import Dict, Optional, Union

import fitz  # PyMuPDF


# `mcp` is the MCP server instance defined elsewhere in server.py.
@mcp.tool()
def extract_text(pdf_path: str,
                 start_page: Optional[int] = None,
                 end_page: Optional[int] = None) -> Union[str, Dict[int, str]]:
    """
    Extract text from PDF pages

    Args:
        pdf_path: Path to the PDF file
        start_page: Page number to start extraction (0-indexed). If None, starts from first page.
        end_page: Page number to end extraction (0-indexed, inclusive). If None, ends at
            start_page if specified, otherwise extracts all pages.

    Returns:
        If extracting a single page: string containing the page text
        If extracting multiple pages: dictionary mapping page numbers to page text
    """
    try:
        # The context manager closes the document even when an error is raised.
        with fitz.open(pdf_path) as doc:
            total_pages = len(doc)

            # Validate page parameters
            if start_page is not None and not 0 <= start_page < total_pages:
                raise ValueError(f"Start page {start_page} is out of range (0-{total_pages - 1})")
            if end_page is not None and not 0 <= end_page < total_pages:
                raise ValueError(f"End page {end_page} is out of range (0-{total_pages - 1})")

            # Resolve end_page before defaulting start_page. Defaulting start_page first
            # would make a call without page arguments return only the first page.
            if end_page is None:
                end_page = start_page if start_page is not None else total_pages - 1
            if start_page is None:
                start_page = 0

            # Ensure start_page <= end_page
            if start_page > end_page:
                start_page, end_page = end_page, start_page

            if start_page == end_page:
                # Single page extraction
                return doc[start_page].get_text()

            # Multiple page extraction
            return {page_num: doc[page_num].get_text()
                    for page_num in range(start_page, end_page + 1)}
    except Exception as e:
        raise Exception(f"Error extracting text: {e}") from e
```
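The handler returns either a `str` or a `Dict[int, str]`, so callers may want a single shape to work with. The sketch below normalizes both to a page-number dictionary. `to_page_dict` is a hypothetical helper. It assumes the caller knows which page a string result came from, because the string carries no page number. It also converts keys back to integers, because page numbers become string keys whenever the dictionary passes through JSON.

```python
from typing import Dict, Mapping, Union


def to_page_dict(result: Union[str, Mapping[Union[int, str], str]],
                 first_page: int) -> Dict[int, str]:
    """Hypothetical helper: normalize an extract_text result to {page_number: text}."""
    if isinstance(result, str):
        # Single-page result: attach the page number the caller requested.
        return {first_page: result}
    # Multi-page result: JSON-decoded keys arrive as strings such as "0", so cast them.
    return {int(page): text for page, text in result.items()}


print(to_page_dict("Page text", first_page=3))       # {3: 'Page text'}
print(to_page_dict({"0": "a", "1": "b"}, first_page=0))  # {0: 'a', 1: 'b'}
```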
