extract_text
Extract text from a range of PDF pages or from the entire document. Pass start and end pages (0-indexed, inclusive) to target a range, or omit both to read every page. A single page is returned as a string; multiple pages are returned as a dictionary keyed by page number.
Instructions
Extract text from PDF pages
Args:
pdf_path: Path to the PDF file
start_page: Page number to start extraction (0-indexed). If None, starts from first page.
end_page: Page number to end extraction (0-indexed, inclusive). If None, ends at start_page if specified, otherwise extracts all pages.
Returns:
If extracting a single page: string containing the page text
If extracting multiple pages: dictionary mapping page numbers to page text
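The page-selection rules above can be sketched as a small pure function that resolves the inclusive range to read for a document of `total_pages` pages. `resolve_page_range` is a hypothetical helper for illustration, not part of server.py; it follows the Args description and the handler's validation and swap steps, and the tool returns a string exactly when the resolved start and end are equal.

```python
from typing import Optional, Tuple


def resolve_page_range(total_pages: int,
                       start_page: Optional[int] = None,
                       end_page: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical helper: inclusive (start, end) range that extract_text should read."""
    # Reject explicit page numbers outside 0..total_pages-1
    for name, value in (("start_page", start_page), ("end_page", end_page)):
        if value is not None and not 0 <= value < total_pages:
            raise ValueError(f"{name} {value} is out of range (0-{total_pages - 1})")
    # Resolve end_page first: stop at start_page if one was given, else read to the last page
    if end_page is None:
        end_page = start_page if start_page is not None else total_pages - 1
    if start_page is None:
        start_page = 0
    # A reversed range is swapped rather than rejected
    if start_page > end_page:
        start_page, end_page = end_page, start_page
    return start_page, end_page
```

For a 10-page document, `resolve_page_range(10)` gives `(0, 9)`, `resolve_page_range(10, 3)` gives `(3, 3)` (a single page, so a string result), and `resolve_page_range(10, 7, 2)` gives `(2, 7)`.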
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
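| end_page | No | Last page to extract (0-indexed, inclusive) | None |
| pdf_path | Yes | Path to the PDF file | |
| start_page | No | First page to extract (0-indexed) | None |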
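An MCP client invokes this tool with a JSON-RPC `tools/call` request whose `params` carry the tool `name` and an `arguments` object matching the schema above. The sketch below builds such a request; `build_extract_text_call` and the path `report.pdf` are hypothetical, and most MCP client libraries construct this message for you. Optional arguments are simply left out so the tool's defaults apply.

```python
import json
from typing import Optional


def build_extract_text_call(request_id: int, pdf_path: str,
                            start_page: Optional[int] = None,
                            end_page: Optional[int] = None) -> str:
    """Hypothetical helper: serialize a tools/call request for extract_text."""
    arguments = {"pdf_path": pdf_path}
    # Omit optional arguments instead of sending them, so the tool's defaults apply
    if start_page is not None:
        arguments["start_page"] = start_page
    if end_page is not None:
        arguments["end_page"] = end_page
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "extract_text", "arguments": arguments},
    }
    return json.dumps(request)


# Request pages 2 through 4 (0-indexed, inclusive) of a hypothetical file
print(build_extract_text_call(1, "report.pdf", start_page=2, end_page=4))
```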
Implementation Reference
- mcp_pdf_forms/server.py:247-303 (handler): The main handler function for the `extract_text` tool. It is decorated with `@mcp.tool()`, which registers it as an MCP tool in FastMCP. It uses PyMuPDF (`fitz`) to extract text for the requested page range and returns a string for a single page or a dictionary of page texts for multiple pages.

```python
from typing import Dict, Optional, Union

import fitz  # PyMuPDF

# `mcp` is the server's FastMCP instance, created elsewhere in server.py


@mcp.tool()
def extract_text(pdf_path: str, start_page: Optional[int] = None, end_page: Optional[int] = None) -> Union[str, Dict[int, str]]:
    """
    Extract text from PDF pages

    Args:
        pdf_path: Path to the PDF file
        start_page: Page number to start extraction (0-indexed). If None, starts from first page.
        end_page: Page number to end extraction (0-indexed, inclusive). If None, ends at start_page if specified, otherwise extracts all pages.

    Returns:
        If extracting a single page: string containing the page text
        If extracting multiple pages: dictionary mapping page numbers to page text
    """
    try:
        # The context manager closes the document even if an error is raised
        with fitz.open(pdf_path) as doc:
            total_pages = len(doc)

            # Validate page parameters
            if start_page is not None and (start_page < 0 or start_page >= total_pages):
                raise ValueError(f"Start page {start_page} is out of range (0-{total_pages-1})")
            if end_page is not None and (end_page < 0 or end_page >= total_pages):
                raise ValueError(f"End page {end_page} is out of range (0-{total_pages-1})")

            # Set defaults if parameters are None. end_page is resolved first, while
            # a caller-supplied start_page can still be told apart from the default 0.
            if end_page is None:
                end_page = start_page if start_page is not None else total_pages - 1
            if start_page is None:
                start_page = 0

            # Ensure start_page <= end_page
            if start_page > end_page:
                start_page, end_page = end_page, start_page

            # Single page extraction
            if start_page == end_page:
                return doc[start_page].get_text()

            # Multiple page extraction
            return {page_num: doc[page_num].get_text() for page_num in range(start_page, end_page + 1)}
    except Exception as e:
        raise Exception(f"Error extracting text: {e}") from e
```
- mcp_pdf_forms/server.py:247-247 (registration): The `@mcp.tool()` decorator registers the `extract_text` function as an MCP tool.

```python
@mcp.tool()
```
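Because the handler returns either a bare string or a dictionary, callers usually need one shape. The sketch below normalizes both to `{page_number: text}`; `as_page_dict` is a hypothetical client-side helper, not part of server.py. It relies on the rules above: a string result means a single page was read, which is `start_page` when one was passed and page 0 otherwise. It also converts keys back to integers, since JSON object keys are always strings if the result was JSON-decoded.

```python
from typing import Dict, Optional, Union


def as_page_dict(result: Union[str, Dict[int, str]],
                 start_page: Optional[int] = None) -> Dict[int, str]:
    """Hypothetical helper: normalize an extract_text result to {page_number: text}."""
    if isinstance(result, dict):
        # JSON object keys are strings, so coerce them back to ints
        return {int(page): text for page, text in result.items()}
    # A bare string is a single page: start_page if one was passed, else page 0
    page = start_page if start_page is not None else 0
    return {page: result}
```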