load_pdf
Loads a PDF file and extracts its text content for review, so that sensitive documents can be redacted in subsequent operations.
Instructions
Load a PDF file and make it available for redaction.
This tool loads a PDF file into memory and extracts its text content for review. The PDF remains loaded for subsequent redaction operations.
Args:
- pdf_path: Path to the PDF file to load
- ctx: MCP context for logging
Returns: The full text content of the PDF
Raises: ToolError: If the file doesn't exist or cannot be opened
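
The handler listed under Implementation Reference keeps each opened document in a module-level dictionary keyed by the resolved path, which is how the PDF stays available to later calls. The following is a minimal sketch of that pattern using only the standard library. The names `_loaded`, `register`, and `lookup` are hypothetical stand-ins, not the server's own API, and the stored object can be anything.

```python
from pathlib import Path

# Hypothetical stand-in for the server's module-level registry of open documents.
_loaded: dict[str, object] = {}


def register(pdf_path: str, doc: object) -> str:
    """Store doc under the resolved absolute path and return that key."""
    key = str(Path(pdf_path).resolve())
    _loaded[key] = doc
    return key


def lookup(pdf_path: str) -> object:
    """Fetch a registered document, whichever spelling of the path is used."""
    key = str(Path(pdf_path).resolve())
    if key not in _loaded:
        raise KeyError(f"PDF not loaded: {key}")
    return _loaded[key]
```

Because both functions resolve the path before using it as a key, `docs/./a.pdf` and `docs/a.pdf` refer to the same entry. A later operation that receives a differently spelled path would still find the loaded document, assuming it resolves the path the same way.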
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| pdf_path | Yes | Path to the PDF file to load | |
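
As an illustration of how a client might supply this input, the sketch below builds a JSON-RPC 2.0 `tools/call` request, the message MCP uses to invoke a tool. The helper name `build_load_pdf_request` and the example path are assumptions made for illustration. Only `pdf_path` is sent, since `ctx` does not appear in the table and is presumably supplied by the server framework.

```python
import json


def build_load_pdf_request(pdf_path: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for the load_pdf tool."""
    # pdf_path is the only required argument in the input schema.
    if not isinstance(pdf_path, str) or not pdf_path:
        raise ValueError("pdf_path is required and must be a non-empty string")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "load_pdf", "arguments": {"pdf_path": pdf_path}},
    }


if __name__ == "__main__":
    # Hypothetical path, used only to show the shape of the payload.
    print(json.dumps(build_load_pdf_request("/tmp/report.pdf"), indent=2))
```

In practice an MCP client library would normally construct and send this message for you; the sketch only shows what the arguments object looks like on the wire.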
Implementation Reference
- src/redact_mcp/server.py:23-80 (handler): The core handler function for the 'load_pdf' tool. Decorated with @mcp.tool for automatic registration. Loads the PDF using PyMuPDF (fitz), validates file existence, extracts and returns full text content from all pages while storing the document object in a global dictionary for subsequent operations.

  ```python
  @mcp.tool
  async def load_pdf(
      pdf_path: Annotated[str, Field(description="Path to the PDF file to load")],
      ctx: Context
  ) -> str:
      """Load a PDF file and make it available for redaction.

      This tool loads a PDF file into memory and extracts its text content for review.
      The PDF remains loaded for subsequent redaction operations.

      Args:
          pdf_path: Path to the PDF file to load
          ctx: MCP context for logging

      Returns:
          The full text content of the PDF

      Raises:
          ToolError: If the file doesn't exist or cannot be opened
      """
      try:
          path = Path(pdf_path).resolve()
          await ctx.info(f"Loading PDF from: {path}")

          if not path.exists():
              raise ToolError(f"PDF file not found: {path}")

          if not path.is_file():
              raise ToolError(f"Path is not a file: {path}")

          # Open the PDF
          doc = fitz.open(str(path))

          # Store the document for later use
          _loaded_pdfs[str(path)] = doc

          # Initialize redaction tracking for this PDF
          if str(path) not in _applied_redactions:
              _applied_redactions[str(path)] = []

          # Extract text from all pages
          text_content = []
          for page_num, page in enumerate(doc, start=1):
              page_text = page.get_text()
              text_content.append(f"--- Page {page_num} ---\n{page_text}")

          full_text = "\n\n".join(text_content)
          await ctx.info(f"Successfully loaded PDF with {len(doc)} pages")

          return full_text

      except ToolError:
          raise
      except Exception as e:
          await ctx.error(f"Failed to load PDF: {str(e)}")
          raise ToolError(f"Failed to load PDF: {str(e)}")
  ```
- src/redact_mcp/server.py:25-26 (schema): Pydantic schema definition for the tool input using Annotated and Field, specifying the pdf_path parameter with description.

  ```python
  pdf_path: Annotated[str, Field(description="Path to the PDF file to load")],
  ctx: Context
  ```
- src/redact_mcp/server.py:23-23 (registration): The @mcp.tool decorator registers the load_pdf function as an MCP tool with the FastMCP instance.

  ```python
  @mcp.tool
  ```
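
The handler returns one string in which each page is prefixed by a `--- Page N ---` line and pages are joined by a blank line. A caller that wants per-page text can split on those headers. The sketch below is one possible way to do that; `split_pages` is a hypothetical helper, not part of the server.

```python
import re

# Matches the "--- Page N ---" header line the handler writes before each page.
_PAGE_HEADER = re.compile(r"^--- Page (\d+) ---\n", re.MULTILINE)


def split_pages(full_text: str) -> dict[int, str]:
    """Map each page number to the text that follows its header."""
    pages: dict[int, str] = {}
    matches = list(_PAGE_HEADER.finditer(full_text))
    for i, match in enumerate(matches):
        is_last = i + 1 == len(matches)
        end = len(full_text) if is_last else matches[i + 1].start()
        body = full_text[match.end():end]
        # Drop the "\n\n" separator the handler inserts between pages.
        if not is_last and body.endswith("\n\n"):
            body = body[:-2]
        pages[int(match.group(1))] = body
    return pages
```

This assumes no page contains a line that itself looks like a page header; a document whose own text includes such a line would be split incorrectly.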