Glama
marc-hanheide

PDF Redaction MCP Server

load_pdf

Loads a PDF file into memory and extracts its text content for review, keeping the document open for subsequent redaction of sensitive content.

Instructions

Load a PDF file and make it available for redaction.

This tool loads a PDF file into memory and extracts its text content for review. The PDF remains loaded for subsequent redaction operations.
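Keeping the PDF loaded means the server holds the open document between tool calls. The implementation reference does this with two module-level dictionaries keyed by the resolved file path: _loaded_pdfs for the document objects and _applied_redactions for per-file redaction history, which is only initialized if absent. The sketch below wraps that pattern in a class. The class name LoadedPdfRegistry, the injected opener (standing in for fitz.open), and closing the previous handle on reload are all assumptions; the original handler simply overwrites the earlier document without closing it.

```python
from collections.abc import Callable
from pathlib import Path
from typing import Any


class LoadedPdfRegistry:
    """Hypothetical wrapper around the two dictionaries load_pdf maintains."""

    def __init__(self, opener: Callable[[str], Any]):
        self._opener = opener                   # e.g. fitz.open in the real server (assumption)
        self.docs: dict[str, Any] = {}          # resolved path -> open document
        self.redactions: dict[str, list] = {}   # resolved path -> redactions applied so far

    def load(self, pdf_path: str) -> Any:
        key = str(Path(pdf_path).resolve())     # same key scheme as load_pdf
        doc = self._opener(key)
        old = self.docs.get(key)
        if old is not None and old is not doc:
            old.close()                         # assumption: release the previous handle on reload
        self.docs[key] = doc
        self.redactions.setdefault(key, [])     # keep existing history, as load_pdf does
        return doc

    def get(self, pdf_path: str) -> Any:
        key = str(Path(pdf_path).resolve())
        if key not in self.docs:
            raise KeyError(f"PDF not loaded: {key}")
        return self.docs[key]
```

Because both dictionaries use the resolved path as the key, later redaction tools can look up the document from any spelling of the same path.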

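Args:
  pdf_path: Path to the PDF file to load
  ctx: MCP context for logging (injected by FastMCP, not supplied by the caller)

Returns:
  The full text content of the PDF, with each page preceded by a "--- Page N ---" header

Raises:
  ToolError: If the path does not exist, is not a file, or cannot be opened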
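The two path checks run before the file is opened, so a missing file or a directory fails fast with a specific message. The sketch below pulls just that step out of the handler into a hypothetical resolve_pdf_path helper; ToolError here is a local stand-in for FastMCP's exception class so the example runs without FastMCP installed. Because the path is resolved on the server, a relative path is interpreted against the server process's working directory, not the client's. The third failure mode, a file PyMuPDF cannot open, is not covered here; in the handler it surfaces as the generic "Failed to load PDF" error.

```python
from pathlib import Path


class ToolError(Exception):
    """Local stand-in for fastmcp.exceptions.ToolError (assumption, for a dependency-free sketch)."""


def resolve_pdf_path(pdf_path: str) -> Path:
    """Apply the same checks load_pdf performs before calling fitz.open."""
    path = Path(pdf_path).resolve()  # relative paths resolve against the server's cwd
    if not path.exists():
        raise ToolError(f"PDF file not found: {path}")
    if not path.is_file():
        raise ToolError(f"Path is not a file: {path}")
    return path
```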

Input Schema

Name      Required  Description                   Default
pdf_path  Yes       Path to the PDF file to load  none
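Clients normally call the tool through an MCP SDK, but at the protocol level an invocation is a JSON-RPC 2.0 "tools/call" request whose "arguments" object must match the schema above. A minimal sketch, assuming the standard MCP message shape; the helper names build_load_pdf_call and extract_text are hypothetical, and the assumption that a ToolError reaches the client as a result with "isError": true reflects how FastMCP reports tool errors rather than anything stated on this page.

```python
import json


def build_load_pdf_call(pdf_path: str, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request that invokes load_pdf."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "load_pdf",                  # tool name registered by @mcp.tool
            "arguments": {"pdf_path": pdf_path},  # the only (required) input
        },
    }
    return json.dumps(message)


def extract_text(result: dict) -> str:
    """Join the text items of a tools/call result; raise if the server flagged an error."""
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):
        raise RuntimeError("load_pdf failed: " + " ".join(texts))
    return "".join(texts)
```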

Implementation Reference

  • The core handler for the 'load_pdf' tool, registered automatically through the @mcp.tool decorator. It resolves the path, checks that it exists and is a file, opens the PDF with PyMuPDF (fitz), stores the document object in the module-level _loaded_pdfs dictionary for subsequent operations, and returns the extracted text of all pages.
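    # Module-level setup the excerpt relies on but does not show. The server
    # name and the container types are assumptions; only the names match the handler.
    from pathlib import Path
    from typing import Annotated

    import fitz  # PyMuPDF
    from fastmcp import Context, FastMCP
    from fastmcp.exceptions import ToolError
    from pydantic import Field

    mcp = FastMCP("PDF Redaction")  # server name assumed
    _loaded_pdfs: dict[str, fitz.Document] = {}  # resolved path -> open document
    _applied_redactions: dict[str, list] = {}  # resolved path -> redactions applied so far

    @mcp.tool
    async def load_pdf(
        pdf_path: Annotated[str, Field(description="Path to the PDF file to load")],
        ctx: Context
    ) -> str:
        """Load a PDF file and make it available for redaction.

        This tool loads a PDF file into memory and extracts its text content
        for review. The PDF remains loaded for subsequent redaction operations.

        Args:
            pdf_path: Path to the PDF file to load
            ctx: MCP context for logging

        Returns:
            The full text content of the PDF

        Raises:
            ToolError: If the file doesn't exist or cannot be opened
        """
        try:
            # Resolve to an absolute path so each file maps to a single dictionary key
            path = Path(pdf_path).resolve()

            await ctx.info(f"Loading PDF from: {path}")

            if not path.exists():
                raise ToolError(f"PDF file not found: {path}")

            if not path.is_file():
                raise ToolError(f"Path is not a file: {path}")

            # Open the PDF with PyMuPDF
            doc = fitz.open(str(path))

            # Keep the document open for later calls (replaces any earlier load of this path)
            _loaded_pdfs[str(path)] = doc

            # Initialize redaction tracking once; reloading keeps the existing history
            if str(path) not in _applied_redactions:
                _applied_redactions[str(path)] = []

            # Extract text from all pages, prefixing each with a page header
            text_content = []
            for page_num, page in enumerate(doc, start=1):
                page_text = page.get_text()
                text_content.append(f"--- Page {page_num} ---\n{page_text}")

            full_text = "\n\n".join(text_content)

            await ctx.info(f"Successfully loaded PDF with {len(doc)} pages")

            return full_text

        except ToolError:
            # Validation errors already carry a user-facing message
            raise
        except Exception as e:
            await ctx.error(f"Failed to load PDF: {e}")
            raise ToolError(f"Failed to load PDF: {e}") from e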
  • Input schema: pdf_path is declared with Annotated[str, Field(description=...)], and FastMCP derives the tool's JSON Schema from this annotation. The ctx: Context parameter is injected by FastMCP at call time and does not appear in the input schema.
  • Registration: the bare @mcp.tool decorator registers load_pdf as a tool on the FastMCP instance mcp.
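The handler returns a single string in which each page is introduced by a "--- Page N ---" header line and consecutive pages are joined with "\n\n". A client that wants per-page text can split on those headers. A minimal sketch with hypothetical helper names: format_pages mirrors the handler's loop so the round trip can be checked without PyMuPDF, and split_pages inverts it. It assumes no page's own text contains a line that looks like a header, since such a line would be indistinguishable from a real page break.

```python
import re

# A header line as emitted by load_pdf: "--- Page N ---" followed by a newline
_HEADER = re.compile(r"^--- Page (\d+) ---\n", re.MULTILINE)
_SEPARATOR = "\n\n"  # string used to join consecutive pages


def format_pages(page_texts: list[str]) -> str:
    """Rebuild load_pdf's output from per-page text (mirrors the handler's loop)."""
    return _SEPARATOR.join(
        f"--- Page {n} ---\n{text}" for n, text in enumerate(page_texts, start=1)
    )


def split_pages(full_text: str) -> dict[int, str]:
    """Map page number -> page text, inverting format_pages."""
    matches = list(_HEADER.finditer(full_text))
    pages: dict[int, str] = {}
    for i, match in enumerate(matches):
        is_last = i + 1 == len(matches)
        end = len(full_text) if is_last else matches[i + 1].start()
        body = full_text[match.end():end]
        if not is_last:
            body = body[: -len(_SEPARATOR)]  # drop the join separator before the next header
        pages[int(match.group(1))] = body
    return pages
```

For example, split_pages(format_pages(["a\n", "", "c"])) gives back {1: "a\n", 2: "", 3: "c"}, including the empty second page.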


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/marc-hanheide/redact_mcp'
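The same request can be made from Python with only the standard library. A minimal sketch: it assumes the endpoint needs no authentication headers (the curl command sends none) and that the response body is JSON, and it does not rely on any particular response fields. The helper names build_request and fetch_server_info are hypothetical.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/marc-hanheide/redact_mcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    # Equivalent of the curl command: a plain GET with no extra headers
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = API_URL, timeout: float = 10.0):
    """Send the request and decode the body, assuming it is UTF-8 JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```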

If you have feedback or need assistance with the MCP directory API, please join our Discord server.