Glama
marc-hanheide

PDF Redaction MCP Server

load_pdf

Load PDF files to prepare them for redaction by extracting text content for review, enabling subsequent redaction operations on sensitive documents.

Instructions

Load a PDF file and make it available for redaction.

This tool loads a PDF file into memory and extracts its text content for review. The PDF remains loaded for subsequent redaction operations.

Args:
  pdf_path: Path to the PDF file to load
  ctx: MCP context for logging

Returns: The full text content of the PDF

Raises: ToolError: If the file doesn't exist or cannot be opened

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| pdf_path | Yes | Path to the PDF file to load | |
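
As a rough illustration of how a client might address this tool, the sketch below builds a `tools/call` message in the MCP JSON-RPC 2.0 envelope. The helper name `build_load_pdf_request`, the request id, and the example path are assumptions made for illustration; only the tool name and the `pdf_path` argument come from the schema above. `ctx` does not appear in the input schema, so it is not sent.

```python
import json


def build_load_pdf_request(pdf_path: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` message for the load_pdf tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "load_pdf",
            # pdf_path is the only argument listed in the input schema.
            "arguments": {"pdf_path": pdf_path},
        },
    }


# Hypothetical path, used only to show the shape of the message.
request = build_load_pdf_request("/tmp/example.pdf")
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally assemble and send this message for you; the sketch only shows which fields carry the tool name and its argument.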

Output Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| result | Yes | | |

Implementation Reference

  • The core handler function for the 'load_pdf' tool. Decorated with @mcp.tool for automatic registration. Loads the PDF using PyMuPDF (fitz), validates file existence, extracts and returns full text content from all pages while storing the document object in a global dictionary for subsequent operations.
    @mcp.tool
    async def load_pdf(
        pdf_path: Annotated[str, Field(description="Path to the PDF file to load")],
        ctx: Context
    ) -> str:
        """Load a PDF file and make it available for redaction.
        
        This tool loads a PDF file into memory and extracts its text content
        for review. The PDF remains loaded for subsequent redaction operations.
        
        Args:
            pdf_path: Path to the PDF file to load
            ctx: MCP context for logging
            
        Returns:
            The full text content of the PDF
            
        Raises:
            ToolError: If the file doesn't exist or cannot be opened
        """
        try:
            path = Path(pdf_path).resolve()
            
            await ctx.info(f"Loading PDF from: {path}")
            
            if not path.exists():
                raise ToolError(f"PDF file not found: {path}")
            
            if not path.is_file():
                raise ToolError(f"Path is not a file: {path}")
                
            # Open the PDF
            doc = fitz.open(str(path))
            
            # Store the document for later use
            _loaded_pdfs[str(path)] = doc
            
            # Initialize redaction tracking for this PDF
            if str(path) not in _applied_redactions:
                _applied_redactions[str(path)] = []
            
            # Extract text from all pages
            text_content = []
            for page_num, page in enumerate(doc, start=1):
                page_text = page.get_text()
                text_content.append(f"--- Page {page_num} ---\n{page_text}")
            
            full_text = "\n\n".join(text_content)
            
            await ctx.info(f"Successfully loaded PDF with {len(doc)} pages")
            
            return full_text
            
        except ToolError:
            raise
        except Exception as e:
            await ctx.error(f"Failed to load PDF: {e}")
            # Chain the original exception so the underlying cause stays visible
            raise ToolError(f"Failed to load PDF: {e}") from e
  • Pydantic schema definition for the tool input using Annotated and Field, specifying the pdf_path parameter with description.
    pdf_path: Annotated[str, Field(description="Path to the PDF file to load")],
    ctx: Context
  • The @mcp.tool decorator registers the load_pdf function as an MCP tool on the FastMCP instance.
    @mcp.tool
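
The handler above returns one string in which each page is introduced by a `--- Page N ---` line and pages are joined by blank lines. The following sketch, which uses only the standard library, shows one way a caller might split that string back into pages. `format_pages` and `split_pages` are hypothetical helpers, not part of the server, and the sample text is made up.

```python
import re

# Matches the per-page header line that the handler writes before each page.
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---\n", re.MULTILINE)


def format_pages(page_texts: list[str]) -> str:
    """Mirror the handler's output format for a list of per-page texts."""
    blocks = [
        f"--- Page {number} ---\n{text}"
        for number, text in enumerate(page_texts, start=1)
    ]
    return "\n\n".join(blocks)


def split_pages(full_text: str) -> dict[int, str]:
    """Split load_pdf-style output into {page_number: text}."""
    parts = PAGE_MARKER.split(full_text)
    # parts looks like [prefix, "1", text1, "2", text2, ...].
    pages = {}
    for number, text in zip(parts[1::2], parts[2::2]):
        # Trailing newlines (including the page separator) are dropped.
        pages[int(number)] = text.rstrip("\n")
    return pages


sample = format_pages(["Alice Smith\n", "Account 12345\n"])
pages = split_pages(sample)
print(pages)
```

A page whose own text contains a line of the same `--- Page N ---` form would be split incorrectly, so this is only suitable as a convenience for review, not as a reliable page index.
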
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes key behaviors: loads PDF into memory, extracts text content, and keeps it loaded for future operations. However, it lacks details on memory implications, performance characteristics, or error handling beyond the basic ToolError mention.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
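
To make the memory point concrete: in the handler shown in the implementation reference, each opened document is stored in a module-level dictionary keyed by its resolved path, and a reload assigns a new handle to the same key. The sketch below shows one way such a registry could close handles explicitly. It is an illustration under stated assumptions, not the server's behavior: `FakeDocument`, `register`, and `unload` are hypothetical names, and `FakeDocument` only stands in for an object with a `close()` method.

```python
class FakeDocument:
    """Stand-in for an open PDF handle; only models open/closed state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    def close(self) -> None:
        self.closed = True


# Hypothetical registry, analogous to a dict of loaded documents keyed by path.
_loaded: dict[str, FakeDocument] = {}


def register(path: str, doc: FakeDocument) -> None:
    """Store doc under path, closing any different document stored there before."""
    previous = _loaded.get(path)
    if previous is not None and previous is not doc:
        previous.close()
    _loaded[path] = doc


def unload(path: str) -> bool:
    """Close and forget the document for path; return False if it was not loaded."""
    doc = _loaded.pop(path, None)
    if doc is None:
        return False
    doc.close()
    return True


first = FakeDocument("v1")
second = FakeDocument("v2")
register("/tmp/example.pdf", first)
register("/tmp/example.pdf", second)  # reload: the first handle is closed
```

Without a step like this, every loaded document stays open for the lifetime of the server process, which is the kind of consequence a tool description could state.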

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, args, returns, raises) and uses efficient sentences. However, the parameter explanation in the description slightly duplicates the schema, and the logging context mention could be more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (loading and extracting text), the absence of annotations, and the presence of an output schema (which handles return values), the description is reasonably complete. It covers purpose, usage context, parameters, returns, and errors, though it could benefit from more behavioral details such as memory usage or format constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents the single parameter 'pdf_path'. The description repeats the parameter explanation but does not add meaningful semantics beyond what the schema provides, such as file format requirements or path resolution details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
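
On path resolution specifically: the handler in the implementation reference calls `Path(pdf_path).resolve()` and uses the resulting string as the key for the loaded document, so different spellings of the same file map to one entry. The sketch below demonstrates that normalization with the standard library only. `registry_key` is a hypothetical helper name, and the temporary file holds placeholder bytes rather than a valid PDF.

```python
import os
import tempfile
from pathlib import Path


def registry_key(pdf_path: str) -> str:
    """Return the absolute, normalized string form of pdf_path."""
    return str(Path(pdf_path).resolve())


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "report.pdf"
    target.write_bytes(b"%PDF-1.4\n")  # placeholder bytes, not a valid PDF
    (Path(tmp) / "sub").mkdir()

    # A roundabout spelling of the same file, going through sub/..
    roundabout = os.path.join(tmp, "sub", "..", "report.pdf")
    same_key = registry_key(str(target)) == registry_key(roundabout)

# Relative paths are resolved against the current working directory.
relative_is_absolute = os.path.isabs(registry_key("relative.pdf"))
print(same_key, relative_is_absolute)
```

One consequence worth noting for callers is that a relative `pdf_path` is interpreted relative to the working directory of the server process, not of the client.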

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Load a PDF file') and its purpose ('make it available for redaction'), distinguishing it from sibling tools like 'list_loaded_pdfs' (which only lists) or 'redact_text' (which modifies). It explicitly mentions the resource (PDF file) and the outcome (extracts text content for review).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: to load a PDF for subsequent redaction operations. It implies usage by mentioning that the PDF remains loaded for later steps, but does not explicitly state when not to use it or name alternatives like 'list_loaded_pdfs' for checking already loaded files.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/marc-hanheide/redact_mcp'
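
The same request can be sketched in Python with the standard library. This is a hedged example: the URL comes from the curl command above, but the `server_url` and `fetch_server_info` helper names, the generalization to arbitrary owner and server names, and the assumption that the response body is JSON are all illustrative.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory URL for one server (pattern taken from the curl example)."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server_info(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server entry and parse it, assuming the body is JSON."""
    request = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = server_url("marc-hanheide", "redact_mcp")
print(url)
# fetch_server_info("marc-hanheide", "redact_mcp") would perform the network call.
```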

If you have feedback or need assistance with the MCP directory API, please join our Discord server.