PDF Redaction MCP Server
A Model Context Protocol (MCP) server for PDF redaction using PyMuPDF (fitz). This server provides tools for loading PDFs, identifying and redacting sensitive text, and saving redacted documents.
Features
Load and read PDF files - Extract text content from PDFs for review
Batch text redaction - Search and redact multiple text strings at once for maximum efficiency
Redaction tracking - Keep track of what's been redacted to prevent duplicate work
List applied redactions - Audit trail showing which texts have been marked for redaction
Area-based redaction - Redact specific rectangular regions by coordinates
Save redacted PDFs - Apply redactions and save with automatic naming
Customizable redaction appearance - Choose redaction fill colors
Error handling - Comprehensive error messages via MCP protocol
Installation
This project uses uv for package management. To install:
Usage
Running the Server
You can run the server using either the Python script directly or the FastMCP CLI:
Option 1: Direct Python execution (stdio transport)
Option 2: Using FastMCP CLI
Installing in MCP Clients
Claude Desktop
Add to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
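As a rough sketch, the entry for this server can be generated with a few lines of Python. The top-level `mcpServers` key is the structure Claude Desktop reads; the server name `pdf-redaction`, the `uv run` arguments, the project directory, and the script name `server.py` are all assumptions to adapt to your checkout.

```python
import json


def build_server_entry(project_dir: str, script: str = "server.py") -> dict:
    """Return a config fragment describing one stdio MCP server.

    `project_dir` and `script` are placeholders, not paths taken from this project.
    """
    return {
        "mcpServers": {
            "pdf-redaction": {  # hypothetical server name
                "command": "uv",
                "args": ["run", "--directory", project_dir, "python", script],
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment so it can be merged into claude_desktop_config.json by hand.
    print(json.dumps(build_server_entry("/path/to/project"), indent=2))
```

Merge the printed fragment into any existing `mcpServers` section rather than overwriting the file, then restart Claude Desktop.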
Other MCP Clients
Use the FastMCP CLI to generate configuration for other clients:
Available Tools
1. load_pdf
Load a PDF file and extract its text content.
Parameters:
pdf_path (string): Path to the PDF file to load
Returns: The full text content of the PDF, organized by pages
Example:
2. redact_text
Redact all instances of specific texts in a loaded PDF. This tool accepts multiple texts in a single call for efficient batch redaction, and it tracks which texts have already been redacted to prevent duplicate work.
Parameters:
pdf_path (string): Path to the loaded PDF file
texts_to_redact (list of strings): List of text strings to search for and redact
fill_color (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black
Returns: Summary of redaction operations, including which texts were newly redacted and which were skipped (already redacted)
Examples:
Note: The tool tracks which texts have been redacted and will skip any texts that were already processed, preventing duplicate redactions.
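The skip-if-already-redacted behaviour described above can be sketched with a per-PDF set of processed strings. This is only an illustration of the idea; the class name `RedactionTracker` and its method are hypothetical and not taken from the server's source.

```python
from collections import defaultdict


class RedactionTracker:
    """Remember which texts were already marked for redaction, per PDF path."""

    def __init__(self):
        self._done = defaultdict(set)

    def split_new(self, pdf_path, texts):
        """Return (new, skipped) in input order and record the new texts."""
        seen = self._done[pdf_path]
        new, skipped = [], []
        for text in texts:
            if text in seen:
                skipped.append(text)  # already processed for this PDF
            else:
                seen.add(text)
                new.append(text)
        return new, skipped


tracker = RedactionTracker()
tracker.split_new("sensitive.pdf", ["John Doe", "123-45-6789"])
new, skipped = tracker.split_new("sensitive.pdf", ["John Doe", "Another Secret"])
print("newly redacted:", new)
print("skipped:", skipped)
```

Tracking is keyed by PDF path, so the same string can still be redacted in a different loaded document.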
3. redact_area
Redact a specific rectangular area on a PDF page.
Parameters:
pdf_path (string): Path to the loaded PDF file
page_number (int): Page number (1-indexed)
x0 (float): Left x coordinate
y0 (float): Top y coordinate
x1 (float): Right x coordinate
y1 (float): Bottom y coordinate
fill_color (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black
Returns: Confirmation message
Example:
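A minimal sketch of how the parameters above might be checked before a redaction box is drawn: the 1-indexed page number is converted to a 0-based index, the corners are ordered, and the fill color is verified to lie in the 0-1 range. The helper name `normalize_area` and the `page_count` argument are hypothetical; the server's own validation may differ.

```python
def normalize_area(page_number, x0, y0, x1, y1, page_count, fill_color=(0.0, 0.0, 0.0)):
    """Validate a redact_area-style request and return (page_index, rect, color)."""
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page_number must be between 1 and {page_count}")
    if len(fill_color) != 3 or any(not 0.0 <= c <= 1.0 for c in fill_color):
        raise ValueError("fill_color must be three values in the 0-1 range")
    # Order the corners so the rectangle is always (left, top, right, bottom).
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    if left == right or top == bottom:
        raise ValueError("rectangle has zero width or height")
    return page_number - 1, (left, top, right, bottom), tuple(float(c) for c in fill_color)


# Page 2 in the tool's 1-indexed numbering becomes index 1.
print(normalize_area(2, 50, 100, 200, 120, page_count=3))
```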
4. save_redacted_pdf
Apply all pending redactions and save the PDF.
Parameters:
pdf_path (string): Path to the loaded PDF file
output_path (string, optional): Custom output path. If not provided, appends "_redacted" to original filename
Returns: Path to the saved redacted PDF
Example:
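The default naming rule can be illustrated with `pathlib`. This is a sketch of the rule as described, not the server's own code; the function name `default_output_path` is hypothetical, and `PurePosixPath` is used here only so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def default_output_path(pdf_path: str) -> str:
    """Insert "_redacted" between the file stem and its extension."""
    p = PurePosixPath(pdf_path)
    return str(p.with_name(f"{p.stem}_redacted{p.suffix}"))


print(default_output_path("/Users/me/documents/sensitive.pdf"))
# prints /Users/me/documents/sensitive_redacted.pdf
```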
5. list_loaded_pdfs
List all currently loaded PDF files.
Parameters: None
Returns: List of loaded PDF paths with page counts
6. list_applied_redactions
List all redactions that have been applied to loaded PDF(s). Use it to track redaction progress and avoid duplicate work.
Parameters:
pdf_path (string, optional): Path to a specific PDF. If not provided, lists redactions for all loaded PDFs
Returns: List of texts that have been marked for redaction in each PDF
Examples:
Use Cases:
Check what has already been redacted before adding more redactions
Verify redaction progress during a multi-step process
Avoid duplicate redaction attempts
Generate a report of what was redacted
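For the reporting use case, a small formatter over the tracked texts is usually enough. The sketch below assumes the redaction list is available as a mapping from PDF path to marked texts; that shape and the function name `format_redaction_report` are assumptions, since the tool's exact return format is not shown here.

```python
def format_redaction_report(applied: dict[str, list[str]]) -> str:
    """Render a plain-text summary of marked texts, grouped by PDF path."""
    lines = []
    for pdf_path in sorted(applied):
        texts = applied[pdf_path]
        lines.append(f"{pdf_path}: {len(texts)} text(s) marked for redaction")
        lines.extend(f"  - {text}" for text in texts)
    return "\n".join(lines) if lines else "No redactions recorded."


print(format_redaction_report({
    "/Users/me/documents/sensitive.pdf": ["John Doe", "123-45-6789"],
}))
```

Keep in mind that such a report contains the sensitive strings themselves, so it should be handled as carefully as the original document.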
7. close_pdf
Close a loaded PDF and free its resources. This also clears the redaction tracking for that PDF.
Parameters:
pdf_path (string): Path to the PDF file to close
Returns: Confirmation message
Workflow Example
Here's a typical workflow using this MCP server:
1. Load a PDF
Load the PDF at /Users/me/documents/sensitive.pdf
2. Review the content
The tool will return the full text content, which you can review to identify sensitive information.
3. Redact sensitive text (batch mode - recommended)
Redact ["Social Security Number", "123-45-6789", "John Doe", "jane.smith@email.com"] in /Users/me/documents/sensitive.pdf
Pro tip: Redacting multiple texts at once is much faster than calling the tool multiple times.
4. Check what has been redacted (optional)
List applied redactions for /Users/me/documents/sensitive.pdf
This shows you which texts have already been marked for redaction.
5. Add more redactions if needed
Redact ["Additional Text", "Another Secret"] in /Users/me/documents/sensitive.pdf
The tool will skip any texts that were already redacted in step 3.
6. Redact specific areas (optional)
Redact the area from (50, 100) to (200, 120) on page 2 of /Users/me/documents/sensitive.pdf
7. Save the redacted PDF
Save the redacted version of /Users/me/documents/sensitive.pdf
This will create /Users/me/documents/sensitive_redacted.pdf
8. Close the PDF (optional)
Close /Users/me/documents/sensitive.pdf
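The workflow above corresponds to a sequence of tool calls. The sketch below lists them as (tool name, arguments) pairs using the tool and parameter names documented in this README; the printed `{"tool": ..., "arguments": ...}` envelope is only illustrative and is not the MCP wire format, which your client handles for you.

```python
import json

PDF = "/Users/me/documents/sensitive.pdf"

# One entry per tool invocation, in the order described in the workflow.
WORKFLOW = [
    ("load_pdf", {"pdf_path": PDF}),
    ("redact_text", {
        "pdf_path": PDF,
        "texts_to_redact": [
            "Social Security Number", "123-45-6789", "John Doe", "jane.smith@email.com",
        ],
    }),
    ("list_applied_redactions", {"pdf_path": PDF}),
    ("redact_text", {"pdf_path": PDF, "texts_to_redact": ["Additional Text", "Another Secret"]}),
    ("redact_area", {"pdf_path": PDF, "page_number": 2, "x0": 50, "y0": 100, "x1": 200, "y1": 120}),
    ("save_redacted_pdf", {"pdf_path": PDF}),  # output_path omitted: default naming applies
    ("close_pdf", {"pdf_path": PDF}),
]

for name, arguments in WORKFLOW:
    print(json.dumps({"tool": name, "arguments": arguments}))
```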
Technical Details
Performance Tips
Batch Redaction is Faster:
Why batch redaction is better:
Reduces tool invocation overhead
Scans the PDF only once
Applies all redactions in a single pass
Automatically prevents duplicate redactions
Provides a single summary of all operations
Best Practice: Collect all texts to redact first, then make one batch call.
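One way to follow this practice is to gather candidate strings from several review passes, then clean and de-duplicate them before making the single `redact_text` call. The helper name `collect_batch` is hypothetical and the sketch is independent of the server.

```python
def collect_batch(*groups):
    """Merge groups of candidate strings into one ordered, de-duplicated list."""
    merged = []
    for group in groups:
        merged.extend(text.strip() for text in group)
    # dict.fromkeys keeps the first occurrence of each string, in order;
    # empty strings are dropped because they cannot be searched for.
    return [text for text in dict.fromkeys(merged) if text]


names = ["John Doe", "123-45-6789"]
contacts = ["John Doe ", "", "jane.smith@email.com"]
texts_to_redact = collect_batch(names, contacts)
print(texts_to_redact)
# prints ['John Doe', '123-45-6789', 'jane.smith@email.com']
```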
Dependencies
FastMCP (>=2.12.0): Python framework for building MCP servers
PyMuPDF (>=1.24.0): PDF manipulation library (imported as fitz)
Architecture
In-memory storage: Loaded PDFs are kept in memory for fast access during redaction operations
Redaction tracking: The server tracks which texts have been redacted to prevent duplicate work
Batch processing: Multiple texts can be redacted in a single tool call for improved performance
Lazy application: Redaction annotations are added but not applied until save_redacted_pdf is called
Error handling: Uses FastMCP's ToolError for proper error propagation to MCP clients
Context logging: All operations log to the MCP context for transparency
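The mark-then-apply pattern behind lazy application can be sketched directly with PyMuPDF. `search_for`, `add_redact_annot`, and `apply_redactions` are PyMuPDF page methods; the function name `mark_and_apply` and the way the steps are combined into one call are assumptions made for illustration, since the server spreads them across separate tools.

```python
def mark_and_apply(src_path, texts, out_path, fill=(0, 0, 0)):
    """Mark every hit of `texts` for redaction, apply the marks, and save a copy."""
    import fitz  # PyMuPDF; imported here so the sketch loads even without it installed

    doc = fitz.open(src_path)
    marked = 0
    try:
        # Phase 1: add redaction annotations. Nothing is removed yet.
        for page in doc:
            for text in texts:
                for rect in page.search_for(text):
                    page.add_redact_annot(rect, fill=fill)
                    marked += 1
        # Phase 2: apply the annotations, which removes the covered content.
        for page in doc:
            page.apply_redactions()
        doc.save(out_path)
    finally:
        doc.close()
    return marked
```

Until phase 2 runs, the original text is still present in the document, which is why the redacted file should only be shared after saving through `save_redacted_pdf`.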
Limitations (Current Version)
Text-only redaction: This version focuses on text redaction. Image redaction is not yet implemented.
Memory usage: PDFs are kept in memory while loaded. Very large PDFs may consume significant memory.
Single session: The in-memory store is not persistent across server restarts.
Development
Running Tests
Code Structure
License
Apache-2.0
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.