

# PDF Redaction MCP Server

A Model Context Protocol (MCP) server for PDF redaction using PyMuPDF (fitz). This server provides tools for loading PDFs, identifying and redacting sensitive text, and saving redacted documents.

## Features

- 📄 **Load and read PDF files** - Extract text content from PDFs for review
- 🔍 **Batch text redaction** - Search and redact multiple text strings at once for maximum efficiency
- 📋 **Redaction tracking** - Keep track of what's been redacted to prevent duplicate work
- 🔎 **List applied redactions** - Audit trail showing which texts have been marked for redaction
- 📐 **Area-based redaction** - Redact specific rectangular regions by coordinates
- 💾 **Save redacted PDFs** - Apply redactions and save with automatic naming
- 🎨 **Customizable redaction appearance** - Choose redaction fill colors
- 🔒 **Error handling** - Comprehensive error messages via MCP protocol

## Installation

This project uses `uv` for package management. To install:

```bash
# Clone the repository
git clone <your-repo-url>
cd redact_mcp

# Install with uv
uv pip install -e .
```

## Usage

### Running the Server

You can run the server using either the Python script directly or the FastMCP CLI:

#### Option 1: Direct Python execution (stdio transport)

```bash
python -m redact_mcp.server
```

#### Option 2: Using FastMCP CLI

```bash
# Stdio transport (default)
fastmcp run redact_mcp.server:mcp

# HTTP transport for remote access
fastmcp run redact_mcp.server:mcp --transport http --port 8000
```

### Installing in MCP Clients

#### Claude Desktop

Add to your Claude Desktop configuration file:

**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`

**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`

```json
{
  "mcpServers": {
    "pdf-redaction": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/redact_mcp",
        "run",
        "fastmcp",
        "run",
        "redact_mcp.server:mcp"
      ]
    }
  }
}
```

#### Other MCP Clients

Use the FastMCP CLI to generate configuration for other clients:

```bash
# For Cursor
fastmcp install cursor redact_mcp.server:mcp

# For Gemini CLI
fastmcp install gemini-cli redact_mcp.server:mcp

# Generate generic MCP JSON configuration
fastmcp install mcp-json redact_mcp.server:mcp
```

## Available Tools

### 1. `load_pdf`

Load a PDF file and extract its text content.

**Parameters:**
- `pdf_path` (string): Path to the PDF file to load

**Returns:** The full text content of the PDF, organized by pages

**Example:**
```
Load the PDF at /path/to/document.pdf
```

### 2. `redact_text`

Redact all instances of specific texts in a loaded PDF. **This tool accepts multiple texts at once** for efficient batch redaction. It automatically tracks which texts have already been redacted to prevent duplicate work.

**Parameters:**
- `pdf_path` (string): Path to the loaded PDF file
- `texts_to_redact` (list of strings): List of text strings to search for and redact
- `fill_color` (tuple, optional): RGB color (0-1 range) for the redaction box.
  Default: `(0, 0, 0)` (black)

**Returns:** Summary of redaction operations, including which texts were newly redacted and which were skipped (already redacted)

**Examples:**
```
# Single text
Redact ["confidential"] in /path/to/document.pdf

# Multiple texts at once (recommended for efficiency)
Redact ["John Doe", "123-45-6789", "john.doe@email.com"] in /path/to/document.pdf
```

**Note:** The tool tracks which texts have been redacted and skips any texts that were already processed, preventing duplicate redactions.

### 3. `redact_area`

Redact a specific rectangular area on a PDF page.

**Parameters:**
- `pdf_path` (string): Path to the loaded PDF file
- `page_number` (int): Page number (1-indexed)
- `x0` (float): Left x coordinate
- `y0` (float): Top y coordinate
- `x1` (float): Right x coordinate
- `y1` (float): Bottom y coordinate
- `fill_color` (tuple, optional): RGB color (0-1 range) for the redaction box. Default: `(0, 0, 0)` (black)

**Returns:** Confirmation message

**Example:**
```
Redact the area from (100, 100) to (300, 150) on page 1 of /path/to/document.pdf
```

### 4. `save_redacted_pdf`

Apply all pending redactions and save the PDF.

**Parameters:**
- `pdf_path` (string): Path to the loaded PDF file
- `output_path` (string, optional): Custom output path. If not provided, `_redacted` is appended to the original file name, before the extension

**Returns:** Path to the saved redacted PDF

**Example:**
```
Save the redacted version of /path/to/document.pdf
```

### 5. `list_loaded_pdfs`

List all currently loaded PDF files.

**Parameters:** None

**Returns:** List of loaded PDF paths with page counts

### 6. `list_applied_redactions`

List all texts that have been marked for redaction in the loaded PDF(s). Use it to track redaction progress and avoid duplicate work.

**Parameters:**
- `pdf_path` (string, optional): Path to a specific PDF.
  If not provided, lists redactions for all loaded PDFs.

**Returns:** List of texts that have been marked for redaction in each PDF

**Examples:**
```
# List redactions for a specific PDF
List applied redactions for /path/to/document.pdf

# List redactions for all loaded PDFs
List all applied redactions
```

**Use Cases:**
- Check what has already been redacted before adding more redactions
- Verify redaction progress during a multi-step process
- Avoid duplicate redaction attempts
- Generate a report of what was redacted

### 7. `close_pdf`

Close a loaded PDF and free its resources. This also clears the redaction tracking for that PDF.

**Parameters:**
- `pdf_path` (string): Path to the PDF file to close

**Returns:** Confirmation message

## Workflow Example

Here's a typical workflow using this MCP server:

1. **Load a PDF**
   ```
   Load the PDF at /Users/me/documents/sensitive.pdf
   ```

2. **Review the content**
   The tool returns the full text content, which you can review to identify sensitive information.

3. **Redact sensitive text (batch mode, recommended)**
   ```
   Redact ["Social Security Number", "123-45-6789", "John Doe", "jane.smith@email.com"] in /Users/me/documents/sensitive.pdf
   ```
   **Pro tip:** Redacting multiple texts in one call is much faster than calling the tool once per text.

4. **Check what has been redacted (optional)**
   ```
   List applied redactions for /Users/me/documents/sensitive.pdf
   ```
   This shows which texts have already been marked for redaction.

5. **Add more redactions if needed**
   ```
   Redact ["Additional Text", "Another Secret"] in /Users/me/documents/sensitive.pdf
   ```
   The tool skips any texts that were already redacted in step 3.

6. **Redact specific areas (optional)**
   ```
   Redact the area from (50, 100) to (200, 120) on page 2 of /Users/me/documents/sensitive.pdf
   ```

7. **Save the redacted PDF**
   ```
   Save the redacted version of /Users/me/documents/sensitive.pdf
   ```
   This creates `/Users/me/documents/sensitive_redacted.pdf`.

8. **Close the PDF (optional)**
   ```
   Close /Users/me/documents/sensitive.pdf
   ```

## Technical Details

### Performance Tips

**Batch Redaction is Faster:**
```
# ❌ Slower: Multiple individual calls
Redact ["John Doe"] in document.pdf
Redact ["123-45-6789"] in document.pdf
Redact ["jane@email.com"] in document.pdf

# ✅ Faster: Single batch call
Redact ["John Doe", "123-45-6789", "jane@email.com"] in document.pdf
```

**Why batch redaction is better:**
- Reduces tool invocation overhead
- Scans the PDF only once
- Marks all matches in a single pass (they are applied later, at save time)
- Automatically prevents duplicate redactions
- Provides a single summary of all operations

**Best Practice:** Collect all texts to redact first, then make one batch call.

### Dependencies

- **FastMCP** (>=2.12.0): Python framework for building MCP servers
- **PyMuPDF** (>=1.24.0): PDF manipulation library (imported as `fitz`)

### Architecture

- **In-memory storage**: Loaded PDFs are kept in memory for fast access during redaction operations
- **Redaction tracking**: The server tracks which texts have been redacted to prevent duplicate work
- **Batch processing**: Multiple texts can be redacted in a single tool call for improved performance
- **Lazy application**: Redaction annotations are added but not applied until `save_redacted_pdf` is called
- **Error handling**: Uses FastMCP's `ToolError` for proper error propagation to MCP clients
- **Context logging**: All operations log to the MCP context for transparency

### Limitations (Current Version)

- **Text-only redaction**: This version focuses on text redaction. Image redaction is not yet implemented.
- **Memory usage**: PDFs are kept in memory while loaded. Very large PDFs may consume significant memory.
- **Single session**: The in-memory store is not persistent across server restarts.
## Development

### Running Tests

```bash
# Install development dependencies
uv pip install -e ".[dev]"

# Run tests (when implemented)
pytest
```

### Code Structure

```
redact_mcp/
├── src/
│   └── redact_mcp/
│       ├── __init__.py      # Package initialization
│       └── server.py        # Main MCP server implementation
├── pyproject.toml           # Package configuration
└── README.md                # This file
```

## License

Apache-2.0

## Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

## Acknowledgments

- Built with [FastMCP](https://gofastmcp.com/)
- PDF manipulation powered by [PyMuPDF](https://pymupdf.readthedocs.io/)
