# PDF Redaction MCP Server
A Model Context Protocol (MCP) server for PDF redaction using PyMuPDF (fitz). This server provides tools for loading PDFs, identifying and redacting sensitive text, and saving redacted documents.
## Features
- **Load and read PDF files** - Extract text content from PDFs for review
- **Batch text redaction** - Search and redact multiple text strings at once for maximum efficiency
- **Redaction tracking** - Keep track of what's been redacted to prevent duplicate work
- **List applied redactions** - Audit trail showing which texts have been marked for redaction
- **Area-based redaction** - Redact specific rectangular regions by coordinates
- **Save redacted PDFs** - Apply redactions and save with automatic naming
- **Customizable redaction appearance** - Choose redaction fill colors
- **Error handling** - Comprehensive error messages via MCP protocol
## Installation
This project uses `uv` for package management. To install:
```bash
# Clone the repository
git clone <your-repo-url>
cd redact_mcp
# Install with uv
uv pip install -e .
```
## Usage
### Running the Server
You can run the server using either the Python script directly or the FastMCP CLI:
#### Option 1: Direct Python execution (stdio transport)
```bash
python -m redact_mcp.server
```
#### Option 2: Using FastMCP CLI
```bash
# Stdio transport (default)
fastmcp run redact_mcp.server:mcp
# HTTP transport for remote access
fastmcp run redact_mcp.server:mcp --transport http --port 8000
```
### Installing in MCP Clients
#### Claude Desktop
Add to your Claude Desktop configuration file:
**macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
**Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
```json
{
  "mcpServers": {
    "pdf-redaction": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/redact_mcp",
        "run",
        "fastmcp",
        "run",
        "redact_mcp.server:mcp"
      ]
    }
  }
}
```
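If the configuration file already lists other servers, editing the JSON by hand is easy to get wrong. The following is a minimal sketch that merges the `pdf-redaction` entry into an existing file using only the standard library. The helper name `add_server_entry` is a hypothetical illustration and not part of this project; the project directory is whatever path you cloned the repository to.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, project_dir: str) -> dict:
    """Merge the pdf-redaction entry into a Claude Desktop config file.

    Hypothetical helper for illustration; existing entries are preserved.
    """
    # Start from the existing config if there is one, else from an empty dict.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}

    # Create the "mcpServers" section if missing, then add or overwrite our entry.
    servers = config.setdefault("mcpServers", {})
    servers["pdf-redaction"] = {
        "command": "uv",
        "args": ["--directory", project_dir, "run", "fastmcp", "run",
                 "redact_mcp.server:mcp"],
    }

    config_path.write_text(json.dumps(config, indent=2))
    return config
```

Restart Claude Desktop after changing the file so the new server entry is picked up.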
#### Other MCP Clients
Use the FastMCP CLI to generate configuration for other clients:
```bash
# For Cursor
fastmcp install cursor redact_mcp.server:mcp
# For Gemini CLI
fastmcp install gemini-cli redact_mcp.server:mcp
# Generate generic MCP JSON configuration
fastmcp install mcp-json redact_mcp.server:mcp
```
## Available Tools
### 1. `load_pdf`
Load a PDF file and extract its text content.
**Parameters:**
- `pdf_path` (string): Path to the PDF file to load
**Returns:** The full text content of the PDF, organized by pages
**Example:**
```
Load the PDF at /path/to/document.pdf
```
### 2. `redact_text`
Redact all instances of specific texts in a loaded PDF. **This tool now accepts multiple texts at once** for efficient batch redaction. It automatically tracks which texts have already been redacted to prevent duplicate work.
**Parameters:**
- `pdf_path` (string): Path to the loaded PDF file
- `texts_to_redact` (list of strings): List of text strings to search for and redact
- `fill_color` (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black
**Returns:** Summary of redaction operations, including which texts were newly redacted and which were skipped (already redacted)
**Examples:**
```
# Single text
Redact ["confidential"] in /path/to/document.pdf
# Multiple texts at once (recommended for efficiency)
Redact ["John Doe", "123-45-6789", "john.doe@email.com"] in /path/to/document.pdf
```
**Note:** The tool tracks which texts have been redacted and will skip any texts that were already processed, preventing duplicate redactions.
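The tracking behavior can be pictured as one set of already-processed strings per PDF. The sketch below is an assumption about how such tracking might look, not the server's actual code; the names `_redacted` and `split_new_and_skipped` are hypothetical, and exact, case-sensitive string matching is assumed.

```python
from collections import defaultdict

# Hypothetical tracking store: PDF path -> set of texts already marked.
_redacted: dict[str, set[str]] = defaultdict(set)


def split_new_and_skipped(pdf_path: str, texts: list[str]) -> tuple[list[str], list[str]]:
    """Return (new, skipped) and record the new texts as redacted."""
    seen = _redacted[pdf_path]
    new, skipped = [], []
    for text in texts:
        if text in seen:
            # Already processed for this PDF (or repeated within this call).
            skipped.append(text)
        else:
            seen.add(text)
            new.append(text)
    return new, skipped
```

Under these assumptions, a second call that repeats `"John Doe"` for the same PDF would report it as skipped, while the same text for a different PDF would still count as new.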
### 3. `redact_area`
Redact a specific rectangular area on a PDF page.
**Parameters:**
- `pdf_path` (string): Path to the loaded PDF file
- `page_number` (int): Page number (1-indexed)
- `x0` (float): Left x coordinate
- `y0` (float): Top y coordinate
- `x1` (float): Right x coordinate
- `y1` (float): Bottom y coordinate
- `fill_color` (tuple, optional): RGB color (0-1 range) for redaction box. Default: (0, 0, 0) - black
**Returns:** Confirmation message
**Example:**
```
Redact the area from (100, 100) to (300, 150) on page 1 of /path/to/document.pdf
```
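Because `page_number` is 1-indexed while PyMuPDF indexes pages from 0, and because corner coordinates are easy to pass in the wrong order, a small validation step is useful before marking an area. The helper below is a hypothetical sketch (the name `normalize_area` and the `page_count` argument are assumptions); it relies on PyMuPDF's convention that the origin is the top-left corner of the page, so the top edge has the smaller y value.

```python
def normalize_area(page_number, x0, y0, x1, y1, page_count):
    """Validate a 1-indexed page number and return (page_index, rect).

    Hypothetical helper: the rect is returned as (left, top, right, bottom),
    with the corners swapped if they were given in the wrong order.
    """
    if not 1 <= page_number <= page_count:
        raise ValueError(
            f"page_number must be between 1 and {page_count}, got {page_number}"
        )
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    if left == right or top == bottom:
        raise ValueError("redaction area has zero width or height")
    # PyMuPDF pages are 0-indexed, so subtract one from the tool's page number.
    return page_number - 1, (left, top, right, bottom)
```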
### 4. `save_redacted_pdf`
Apply all pending redactions and save the PDF.
**Parameters:**
- `pdf_path` (string): Path to the loaded PDF file
- `output_path` (string, optional): Custom output path. If not provided, appends "_redacted" to original filename
**Returns:** Path to the saved redacted PDF
**Example:**
```
Save the redacted version of /path/to/document.pdf
```
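The default naming rule (append `_redacted` to the original filename) can be expressed with `pathlib`. This is a minimal sketch of that rule; the function name `resolve_output_path` is hypothetical and the server's own implementation may differ.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(pdf_path: str, output_path: Optional[str] = None) -> Path:
    """Return the custom output path if given, else '<name>_redacted<ext>'."""
    if output_path:
        return Path(output_path)
    source = Path(pdf_path)
    # Keep the directory and extension; only the file stem gets the suffix.
    return source.with_name(f"{source.stem}_redacted{source.suffix}")
```

With this rule, `/Users/me/documents/sensitive.pdf` maps to `/Users/me/documents/sensitive_redacted.pdf`, so the original file is never overwritten by default.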
### 5. `list_loaded_pdfs`
List all currently loaded PDF files.
**Parameters:** None
**Returns:** List of loaded PDF paths with page counts
### 6. `list_applied_redactions`
List all redactions that have been applied to loaded PDF(s). **New tool** for tracking redaction progress and avoiding duplicate work.
**Parameters:**
- `pdf_path` (string, optional): Path to a specific PDF. If not provided, lists redactions for all loaded PDFs
**Returns:** List of texts that have been marked for redaction in each PDF
**Examples:**
```
# List redactions for a specific PDF
List applied redactions for /path/to/document.pdf
# List redactions for all loaded PDFs
List all applied redactions
```
**Use Cases:**
- Check what has already been redacted before adding more redactions
- Verify redaction progress during a multi-step process
- Avoid duplicate redaction attempts
- Generate a report of what was redacted
### 7. `close_pdf`
Close a loaded PDF and free its resources. This also clears the redaction tracking for that PDF.
**Parameters:**
- `pdf_path` (string): Path to the PDF file to close
**Returns:** Confirmation message
## Workflow Example
Here's a typical workflow using this MCP server:
1. **Load a PDF**
```
Load the PDF at /Users/me/documents/sensitive.pdf
```
2. **Review the content**
The tool will return the full text content, which you can review to identify sensitive information.
3. **Redact sensitive text (batch mode - recommended)**
```
Redact ["Social Security Number", "123-45-6789", "John Doe", "jane.smith@email.com"] in /Users/me/documents/sensitive.pdf
```
**Pro tip:** Redacting multiple texts in one call avoids repeated tool invocations and is faster than calling the tool once per text.
4. **Check what has been redacted (optional)**
```
List applied redactions for /Users/me/documents/sensitive.pdf
```
This shows you which texts have already been marked for redaction.
5. **Add more redactions if needed**
```
Redact ["Additional Text", "Another Secret"] in /Users/me/documents/sensitive.pdf
```
The tool will skip any texts that were already redacted in step 3.
6. **Redact specific areas (optional)**
```
Redact the area from (50, 100) to (200, 120) on page 2 of /Users/me/documents/sensitive.pdf
```
7. **Save the redacted PDF**
```
Save the redacted version of /Users/me/documents/sensitive.pdf
```
This will create `/Users/me/documents/sensitive_redacted.pdf`
8. **Close the PDF (optional)**
```
Close /Users/me/documents/sensitive.pdf
```
## Technical Details
### Performance Tips
**Batch Redaction is Faster:**
```
# Slower: Multiple individual calls
Redact ["John Doe"] in document.pdf
Redact ["123-45-6789"] in document.pdf
Redact ["jane@email.com"] in document.pdf
# Faster: Single batch call
Redact ["John Doe", "123-45-6789", "jane@email.com"] in document.pdf
```
**Why batch redaction is better:**
- Reduces tool invocation overhead
- Scans the PDF only once
- Applies all redactions in a single pass
- Automatically prevents duplicate redactions
- Provides a single summary of all operations
**Best Practice:** Collect all texts to redact first, then make one batch call.
### Dependencies
- **FastMCP** (>=2.12.0): Python framework for building MCP servers
- **PyMuPDF** (>=1.24.0): PDF manipulation library (imported as `fitz`)
### Architecture
- **In-memory storage**: Loaded PDFs are kept in memory for fast access during redaction operations
- **Redaction tracking**: The server tracks which texts have been redacted to prevent duplicate work
- **Batch processing**: Multiple texts can be redacted in a single tool call for improved performance
- **Lazy application**: Redaction annotations are added but not applied until `save_redacted_pdf` is called
- **Error handling**: Uses FastMCP's `ToolError` for proper error propagation to MCP clients
- **Context logging**: All operations log to the MCP context for transparency
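The batch-processing and lazy-application points above can be sketched with three PyMuPDF page methods: `search_for`, `add_redact_annot`, and `apply_redactions`. This is an illustrative outline under stated assumptions, not the contents of `server.py`: the function names `mark_texts` and `apply_and_save` are hypothetical, and `doc` is assumed to be a PyMuPDF document such as the one returned by `fitz.open(path)`.

```python
def mark_texts(doc, texts, fill=(0, 0, 0)):
    """Add redaction annotations for every hit; nothing is removed yet.

    `doc` is assumed to be a PyMuPDF document (e.g. from `fitz.open(path)`).
    Returns the number of hits found for each text.
    """
    counts = {text: 0 for text in texts}
    for page in doc:  # visit each page once, checking all texts on it
        for text in texts:
            for rect in page.search_for(text):
                # Marks the area only; content stays until apply_redactions().
                page.add_redact_annot(rect, fill=fill)
                counts[text] += 1
    return counts


def apply_and_save(doc, output_path):
    """Apply pending redaction annotations on every page, then save."""
    for page in doc:
        page.apply_redactions()
    doc.save(output_path)
```

Keeping marking and applying separate means redactions can be accumulated across several tool calls and only become permanent in the saved copy.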
### Limitations (Current Version)
- **Text-only redaction**: This version focuses on text redaction. Image redaction is not yet implemented.
- **Memory usage**: PDFs are kept in memory while loaded. Very large PDFs may consume significant memory.
- **Single session**: The in-memory store is not persistent across server restarts.
## Development
### Running Tests
```bash
# Install development dependencies
uv pip install -e ".[dev]"
# Run tests (when implemented)
pytest
```
### Code Structure
```
redact_mcp/
├── src/
│   └── redact_mcp/
│       ├── __init__.py      # Package initialization
│       └── server.py        # Main MCP server implementation
├── pyproject.toml           # Package configuration
└── README.md                # This file
```
## License
Apache-2.0
## Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
## Acknowledgments
- Built with [FastMCP](https://gofastmcp.com/)
- PDF manipulation powered by [PyMuPDF](https://pymupdf.readthedocs.io/)