PDFtotext MCP Server
A reliable Model Context Protocol (MCP) server for PDF text extraction using the proven pdftotext utility from poppler-utils.
Why This Server?
Unlike other PDF MCP servers that suffer from logging interference, complex dependencies, and reliability issues, pdftotext-mcp is:
Actually works - Clean JSON-RPC communication without stdout pollution
Reliable - Built on the mature pdftotext from poppler-utils (used by millions)
Lightweight - Minimal dependencies, maximum compatibility
Production tested - Successfully tested with Claude Desktop and other MCP clients
Feature complete - Page-specific extraction, layout preservation, encoding options
Error handling - Comprehensive validation and helpful error messages
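The stdout point is worth making concrete. Over the stdio transport, an MCP server writes newline-delimited JSON-RPC messages to stdout, so a stray log line there corrupts the stream. The sketch below is not this server's actual code; `createChannels`, `send`, and `log` are hypothetical names used only to illustrate keeping diagnostics on stderr.

```javascript
// Minimal sketch: protocol messages go to stdout, diagnostics go to stderr.
// createChannels, send, and log are hypothetical names, not part of pdftotext-mcp.
function createChannels(out = process.stdout, err = process.stderr) {
  return {
    // One JSON-RPC message per line; JSON.stringify without an indent
    // argument never emits raw newlines.
    send(message) {
      out.write(JSON.stringify(message) + "\n");
    },
    // Logs never touch stdout, so the client only ever parses valid JSON.
    log(...parts) {
      err.write("[pdftotext-mcp] " + parts.join(" ") + "\n");
    },
  };
}

// Usage: with the default streams, only the JSON line reaches stdout.
const channels = createChannels();
channels.log("starting up");
channels.send({ jsonrpc: "2.0", id: 1, result: { ok: true } });
```

Passing the streams in as parameters keeps the helper easy to test with fake writers.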
Features
Extract text from entire PDF documents or specific pages
Preserve original layout formatting (optional)
Multiple text encoding support (UTF-8, Latin1, ASCII)
Comprehensive metadata in responses (word count, file info, etc.)
File validation and security checks
Fast processing with configurable timeouts
Detailed error reporting with troubleshooting hints
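To illustrate the configurable timeout, the sketch below runs a child process under a hard time limit using Node's `spawnSync`. This is an assumption about how such a limit could be enforced, not the server's implementation; `runWithTimeout` and the 30-second default are hypothetical.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: run a command and give up after timeoutMs milliseconds.
function runWithTimeout(command, args, timeoutMs = 30000) {
  const result = spawnSync(command, args, { encoding: "utf8", timeout: timeoutMs });
  // When the limit is hit, Node kills the child and sets error.code to ETIMEDOUT.
  const timedOut = Boolean(result.error && result.error.code === "ETIMEDOUT");
  return {
    ok: !result.error && result.status === 0,
    timedOut,
    stdout: result.stdout || "",
    stderr: result.stderr || "",
  };
}

// Usage: "-" as the output file asks pdftotext to write the text to stdout.
const outcome = runWithTimeout("pdftotext", ["document.pdf", "-"], 10000);
console.error(outcome.timedOut ? "timed out" : `finished, ok=${outcome.ok}`);
```

A synchronous call keeps the example short; a real server would more likely use an asynchronous variant so that one slow PDF does not block other requests.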
Prerequisites
You must have pdftotext installed on your system:
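Ubuntu/Debian: install the poppler-utils package (for example, sudo apt-get install poppler-utils)
macOS: install poppler with Homebrew (brew install poppler)
Windows: install a poppler build for Windows and add its bin directory to PATH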
Verify Installation
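One way to verify the installation from Node is to try launching `pdftotext -v`, which prints version information. The sketch below is a hedged example; `checkPdftotext` is a hypothetical helper, not part of this package.

```javascript
const { spawnSync } = require("node:child_process");

// Hypothetical helper: report whether a pdftotext binary can be launched.
function checkPdftotext(command = "pdftotext") {
  const result = spawnSync(command, ["-v"], { encoding: "utf8" });
  if (result.error) {
    // Typically ENOENT: the binary is not on PATH.
    return { available: false, versionLine: null };
  }
  // Version text may arrive on stderr or stdout, so check both.
  const text = `${result.stderr || ""}${result.stdout || ""}`.trim();
  return { available: true, versionLine: text.split("\n")[0] || null };
}

const status = checkPdftotext();
console.error(status.available ? `found: ${status.versionLine}` : "pdftotext is not available");
```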
Installation
Option 1: Global Installation (Recommended)
Option 2: Use with npx (No Installation)
Option 3: Local Development
Configuration
Add to your MCP client configuration:
Claude Desktop
Add to claude_desktop_config.json:
Or with npx:
Other MCP Clients
The server works with any MCP-compatible client. Use pdftotext-mcp as the command.
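As a rough guide to the Claude Desktop entries described above, the sketch below prints two candidate configurations. `mcpServers`, `command`, and `args` are the keys Claude Desktop reads; the entry name `pdftotext` is an arbitrary label, and the npx variant assumes the npm package is published under the same name as the command, which is an assumption.

```javascript
// Sketch of the two Claude Desktop entries described above.
// The entry name "pdftotext" is an arbitrary label chosen for this example.
function buildClaudeConfig(useNpx = false) {
  const server = useNpx
    ? { command: "npx", args: ["pdftotext-mcp"] } // assumes the npm package shares the command's name
    : { command: "pdftotext-mcp" };
  return { mcpServers: { pdftotext: server } };
}

// Print JSON that could be merged into claude_desktop_config.json.
console.log(JSON.stringify(buildClaudeConfig(), null, 2));
console.log(JSON.stringify(buildClaudeConfig(true), null, 2));
```

If the file already lists other servers, add only the inner entry to the existing `mcpServers` object rather than replacing the file.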
Usage
The server provides a single, powerful tool: read_pdf_text
Basic Usage
Extract entire document
Extract specific page
Preserve layout formatting
Custom encoding
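The four cases above can be sketched as MCP `tools/call` requests. The request envelope follows the MCP specification and the tool name comes from this README, but the argument names (`path`, `page`, `layout`, `encoding`) and the file path are assumptions made for illustration; check the tool's published schema for the real names.

```javascript
// Build an MCP "tools/call" request for read_pdf_text.
// Argument names (path, page, layout, encoding) are assumed for illustration.
function buildReadPdfRequest(id, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf_text", arguments: args },
  };
}

const pdf = "/home/user/document.pdf"; // placeholder path
const examples = [
  buildReadPdfRequest(1, { path: pdf }),                       // entire document
  buildReadPdfRequest(2, { path: pdf, page: 3 }),              // specific page
  buildReadPdfRequest(3, { path: pdf, layout: true }),         // preserve layout
  buildReadPdfRequest(4, { path: pdf, encoding: "Latin1" }),   // custom encoding
];
for (const request of examples) console.log(JSON.stringify(request));
```

In practice the MCP client builds these requests for you; the sketch only shows what travels over the wire.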
Response Format
Success Response
Error Response
API Reference
Tool: read_pdf_text
Extracts text content from PDF files using pdftotext.
Parameters
| Parameter | Type | Required | Default | Description |
| | string | yes | - | Path to PDF file (relative or absolute) |
| | number | no | all pages | Specific page to extract (1-based) |
| | boolean | no | | Preserve original text layout |
| | string | no | UTF-8 | Output text encoding |
Supported Encodings
UTF-8 (default)
Latin1
ASCII
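To show how these parameters could translate into a pdftotext invocation, here is a minimal sketch. The flags `-f`, `-l`, `-layout`, and `-enc` are real pdftotext options, but `buildPdftotextArgs` and its parameter names are hypothetical, and passing the documented encoding names straight to `-enc` is an assumption (poppler's own list can be checked with `pdftotext -listenc`).

```javascript
const SUPPORTED_ENCODINGS = ["UTF-8", "Latin1", "ASCII"];

// Hypothetical mapping from tool parameters to pdftotext command-line arguments.
function buildPdftotextArgs({ path, page, layout = false, encoding = "UTF-8" }) {
  if (typeof path !== "string" || path.length === 0) {
    throw new TypeError("path must be a non-empty string");
  }
  if (!SUPPORTED_ENCODINGS.includes(encoding)) {
    throw new RangeError(`unsupported encoding: ${encoding}`);
  }
  const args = [];
  if (page !== undefined) {
    if (!Number.isInteger(page) || page < 1) {
      throw new RangeError("page must be a positive integer (1-based)");
    }
    // -f and -l set the first and last page; equal values select one page.
    args.push("-f", String(page), "-l", String(page));
  }
  if (layout) args.push("-layout"); // keep the physical layout of the page
  args.push("-enc", encoding);
  args.push(path, "-"); // "-" sends the extracted text to stdout
  return args;
}

console.log(buildPdftotextArgs({ path: "report.pdf", page: 2, layout: true }).join(" "));
```

Returning an argument array rather than a command string avoids shell quoting problems with file names that contain spaces.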
Error Types
FILE_NOT_FOUND - PDF file doesn't exist
PERMISSION_DENIED - Cannot read the file
INVALID_PDF - File is not a valid PDF
PDFTOTEXT_ERROR - pdftotext utility error
UNKNOWN_ERROR - Unexpected error
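The first three error types can be detected before pdftotext is ever launched. The sketch below is one possible pre-flight check, not the server's actual validation; `checkPdfFile` is a hypothetical name, and the mapping from Node error codes to these types is an assumption.

```javascript
const fs = require("node:fs");

// Hypothetical pre-flight check returning one of the error types above, or null.
function checkPdfFile(filePath) {
  let fd;
  try {
    fd = fs.openSync(filePath, "r");
    const header = Buffer.alloc(5);
    const bytesRead = fs.readSync(fd, header, 0, 5, 0);
    // A well-formed PDF starts with the marker "%PDF-".
    if (bytesRead < 5 || header.toString("latin1") !== "%PDF-") return "INVALID_PDF";
    return null;
  } catch (error) {
    if (error.code === "ENOENT") return "FILE_NOT_FOUND";
    if (error.code === "EACCES" || error.code === "EPERM") return "PERMISSION_DENIED";
    return "UNKNOWN_ERROR";
  } finally {
    // Always release the descriptor, including on the early returns above.
    if (fd !== undefined) fs.closeSync(fd);
  }
}

console.error(checkPdfFile("document.pdf") ?? "looks like a PDF");
```

A header check is cheap but shallow: a file can start with `%PDF-` and still be corrupt, in which case the failure would surface later as a pdftotext error.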
Troubleshooting
"pdftotext is not available"
Solution: Install poppler-utils (see Prerequisites)
"File not found"
Solutions:
Use absolute paths: /home/user/document.pdf
Check file exists: ls -la /path/to/file.pdf
Verify MCP server working directory
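Relative paths fail when the server's working directory differs from the one you expect, since the MCP client decides where the server process starts. The sketch below shows how a path resolves against a given directory; `describePdfPath` is a hypothetical helper written for this example.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: show how a path resolves against a working directory.
function describePdfPath(inputPath, cwd = process.cwd()) {
  const resolved = path.resolve(cwd, inputPath);
  return {
    wasRelative: !path.isAbsolute(inputPath),
    resolved,
    exists: fs.existsSync(resolved),
  };
}

console.error(describePdfPath("document.pdf"));
```

If `wasRelative` is true and `exists` is false, switching to the absolute path is usually the quickest fix.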
"Permission denied"
Solutions:
Check file permissions: chmod 644 document.pdf
Ensure directory is readable: chmod 755 /path/to/directory/
"File is not a valid PDF"
Solutions:
Verify file is actually a PDF: file document.pdf
Check for file corruption
Try with a different PDF file
MCP Connection Issues
Solutions:
Restart your MCP client completely
Check configuration syntax in config file
Verify pdftotext-mcp is accessible in PATH
Check MCP client logs for detailed errors
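A quick way to rule out a configuration syntax problem is to parse the file's contents and check that each server entry names a command. The sketch below assumes the Claude Desktop layout with a top-level `mcpServers` object; `checkConfigText` is a hypothetical helper, and other clients may use a different structure.

```javascript
// Hypothetical check for an MCP client config file's contents.
function checkConfigText(text) {
  let config;
  try {
    config = JSON.parse(text);
  } catch (error) {
    // Trailing commas and comments are common causes of this failure.
    return [`invalid JSON: ${error.message}`];
  }
  const servers = config && config.mcpServers;
  if (!servers || typeof servers !== "object") {
    return ['missing "mcpServers" object'];
  }
  const problems = [];
  for (const [name, entry] of Object.entries(servers)) {
    if (!entry || typeof entry.command !== "string") {
      problems.push(`server "${name}" has no "command" string`);
    }
  }
  return problems; // an empty array means no problems were found
}

const sample = '{"mcpServers": {"pdftotext": {"command": "pdftotext-mcp"}}}';
console.error(checkConfigText(sample).join("\n") || "config looks fine");
```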
Testing
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
Running Locally
Code Style
This project uses ESLint. Run npm run lint to check code style.
License
MIT - see LICENSE file for details.
Acknowledgments
Built for the Model Context Protocol ecosystem
Uses the poppler-utils pdftotext utility
Inspired by the need for reliable PDF processing in MCP environments
Related
Made for the MCP community