PDF Reader MCP Server
An MCP (Model Context Protocol) server that provides PDF processing through 18 tools covering text extraction, OCR, image conversion, metadata management, and optimization.
Latest Updates
All 18 tools fully tested and working (September 2025)
Fixed JSON serialization issues - 100% compatibility achieved
Enhanced performance with intelligent caching system
Multi-language OCR support with Chinese and English optimization
Features
Smart Text Extraction
Intelligent PDF parsing with pdfplumber for high-quality text extraction
Automatic quality detection to identify when OCR is needed
Page-wise processing with flexible page range syntax
Advanced OCR Support
Tesseract integration for scanned documents and image-based PDFs
Multi-language support with focus on Chinese and English
Confidence scoring for OCR quality assessment
Windows-friendly installation and setup
Performance Optimized
Smart caching system to avoid reprocessing unchanged files
Chunking strategies for handling large documents
Parallel page processing for improved performance
Flexible Page Selection
Support for complex page ranges:
"1,3,5" - Specific pages
"1-10" - Page ranges
"-1" - Last page
"1,3,5-10,-1" - Combined syntax
Installation
Quick Installation (Recommended)
Install and run with uvx (easiest method):
# Install and run directly with uvx (no setup required)
uvx pdfmcp-tools
# Or install globally for repeated use
uv tool install pdfmcp-tools
pdfmcp-tools
Install from PyPI with pip:
# Install from PyPI
pip install pdfmcp-tools
# Run the server (both commands work)
pdfmcp-tools
# or
pdfreadermcp
Prerequisites
Python 3.11+ (uvx can download a suitable interpreter automatically; pip needs an existing Python 3.11+ installation)
Tesseract OCR engine (for OCR functionality)
Install Tesseract OCR Engine
macOS:
# Using Homebrew (recommended)
brew install tesseract tesseract-lang
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install tesseract-ocr tesseract-ocr-chi-sim tesseract-ocr-chi-tra
Windows:
Download from: https://github.com/UB-Mannheim/tesseract/wiki
Install the latest version (recommended: tesseract-ocr-w64-setup-v5.3.3.20231005.exe)
During installation, select "Additional Language Data" and install Chinese language packs
Add Tesseract to your PATH, or note the installation path for configuration
Development Installation (Advanced)
For development or local modification:
Install uv package manager (if not already installed):
macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Clone and install for development:
git clone https://github.com/lihongwen/pdfreadermcp.git
cd pdfreadermcp
uv sync --dev
uv run pdfreadermcp
Usage
Running the Server
With uvx (recommended):
# Run directly (auto-downloads and starts)
uvx pdfmcp-tools
# Or if globally installed
pdfreadermcp
With pip installation:
# After pip install pdfmcp-tools
pdfreadermcp
Development mode:
# In project directory
uv run pdfreadermcp
Integration with Claude Desktop
Add to your Claude Desktop MCP configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
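If you would rather not hand-edit the file, a short script can merge a server entry into it while leaving any other configured servers alone. This is a minimal sketch; `add_mcp_server` and `default_config_path` are hypothetical helpers, not part of this project, and the paths are the two listed above.

```python
import json
import os
import sys
from pathlib import Path


def default_config_path() -> Path:
    """Claude Desktop config location for the two platforms listed above."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("no Claude Desktop config path is documented for this platform")


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Insert or replace one server under "mcpServers", keeping every other key intact."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            # json.loads raises on invalid JSON instead of silently overwriting the file.
            config = json.loads(text)
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_mcp_server(default_config_path(), "pdfreadermcp", {"command": "uvx", "args": ["pdfmcp-tools"]})` would write the same entry as Option 1 below.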
Option 1: Using uvx (recommended):
{
"mcpServers": {
"pdfreadermcp": {
"command": "uvx",
"args": ["pdfmcp-tools"]
}
}
}
Option 2: Using global installation:
{
"mcpServers": {
"pdfreadermcp": {
"command": "pdfmcp-tools"
}
}
}
Option 3: Development/local installation:
{
"mcpServers": {
"pdfreadermcp": {
"command": "uv",
"args": [
"--directory",
"/path/to/your/pdfreadermcp",
"run",
"pdfreadermcp"
]
}
}
}
Complete Tool Suite (18 Tools)
All tools have been tested and are fully functional. They are grouped into five categories:
Text Processing Tools (5 tools)
read_pdf - Intelligent text extraction with chunking
extract_page_text - Single page text extraction with multiple modes
search_pdf_text - Advanced text search with regex and context support
find_and_highlight_text - Text search with highlighting coordinates
get_pdf_metadata - Comprehensive metadata reading with XMP support
Document Operations Tools (5 tools)
split_pdf - Split PDFs into multiple files by page ranges
extract_pages - Extract specific pages to new PDF file
merge_pdfs - Combine multiple PDFs into single document
set_pdf_metadata - Write/update PDF metadata fields
remove_pdf_metadata - Remove specific or all metadata fields
Image Conversion Tools (3 tools)
pdf_to_images - Convert PDF pages to high-quality images
images_to_pdf - Convert multiple images to single PDF
extract_pdf_images - Extract embedded images from PDF pages
OCR Tool (1 tool)
ocr_pdf - Advanced OCR with multi-language support and confidence scoring
Optimization Tools (4 tools)
optimize_pdf - Comprehensive PDF optimization with multiple levels
compress_pdf_images - Image compression within PDF documents
remove_pdf_content - Remove specific content to reduce file size
analyze_pdf_size - File size analysis and optimization recommendations
Tools
read_pdf - Text Extraction Tool
Extracts text from PDF files with intelligent processing.
Parameters:
file_path (required): Path to PDF file
pages (optional): Page range string (e.g., "1,3,5-10,-1")
chunk_size (optional): Maximum chunk size (default: 1000)
chunk_overlap (optional): Chunk overlap (default: 100)
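Several tools accept the same pages string. As a rough illustration of how a value like "1,3,5-10,-1" can expand into concrete page numbers, here is a minimal sketch. `parse_page_range` is a hypothetical helper, not the server's own parser; it assumes 1-based pages, inclusive ranges, negative numbers counting from the end, and a sorted, de-duplicated result (the server may preserve input order instead).

```python
def parse_page_range(spec: str, total_pages: int) -> list[int]:
    """Expand a page-range string such as "1,3,5-10,-1" into sorted 1-based page numbers."""
    pages: set[int] = set()
    for raw in spec.split(","):
        token = raw.strip()
        if not token:
            continue
        if token.startswith("-"):
            # Negative index: "-1" is the last page, "-2" the one before it, ...
            pages.add(total_pages + 1 + int(token))
        elif "-" in token:
            # Inclusive range "A-B".
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range {token!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(token))
    for page in pages:
        if not 1 <= page <= total_pages:
            raise ValueError(f"page {page} is outside 1..{total_pages}")
    return sorted(pages)
```

For a 20-page document, "1,3,5-10,-1" expands to pages 1, 3, 5 through 10, and 20.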
Example:
Extract text from document.pdf, pages 1-5 and last page
ocr_pdf - OCR Recognition Tool
Performs OCR on PDF pages using Tesseract for scanned documents and image-based PDFs.
Parameters:
file_path (required): Path to PDF file
pages (optional): Page range string (e.g., "1,3,5-10,-1")
language (optional): OCR language code (default: "chi_sim" for Chinese)
chunk_size (optional): Maximum chunk size (default: 1000)
chunk_overlap (optional): Chunk overlap (default: 100)
dpi (optional): DPI for PDF to image conversion (default: 200)
Supported Languages:
chi_sim: Simplified Chinese (default)
chi_tra: Traditional Chinese
eng: English
chi_sim+eng: Chinese and English mixed
Example:
Perform OCR on scanned_doc.pdf with Chinese text recognition
split_pdf - PDF Splitting Tool
Split PDF into multiple files based on page ranges.
Parameters:
file_path (required): Path to source PDF file
split_ranges (required): List of page ranges (e.g., ["1-5", "6-10", "11-15"])
output_dir (optional): Output directory (defaults to source file directory)
prefix (optional): Output file prefix (defaults to source filename)
Example:
Split document.pdf into multiple files: pages 1-10, 11-20, 21-30
extract_pages - Page Extraction Tool
Extract specific pages from PDF to a new file.
Parameters:
file_path (required): Path to source PDF file
pages (required): Page range (e.g., "1,3,5-7")
output_file (optional): Output filename (auto-generated if not provided)
output_dir (optional): Output directory (defaults to source file directory)
Example:
Extract pages 1, 5-8, and 15 from document.pdf
merge_pdfs - PDF Merging Tool
Merge multiple PDF files into a single file.
Parameters:
file_paths (required): List of PDF file paths to merge
output_file (optional): Output filename (auto-generated if not provided)
output_dir (optional): Output directory (defaults to first file's directory)
Example:
Merge file1.pdf, file2.pdf, and file3.pdf into a single document
pdf_to_images - PDF to Images Converter
Convert PDF pages to high-quality images using pdf2image.
Parameters:
file_path (required): Path to PDF file
pages (optional): Page range (e.g., "1,3,5-10,-1")
dpi (optional): Resolution for conversion (default: 200)
image_format (optional): Output format ('PNG', 'JPEG', etc.)
output_dir (optional): Directory to save images
save_to_disk (optional): Save to disk or keep in memory (default: True)
Example:
Convert first 5 pages of document.pdf to PNG images at 300 DPI
images_to_pdf - Images to PDF Converter
Convert multiple images into a single PDF document.
Parameters:
image_paths (required): List of image file paths
output_file (required): Output PDF file path
page_size (optional): Page size ('A4', 'Letter', 'Legal', 'auto')
quality (optional): JPEG compression quality (1-100, default: 95)
title (optional): PDF document title
author (optional): PDF document author
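To illustrate what page_size implies, the sketch below fits an image onto a named page while preserving its aspect ratio. The page dimensions are the standard values in PostScript points (A4 is 210 x 297 mm, Letter 8.5 x 11 in, Legal 8.5 x 14 in). `fit_image_on_page` is a hypothetical helper; centering, scaling small images up to fill the page, and reading 'auto' as "page equals image size" are assumptions, not documented behavior of this tool.

```python
# Standard page sizes in PostScript points (1 pt = 1/72 inch).
PAGE_SIZES_PT = {
    "A4": (595.28, 841.89),    # 210 x 297 mm
    "Letter": (612.0, 792.0),  # 8.5 x 11 in
    "Legal": (612.0, 1008.0),  # 8.5 x 14 in
}


def fit_image_on_page(img_w_px: int, img_h_px: int, page_size: str = "A4", dpi: float = 72.0):
    """Return (page_w, page_h, x, y, draw_w, draw_h) in points for one image."""
    # Physical image size in points at the assumed pixel density.
    w_pt = img_w_px * 72.0 / dpi
    h_pt = img_h_px * 72.0 / dpi
    if page_size == "auto":
        # Assumption: 'auto' makes the page exactly the size of the image.
        return (w_pt, h_pt, 0.0, 0.0, w_pt, h_pt)
    page_w, page_h = PAGE_SIZES_PT[page_size]
    # One uniform scale factor keeps the aspect ratio; min() makes it fit both ways.
    scale = min(page_w / w_pt, page_h / h_pt)
    draw_w, draw_h = w_pt * scale, h_pt * scale
    # Center the scaled image on the page.
    return (page_w, page_h, (page_w - draw_w) / 2, (page_h - draw_h) / 2, draw_w, draw_h)
```

For example, a 1000 x 500 px image at the default 72 DPI on a Letter page is drawn at 612 x 306 pt, with 243 pt of space above and below it.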
Example:
Convert scan1.jpg, scan2.jpg, scan3.jpg to a single PDF with A4 pages
extract_pdf_images - PDF Image Extractor
Extract all embedded images from PDF pages.
Parameters:
file_path (required): Path to PDF file
pages (optional): Page range (e.g., "1,3,5-10,-1")
min_size (optional): Minimum image size ("WIDTHxHEIGHT", default: "100x100")
output_dir (optional): Directory to save extracted images
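The min_size value is a "WIDTHxHEIGHT" string. A minimal sketch of how it could be parsed and applied as a filter follows; `parse_min_size` and `filter_by_min_size` are hypothetical, and requiring both dimensions to meet the minimum is an assumption (the tool could instead compare area or either dimension).

```python
import re


def parse_min_size(spec: str) -> tuple[int, int]:
    """Parse a "WIDTHxHEIGHT" string such as "100x100" into (width, height)."""
    match = re.fullmatch(r"\s*(\d+)\s*[xX]\s*(\d+)\s*", spec)
    if match is None:
        raise ValueError(f"expected WIDTHxHEIGHT, got {spec!r}")
    return int(match.group(1)), int(match.group(2))


def filter_by_min_size(sizes: list[tuple[int, int]], spec: str = "100x100") -> list[tuple[int, int]]:
    """Keep (width, height) pairs where BOTH dimensions reach the minimum (assumed rule)."""
    min_w, min_h = parse_min_size(spec)
    return [(w, h) for (w, h) in sizes if w >= min_w and h >= min_h]
```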
Example:
Extract all images larger than 200x200 pixels from PDF pages 1-10
get_pdf_metadata - PDF Metadata Reader
Read comprehensive metadata information from PDF documents.
Parameters:
file_path (required): Path to PDF file
include_xmp (optional): Include advanced XMP metadata (default: False)
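In a PDF's document information dictionary, dates such as CreationDate and ModDate are stored as strings of the form D:YYYYMMDDHHmmSSOHH'mm', where every field after the year is optional. Whether get_pdf_metadata returns these raw strings or already-parsed values is not stated here, so the following is only a sketch for handling raw values; `parse_pdf_date` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' where O is Z, + or -; fields after the year are optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(\d{2})?'?(\d{2})?'?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a raw PDF date string into a datetime (timezone-aware if an offset is given)."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    dt = datetime(int(year), int(month or 1), int(day or 1),
                  int(hour or 0), int(minute or 0), int(second or 0))
    if sign is None:
        return dt  # no timezone information: leave the datetime naive
    if sign in "Zz":
        return dt.replace(tzinfo=timezone.utc)
    offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
    return dt.replace(tzinfo=timezone(-offset if sign == "-" else offset))
```

For example, "D:20240315093000+08'00'" becomes 2024-03-15 09:30:00 at UTC+08:00.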
Example:
Read all metadata from document.pdf including title, author, creation date
set_pdf_metadata - PDF Metadata Writer
Write or update PDF metadata fields.
Parameters:
file_path (required): Path to source PDF file
output_file (optional): Output PDF file path
title (optional): Document title
author (optional): Document author
subject (optional): Document subject
creator (optional): Creator application name
producer (optional): Producer application name
keywords (optional): Keywords or tags
preserve_existing (optional): Preserve existing metadata (default: True)
Example:
Set metadata for report.pdf with title "Annual Report 2024" and author "John Doe"
remove_pdf_metadata - PDF Metadata Remover
Remove specific metadata fields or all metadata from PDF.
Parameters:
file_path (required): Path to source PDF file
output_file (optional): Output PDF file path
fields_to_remove (optional): List of specific fields to remove
remove_all (optional): Remove all metadata (default: False)
Example:
Remove author and title metadata from sensitive_document.pdf
search_pdf_text - PDF Text Search Engine
Search for text content across PDF pages with detailed match information.
Parameters:
file_path (required): Path to PDF file
query (required): Text to search for (or regex pattern)
pages (optional): Page range (e.g., "1,3,5-10,-1")
case_sensitive (optional): Case-sensitive search (default: False)
regex_search (optional): Treat query as regex pattern (default: False)
context_chars (optional): Context characters around matches (default: 100)
max_matches (optional): Maximum matches to return (default: 100)
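The search options above map naturally onto Python's `re` module. The sketch below assumes page text has already been extracted into a `{page_number: text}` dict; `search_pages` and its result fields are hypothetical and are not the server's actual implementation or output schema.

```python
import re


def search_pages(
    pages: dict[int, str],
    query: str,
    case_sensitive: bool = False,
    regex_search: bool = False,
    context_chars: int = 100,
    max_matches: int = 100,
) -> list[dict]:
    """Find query in each page's text and return matches with surrounding context."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Plain-text queries are escaped so characters like "." or "(" match literally.
    pattern = re.compile(query if regex_search else re.escape(query), flags)
    matches: list[dict] = []
    for page_number in sorted(pages):
        text = pages[page_number]
        for m in pattern.finditer(text):
            start = max(0, m.start() - context_chars)
            end = min(len(text), m.end() + context_chars)
            matches.append({
                "page_number": page_number,
                "match": m.group(0),
                "offset": m.start(),
                "context": text[start:end],
            })
            if len(matches) >= max_matches:
                return matches  # stop early once the cap is reached
    return matches
```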
Example:
Search for "financial report" in document.pdf with case-insensitive matching
extract_page_text - Single Page Text Extractor
Extract text from a specific PDF page with various extraction options.
Parameters:
file_path (required): Path to PDF file
page_number (required): Page number to extract (1-based)
extraction_mode (optional): Extraction mode ("default", "layout", "simple")
Example:
Extract text from page 5 of document.pdf with layout preservation
find_and_highlight_text - Text Highlighting Tool
Find text and return information for highlighting matches.
Parameters:
file_path (required): Path to PDF file
query (required): Text to search for
pages (optional): Page range (e.g., "1,3,5-10,-1")
case_sensitive (optional): Case-sensitive search (default: False)
Example:
Find all instances of "important" in document.pdf for highlighting
optimize_pdf - PDF Optimization Tool
Optimize PDF file using various compression techniques.
Parameters:
file_path (required): Path to source PDF file
output_file (optional): Output PDF file path
optimization_level (optional): Optimization preset ("light", "medium", "heavy", "maximum")
Example:
Optimize large_document.pdf using medium compression level
compress_pdf_images - PDF Image Compression
Compress images in PDF while preserving document structure.
Parameters:
file_path (required): Path to source PDF file
output_file (optional): Output PDF file path
quality (optional): Image compression quality (1-100, default: 80)
Example:
Compress images in photo_heavy.pdf to 60% quality
remove_pdf_content - PDF Content Remover
Remove specific content from PDF to reduce file size.
Parameters:
file_path (required): Path to source PDF file
output_file (optional): Output PDF file path
remove_images (optional): Remove all images (default: False)
remove_annotations (optional): Remove annotations (default: False)
compress_streams (optional): Compress content streams (default: True)
Example:
Remove all images and annotations from document.pdf to reduce size
analyze_pdf_size - PDF Size Analysis Tool
Analyze PDF file to identify optimization opportunities.
Parameters:
file_path (required): Path to PDF file to analyze
Example:
Analyze large_file.pdf to get optimization recommendations
Output Format
All tools return structured JSON containing relevant data. Text extraction and OCR tools return:
{
"success": true,
"file_path": "/path/to/file.pdf",
"total_pages": 10,
"processed_pages": [1, 2, 3],
"chunks": [
{
"content": "Extracted text...",
"page_number": 1,
"chunk_index": 0,
"metadata": {
"quality_score": 0.95,
"word_count": 150
}
}
],
"summary": {
"total_chunks": 5,
"total_chars": 2500,
"pages": [1, 2, 3]
},
"extraction_method": "text_extraction"
}
Language Support
OCR Languages
The ocr_pdf tool supports multiple languages via Tesseract:
Chinese: chi_sim (Simplified), chi_tra (Traditional)
English: eng
Combined: chi_sim+eng (mixed Chinese and English)
Others: Available based on your Tesseract installation
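Before calling ocr_pdf with a combined code such as chi_sim+eng, it can help to confirm that every component is installed. This sketch parses the output of `tesseract --list-langs`; the helper names are hypothetical, and the header line it skips may be worded differently across Tesseract versions.

```python
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    langs: set[str] = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header ("List of available languages ... (N):").
        if line and not line.startswith("List of"):
            langs.add(line)
    return langs


def installed_tesseract_languages() -> set[str]:
    """Run Tesseract (must be on PATH) and return its installed language codes."""
    result = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True, check=True)
    # Fall back to stderr in case the installed version writes the list there.
    return parse_list_langs(result.stdout or result.stderr)


def missing_languages(lang_spec: str, installed: set[str]) -> list[str]:
    """Return the codes in a spec like "chi_sim+eng" that are not installed."""
    return [code for code in lang_spec.split("+") if code and code not in installed]
```

With Tesseract on PATH, `missing_languages("chi_sim+eng", installed_tesseract_languages())` returns an empty list when both packs are present.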
Performance Features
Caching System
File-based invalidation - Cache automatically invalidates when files change
Operation-specific caching - Different cache entries for different operations
Memory management - Configurable cache size and TTL
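The three caching properties above can be combined in one small structure: a key built from the file's path, modification time, and size (file-based invalidation), plus the operation name and its parameters (operation-specific entries), stored in an LRU map with a time-to-live. `FileResultCache` is a hypothetical sketch, not the project's `utils/cache.py`, and its default limits are placeholders.

```python
import json
import os
import time
from collections import OrderedDict
from typing import Any


class FileResultCache:
    """LRU + TTL cache keyed on (file identity, operation, parameters)."""

    def __init__(self, max_entries: int = 128, ttl_seconds: float = 3600.0) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._store: OrderedDict = OrderedDict()  # key -> (stored_at, value)

    @staticmethod
    def _key(file_path: str, operation: str, params: dict) -> tuple:
        st = os.stat(file_path)
        # mtime and size change when the file is edited, so stale results are never hit.
        # sort_keys gives {"a": 1, "b": 2} and {"b": 2, "a": 1} the same key.
        return (os.path.abspath(file_path), st.st_mtime_ns, st.st_size,
                operation, json.dumps(params, sort_keys=True))

    def get(self, file_path: str, operation: str, params: dict) -> Any:
        """Return the cached value, or None on a miss or an expired entry."""
        key = self._key(file_path, operation, params)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl_seconds:
            del self._store[key]  # expired
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def put(self, file_path: str, operation: str, params: dict, value: Any) -> None:
        key = self._key(file_path, operation, params)
        self._store[key] = (time.monotonic(), value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
```

Entries for older versions of a file are not deleted eagerly; they simply stop matching and age out through LRU eviction or the TTL.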
Text Quality Analysis
The system automatically analyzes extracted text quality using:
Character-to-word ratios
Sentence structure analysis
Letter-to-character ratios
Special character detection
Low-quality text triggers OCR recommendations.
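The heuristics listed above could be combined into a single score roughly as follows. The weights, the 0.6 threshold, and the names `text_quality_score` and `needs_ocr` are illustrative assumptions; the project's actual formula is not documented here.

```python
import re
import unicodedata


def text_quality_score(text: str) -> float:
    """Score extracted text from 0.0 (empty or garbage) to 1.0 (clean prose)."""
    stripped = text.strip()
    if not stripped:
        return 0.0
    chars = [c for c in stripped if not c.isspace()]
    words = stripped.split()
    has_cjk = any("\u4e00" <= c <= "\u9fff" for c in chars)

    # Character-to-word ratio: space-delimited prose averages a few characters per word.
    # Chinese text has no spaces between words, so this check is skipped for it.
    avg_word_len = len(chars) / len(words)
    word_score = 1.0 if has_cjk or 2 <= avg_word_len <= 12 else 0.0

    # Letter-to-character ratio (str.isalpha is also True for CJK ideographs).
    letter_ratio = sum(c.isalpha() for c in chars) / len(chars)

    # Special-character detection: U+FFFD replacement chars, control and private-use chars.
    bad = sum(1 for c in chars if c == "\ufffd" or unicodedata.category(c) in {"Cc", "Co"})
    special_score = 1.0 - min(1.0, 10 * bad / len(chars))

    # Sentence structure: look for sentence-ending punctuation (ASCII or full-width).
    sentence_score = 1.0 if re.search(r"[.!?\u3002\uff01\uff1f]", stripped) else 0.5

    return 0.25 * word_score + 0.35 * letter_ratio + 0.25 * special_score + 0.15 * sentence_score


def needs_ocr(text: str, threshold: float = 0.6) -> bool:
    """Recommend OCR when the extracted text scores below the threshold."""
    return text_quality_score(text) < threshold
```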
Chunking Strategy
Recursive character splitting with semantic separators
Configurable overlap to preserve context
Metadata preservation including page numbers and positions
Error Handling
The server provides detailed error information:
Missing file errors
Invalid page range errors
OCR engine initialization errors
Processing timeout errors
Development
Project Structure
pdfreadermcp/
  pyproject.toml              # uv project configuration
  README.md
  src/pdfreadermcp/
    __init__.py
    __main__.py               # Entry point
    server.py                 # MCP server implementation
    tools/
      pdf_reader.py           # Text extraction tool
      pdf_ocr.py              # OCR processing tool
      pdf_operations.py       # PDF splitting, merging, extraction
      pdf_image_converter.py  # PDF-image conversion tools
      pdf_metadata.py         # PDF metadata management
      pdf_text_search.py      # PDF text search and highlighting
      pdf_optimizer.py        # PDF compression and optimization
    utils/
      chunker.py              # Text chunking utilities
      cache.py                # Caching system
      file_handler.py         # File operations
Running Tests
# Install with dev dependencies
uv sync --dev
# Run tests (when available)
uv run pytest
Dependencies
Core Dependencies
mcp - Model Context Protocol server framework
pypdf - PDF text extraction and manipulation
pdf2image - PDF to image conversion
pytesseract - Python wrapper for Tesseract OCR
tesseract - OCR engine
pillow - Image processing and manipulation
System Requirements
For OCR: Tesseract OCR engine must be installed
For PDF conversion: poppler-utils may be required on some systems
Troubleshooting
Common Issues
1. Tesseract OCR Installation Issues
If Tesseract is not found, you may see errors like "TesseractNotFoundError". Solutions:
Windows:
Ensure Tesseract is installed and added to PATH
Or set the path manually in your environment:
import pytesseract

# Point pytesseract at the Tesseract executable when it is not on PATH
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
macOS/Linux:
Install via package manager:
brew install tesseract (macOS) or apt install tesseract-ocr (Ubuntu)
Make sure Chinese language packs are installed
2. pdf2image Dependencies
On Linux, you may need to install poppler:
# Ubuntu/Debian
sudo apt-get install poppler-utils
# CentOS/RHEL
sudo yum install poppler-utils
3. Chinese Language Pack Issues
If OCR fails for Chinese text or produces poor results:
Windows: During Tesseract installation, select "Additional Language Data" and install Chinese packs
macOS:
brew install tesseract-lang
Linux:
sudo apt install tesseract-ocr-chi-sim tesseract-ocr-chi-tra
Verify language packs are installed:
tesseract --list-langs
4. Memory Issues with Large PDFs
Reduce the chunk_size parameter
Process pages in smaller ranges
Ensure sufficient system memory
Lower the dpi parameter for faster processing
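Lowering dpi helps because PDF page sizes are measured in points (72 per inch), so the rendered image grows with the square of the DPI. The sketch below estimates pixel dimensions and raw RGB size for an A4 page (595 x 842 pt) at 3 bytes per pixel; actual memory use depends on the imaging library, so treat the numbers as rough estimates. The function names are illustrative only.

```python
POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch


def rendered_size(page_w_pt: float, page_h_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(page_w_pt * dpi / POINTS_PER_INCH), round(page_h_pt * dpi / POINTS_PER_INCH)


def raw_rgb_bytes(page_w_pt: float, page_h_pt: float, dpi: int) -> int:
    """Uncompressed size of one 8-bit RGB render (3 bytes per pixel)."""
    w, h = rendered_size(page_w_pt, page_h_pt, dpi)
    return w * h * 3


for dpi in (150, 200, 300):
    w, h = rendered_size(595, 842, dpi)  # A4 page
    print(f"{dpi} DPI: {w}x{h} px, ~{raw_rgb_bytes(595, 842, dpi) / 1e6:.1f} MB per page as raw RGB")
```

At 200 DPI (the default for ocr_pdf and pdf_to_images) an A4 page comes out at 1653x2339 px, about 11.6 MB as raw RGB; at 300 DPI it is 2479x3508 px, about 26.1 MB.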
Performance Tips
Use caching - The same file with same parameters will use cached results
Process specific pages - Use page ranges instead of processing entire documents
Adjust chunk sizes - Smaller chunks for memory-constrained environments
Choose appropriate tools - Use read_pdf first, then ocr_pdf if needed
OCR optimization:
Lower dpi (150-200) for faster processing
Use chi_sim only if the document is purely Chinese
Process problematic pages only, not the entire document
Testing & Quality Assurance
This project has been thoroughly tested with comprehensive test coverage:
18/18 tools fully functional (100% success rate)
All JSON serialization issues resolved
Extensive testing with real PDF documents
Performance validation with caching system
Multi-language OCR testing (Chinese/English)
License
This project is licensed under the MIT License.
Contributing
Contributions are welcome! Please feel free to submit issues and enhancement requests.
Support
For questions and support:
Create an issue in the project repository
Check the troubleshooting section above
Review the MCP documentation at https://modelcontextprotocol.io