ReadPDFx - OCR PDF MCP Server
Official MCP SDK STDIO Server - MCP Protocol 2025-06-18 Compliant
⚡ Quick Start (STDIO Server)
1. Install Dependencies
2. Validate Installation
3. Client Integration
The server communicates over STDIO. Configure your MCP client:
Claude Desktop:
Features
🎯 Official MCP SDK: Built with the official FastMCP framework
📡 STDIO Transport: Standard MCP protocol over STDIO
🧠 Smart PDF Processing: Automatically detects digital vs. scanned content
🔧 5 OCR Tools: Text extraction, OCR processing, and combined operations
Universal Client Support: Claude Desktop, LM Studio, Continue.dev, Cursor
⚡ Lightweight: ~200 lines of code, versus 800+ in the HTTP implementation
🛡️ Production Ready: Comprehensive error handling and logging
Auto Tool Registration: Decorators handle tool discovery
🔧 Installation
Prerequisites
Python 3.8+
Tesseract OCR
Windows
macOS
Linux
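A small preflight script can confirm both prerequisites before you install the dependencies. This is a sketch, not a script shipped with ReadPDFx. It assumes the Tesseract executable is named `tesseract` and is on `PATH`, and the function names are illustrative.

```python
import shutil
import subprocess
import sys
from typing import Optional, Sequence

MIN_PYTHON = (3, 8)  # from the Prerequisites list


def check_python(version_info: Sequence[int] = sys.version_info) -> bool:
    """True if the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def find_tesseract() -> Optional[str]:
    """Full path to the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version(path: str) -> str:
    """First line of `tesseract --version`; some builds print it on stderr, so merge both."""
    result = subprocess.run(
        [path, "--version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else "unknown"


def main() -> bool:
    """Print a short report and return True when every check passes."""
    ok = check_python()
    print(f"Python {sys.version.split()[0]}: {'OK' if ok else 'need 3.8+'}")
    path = find_tesseract()
    if path:
        print(f"Tesseract: {tesseract_version(path)} ({path})")
    else:
        print("Tesseract: not found on PATH")
        ok = False
    return ok


if __name__ == "__main__":
    main()
```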
Available Tools
1. Smart PDF Processing
Intelligent processing with automatic OCR detection:
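One common way to choose between direct extraction and OCR is to measure how much embedded text each page carries. Digital pages have a text layer, while scanned pages have little or none. The sketch below illustrates that heuristic under stated assumptions. It is not necessarily how ReadPDFx decides, and the `min_chars=25` threshold and function names are hypothetical. pdfplumber (credited in Acknowledgments) is imported lazily, so you can try the classification logic without it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class PageDecision:
    page_number: int  # 1-based page index
    char_count: int   # non-whitespace characters found in the text layer
    needs_ocr: bool


def classify_pages(page_texts: Iterable[Optional[str]], min_chars: int = 25) -> List[PageDecision]:
    """Flag pages whose embedded text layer is too thin to trust (likely scanned)."""
    decisions = []
    for number, text in enumerate(page_texts, start=1):
        # Count visible characters only, so whitespace-only pages still count as empty.
        count = len("".join((text or "").split()))
        decisions.append(PageDecision(number, count, count < min_chars))
    return decisions


def page_texts_from_pdf(path: str) -> List[str]:
    """Read each page's text layer with pdfplumber (imported only when called)."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return [page.extract_text() or "" for page in pdf.pages]
```

For a hypothetical report.pdf, `[d.page_number for d in classify_pages(page_texts_from_pdf("report.pdf")) if d.needs_ocr]` lists the pages that would go to Tesseract. The rest would be extracted directly. Tune `min_chars` against real documents.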
2. PDF Text Extraction
Direct text extraction from digital PDFs:
3. OCR Processing
OCR on image files:
4. PDF Structure Analysis
Analyze document structure and metadata:
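pdfplumber's page objects can approximate a structure analysis of this kind. The sketch assumes pdfplumber is installed, though it is only imported when `analyze_structure` runs. It reports page count, document metadata, page sizes and image counts. The output keys are illustrative, not the tool's actual response format. Page sizes are converted from PDF's default unit of 1/72 inch (the rarely used UserUnit override is ignored).

```python
from typing import Any, Dict, Tuple

POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def page_size_mm(width_pt: float, height_pt: float) -> Tuple[float, float]:
    """Convert PDF points (1/72 inch) to millimetres, rounded to one decimal place."""
    factor = MM_PER_INCH / POINTS_PER_INCH
    return round(width_pt * factor, 1), round(height_pt * factor, 1)


def analyze_structure(path: str) -> Dict[str, Any]:
    """Page count, metadata, and per-page size/image/text info via pdfplumber."""
    import pdfplumber  # lazy import: only needed when a real PDF is analyzed

    with pdfplumber.open(path) as pdf:
        pages = [
            {
                "page": page.page_number,
                "size_mm": page_size_mm(page.width, page.height),
                "images": len(page.images),
                "has_text": bool((page.extract_text() or "").strip()),
            }
            for page in pdf.pages
        ]
        return {"page_count": len(pdf.pages), "metadata": dict(pdf.metadata), "pages": pages}
```

For example, an A4 page (595.28 × 841.89 pt) reports as (210.0, 297.0) mm, and US Letter (612 × 792 pt) as (215.9, 279.4) mm.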
5. Batch Processing
Process multiple files:
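Batch processing applies a single-file operation to many inputs without letting one failure abort the whole batch. The minimal sketch below uses a thread pool and assumes a hypothetical `process_one` callable that takes a path and returns text. Threads are a reasonable fit if OCR shells out to the Tesseract binary, as pytesseract does. The default of 4 workers is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def batch_process(
    paths: Iterable[str],
    process_one: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, dict]:
    """Run process_one on every path; record a result or an error message per file."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = {"ok": True, "text": future.result()}
            except Exception as exc:  # one bad file should not sink the batch
                results[path] = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return results
```

Results are keyed by path, so completion order does not matter to the caller.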
Client Integration
Claude Desktop
Add to claude_desktop_config.json:
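Claude Desktop reads MCP servers from an `mcpServers` object in claude_desktop_config.json. Each entry names the `command` to launch and its `args`. The snippet below prints such an entry. The server key `readpdfx` is an assumption, and path/to/readpdfx/run.py is the placeholder path from the LM Studio settings. Replace it with the real absolute path.

```python
import json
import sys


def claude_desktop_entry(run_py: str, python: str = "python", name: str = "readpdfx") -> dict:
    """Build an mcpServers block that launches the server over STDIO."""
    return {"mcpServers": {name: {"command": python, "args": [run_py]}}}


if __name__ == "__main__":
    # sys.executable pins the interpreter that has the dependencies installed.
    config = claude_desktop_entry("path/to/readpdfx/run.py", python=sys.executable)
    print(json.dumps(config, indent=2))
```

If the config file already lists other servers, merge the printed block into it instead of replacing the file. Then restart Claude Desktop so it picks up the change.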
LM Studio
Configure MCP server with:
Command: python
Args: path/to/readpdfx/run.py
URL: http://localhost:8000 (HTTP mode)
Continue.dev
Add to config.json:
Cursor
Configure in settings.json:
See
API Endpoints
MCP Protocol Endpoints
POST /mcp/initialize - Initialize MCP session
POST /mcp/tools/list - List available tools
POST /mcp/tools/call - Call MCP tools
GET /mcp/manifest - Get MCP manifest
HTTP Endpoints
GET /health - Health check
POST /jsonrpc - JSON-RPC 2.0 endpoint
GET /docs - API documentation
GET /tools - Tools discovery
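In HTTP mode, the standard library alone is enough to make a JSON-RPC 2.0 call to POST /jsonrpc. This sketch makes two assumptions. First, the endpoint accepts MCP method names such as `tools/list`. Second, it listens on http://localhost:8000, the HTTP-mode URL from the LM Studio settings. Check the server's /docs page for the methods it actually exposes.

```python
import itertools
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"  # assumed HTTP-mode address
_ids = itertools.count(1)  # JSON-RPC request ids


def build_jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Wrap a JSON-RPC 2.0 call in a POST request to the /jsonrpc endpoint."""
    payload: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        payload["params"] = params
    return urllib.request.Request(
        f"{base_url}/jsonrpc",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(method: str, params: Optional[Dict[str, Any]] = None, timeout: float = 30.0) -> Any:
    """Send the request and return its result, raising if the reply is an error object."""
    with urllib.request.urlopen(build_jsonrpc_request(method, params), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    if "error" in body:
        err = body["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    return body.get("result")
```

With the HTTP server running, `call("tools/list")` should return the tool list, provided the first assumption holds.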
🔧 Configuration
Environment Variables
Config Files
mcp.json - MCP protocol configuration
mcp-config.yaml - YAML configuration
pyproject.toml - Python project configuration
package.json - Node.js compatibility
🐳 Docker & Kubernetes
Docker Deployment
Quick Start with Docker
Automated Docker Deployment
Available Docker commands:
build - Build Docker image only
run - Build and run container (default)
start - Start container (assumes image exists)
stop - Stop running container
logs - Show container logs
clean - Stop container and remove image
status - Show container status
Kubernetes Deployment
Deploy to Kubernetes
Kubernetes Resources
Deployment: k8s/deployment.yaml - Main application deployment
Service: k8s/deployment.yaml - Service exposure
Ingress: k8s/ingress.yaml - External access
ConfigMap: k8s/configmap.yaml - Configuration management
HPA: k8s/hpa.yaml - Horizontal Pod Autoscaler
Kubernetes Commands
Production Considerations
Multi-stage Build
Use Dockerfile.prod for optimized production builds:
Environment Variables
Persistent Storage
🧪 Testing
Run Tests
Manual Testing
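The STDIO transport carries newline-delimited JSON-RPC 2.0 messages on the server's stdin and stdout. A manual smoke test can therefore send an `initialize` request, the `notifications/initialized` notification and a `tools/list` request, then read the replies. In this sketch, the launch command built from path/to/readpdfx/run.py and the client name are assumptions. The protocol version matches the 2025-06-18 revision the server targets.

```python
import json
import subprocess
import sys
from typing import List, Optional

PROTOCOL_VERSION = "2025-06-18"
# Hypothetical launch command; path/to/readpdfx/run.py is the placeholder entry point.
DEFAULT_CMD = [sys.executable, "path/to/readpdfx/run.py"]


def handshake_messages() -> List[dict]:
    """initialize request, initialized notification, then a tools/list request."""
    return [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": "manual-test", "version": "0.1"},
            },
        },
        # Notifications carry no id, so the server sends no reply.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    ]


def encode(messages: List[dict]) -> str:
    """One compact JSON object per line; STDIO messages must not contain raw newlines."""
    return "".join(json.dumps(m, separators=(",", ":")) + "\n" for m in messages)


def smoke_test(server_cmd: Optional[List[str]] = None, exit_timeout: float = 10.0) -> List[dict]:
    """Send the handshake one message at a time and collect one reply per request.

    Assumes the server writes only JSON-RPC replies to stdout (logs go to stderr)
    and sends no unsolicited notifications in between.
    """
    replies = []
    with subprocess.Popen(server_cmd or DEFAULT_CMD, stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE, text=True) as proc:
        for message in handshake_messages():
            proc.stdin.write(encode([message]))
            proc.stdin.flush()
            if "id" in message:  # requests get exactly one reply line
                line = proc.stdout.readline()
                if not line:
                    raise RuntimeError("server closed stdout; check its stderr output")
                replies.append(json.loads(line))
        proc.stdin.close()  # EOF signals the server to shut down
        try:
            proc.wait(timeout=exit_timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
    return replies
```

With the real entry point in `DEFAULT_CMD`, `for reply in smoke_test(): print(reply)` should print the initialize result (server info and capabilities) followed by the tool list.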
Performance
Startup Time: < 2 seconds
Memory Usage: ~50MB base
Throughput: 10+ PDFs/minute
Concurrent Requests: Up to 100
File Size Limit: 100MB per file
🛠️ Development
Development Mode
Project Structure
Adding New Tools
1. Define the tool schema in mcp_tools.py
2. Implement the tool handler method
3. Register the tool in MCPToolsRegistry
4. Update tests and documentation
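A small decorator-based registry illustrates the schema, handler and registration steps. This is a hypothetical stand-in, not the project's MCPToolsRegistry, whose interface is not shown here. It mirrors the shape MCP clients receive from tools/list: a `name`, a `description` and a JSON Schema `inputSchema`. With the official SDK, FastMCP's `@mcp.tool()` decorator plays the same role and derives the schema from the function's type hints.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical minimal registry: one schema and one handler per tool name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def tool(self, name: str, description: str, input_schema: Dict[str, Any]) -> Callable:
        """Decorator that registers a handler under `name` together with its schema."""
        def decorator(handler: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._tools:
                raise ValueError(f"tool {name!r} already registered")
            self._tools[name] = {
                "schema": {"name": name, "description": description, "inputSchema": input_schema},
                "handler": handler,
            }
            return handler
        return decorator

    def list_tools(self) -> List[Dict[str, Any]]:
        """Tool descriptions in the shape a tools/list response uses."""
        return [entry["schema"] for entry in self._tools.values()]

    def call(self, name: str, arguments: Dict[str, Any]) -> Any:
        """Check required arguments, then dispatch to the registered handler."""
        if name not in self._tools:
            raise KeyError(f"unknown tool {name!r}")
        entry = self._tools[name]
        required = entry["schema"]["inputSchema"].get("required", [])
        missing = [key for key in required if key not in arguments]
        if missing:
            raise ValueError(f"missing required arguments: {missing}")
        return entry["handler"](**arguments)


registry = ToolRegistry()


@registry.tool(
    name="clean_text",  # hypothetical example tool
    description="Collapse runs of whitespace in OCR output.",
    input_schema={
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
)
def clean_text(text: str) -> str:
    return " ".join(text.split())
```

Keeping the schema next to the handler means a tool cannot be listed without being callable, which is the main benefit the decorator approach offers over a separately maintained schema file.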
Troubleshooting
Common Issues
Server won't start
OCR not working
Permission errors
Ensure read access to PDF files
Check write permissions for output directory
Run with appropriate user privileges
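You can run the first two checks up front with `os.access`, which reports whether the current user may read or write a path. The sketch below uses a hypothetical helper, `preflight_paths`. Note that `os.access` answers for the real user and group IDs, so results can differ under setuid or on some network filesystems.

```python
import os
from typing import List


def preflight_paths(pdf_paths: List[str], output_dir: str) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for path in pdf_paths:
        if not os.path.isfile(path):
            problems.append(f"not found: {path}")
        elif not os.access(path, os.R_OK):
            problems.append(f"not readable: {path}")
    if not os.path.isdir(output_dir):
        problems.append(f"output directory missing: {output_dir}")
    elif not os.access(output_dir, os.W_OK | os.X_OK):
        # Creating files in a directory needs both write and search (execute) permission.
        problems.append(f"output directory not writable: {output_dir}")
    return problems
```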
Connection timeout
Verify the server is running: curl http://localhost:8000/health
Check firewall settings
Try HTTP instead of a direct MCP connection
Debug Mode
Monitoring
Health Check
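Scripted monitoring can poll the /health endpoint (HTTP mode) with a standard-library probe. This sketch assumes the default address http://localhost:8000 and treats any 2xx status as healthy. The response body format is not documented here, so the probe ignores it.

```python
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """True if GET {base_url}/health answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (raised for 4xx/5xx) and socket timeouts all subclass OSError.
        return False
```

A cron job or container liveness script can call `is_healthy()` and alert or restart the service when it returns False.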
Metrics (Future)
Request count and latency
Tool usage statistics
Error rates and types
Resource utilization
🤝 Contributing
1. Fork the repository
2. Create a feature branch: git checkout -b feature/new-tool
3. Make changes and add tests
4. Submit a pull request
Development Setup
License
MIT License - see LICENSE file.
Links
Repository: https://github.com/irev/mcp-readpdfx
Issues: https://github.com/irev/mcp-readpdfx/issues
Documentation: https://github.com/irev/mcp-readpdfx#readme
MCP Protocol: Model Context Protocol Specification
Acknowledgments
MCP Protocol Team for the specification
FastAPI for the web framework
Tesseract OCR for text recognition
PyPDF2 and pdfplumber for PDF processing
Made with ❤️ for the MCP community