Building an MCP Server with OCR: From Setup Struggles to Document Intelligence- Proof of Concept
A real-world journey of setting up Anthropic's Model Context Protocol server with advanced PDF processing capabilities
The Challenge: Making Claude Desktop Read Your Documents
Ever wished your AI assistant could read through your scanned PDFs, legal documents, or HOA covenants without you having to manually extract text? That's exactly what we set out to accomplish in this session - building a custom MCP (Model Context Protocol) server that not only reads PDFs but intelligently handles scanned documents using OCR.
Starting Point: Following the Official Tutorial
We began by following Anthropic's official MCP server tutorial to build a basic weather server. What seemed like a straightforward process quickly became a Windows-specific debugging adventure.
The Setup Struggles
Problem #1: Missing Configuration File The first hurdle was the infamous "cannot find claude_desktop_config.json" error. This configuration file doesn't exist by default - you need to create it manually in the right location:
Windows:
%APPDATA%\Claude\claude_desktop_config.json
macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
Problem #2: UV Package Manager Permissions
The tutorial requires uv
(a fast Python package manager), but our initial installation had permission issues. The solution was using PowerShell as Administrator with the bypass flag:
Problem #3: Project Scripts Configuration
Even with uv
working, the server couldn't run because the pyproject.toml
was missing the crucial [project.scripts]
section:
Building the Weather Server Foundation
Once we overcame the setup issues, we had a basic weather server running with these tools:
get_forecast
- Simulated weather forecasts for any cityget_alerts
- Weather alerts for US states
The server successfully connected to Claude Desktop, proving our MCP infrastructure was working.
The Real Goal: Document Intelligence
With the foundation in place, we tackled the main objective - adding PDF reading capabilities with OCR support for scanned documents. This required several components:
1. Basic PDF Reading
Using PyPDF2
for extractable text PDFs:
2. OCR Integration
For scanned documents, we integrated:
pytesseract - Python wrapper for Google's Tesseract OCR engine
pdf2image - Converts PDF pages to images for OCR processing
Pillow - Image processing library
3. Intelligent Detection System
The server automatically determines whether a PDF needs OCR:
4. Caching System
Perhaps the most valuable feature - OCR results are cached to avoid reprocessing:
Cached files use naming pattern:
document_ocr_[hash].txt
Hash ensures cache invalidation when source PDF changes
Dramatically improves performance for repeat access
Security Considerations
The server includes built-in security measures:
Path Validation: Only allows access to predefined directories
File Type Restrictions: Limited to PDF files
Permission Checks: Validates file access before processing
Real-World Test: HOA Document Analysis
To validate our system, we processed actual HOA covenant documents:
Input: 5.1 MB scanned PDF with 40+ pages
Processing: Full OCR extraction and caching
Output: Complete one-page summary of key provisions
Result: Instant future access via cached text
The system successfully identified property restrictions, assessment procedures, architectural controls, and enforcement mechanisms from a complex legal document.
Final MCP Tools Arsenal
Our completed server provides these capabilities:
Document Tools:
read_pdf
- Read entire documents or specific pages with automatic OCRlist_pdfs
- Inventory available documents with scan/cache statussearch_pdf_content
- Full-text search within documents
Weather Tools:
get_forecast
- Weather forecasts for any locationget_alerts
- Weather alerts by state
Smart Features:
Automatic scanned PDF detection
Intelligent OCR fallback
Persistent caching system
Security-first file access
Lessons Learned
1. Windows Development Gotchas
PowerShell execution policies can block installations
Path separators matter in configuration files
Permission issues are common with package managers
2. OCR Implementation Insights
System dependencies (Tesseract, Poppler) are required
Caching is essential for practical OCR performance
Hybrid approach (text extraction + OCR fallback) works best
3. MCP Architecture Benefits
Modular tool design allows easy capability expansion
Security model provides controlled file system access
Integration with Claude Desktop creates seamless user experience
Performance Impact
The caching system provides dramatic performance improvements:
First Access: ~30-60 seconds for OCR processing
Subsequent Access: <1 second from cache
Storage Overhead: ~10-20% of original PDF size for text cache
What's Next?
This foundation opens up numerous possibilities:
Integration with cloud OCR services for better accuracy
Support for additional document formats (DOCX, images)
Semantic search using embeddings
Document comparison and analysis tools
Automated summarization and extraction pipelines
Code Availability
The complete MCP server code includes:
Comprehensive error handling
Type hints throughout
Detailed documentation
Production-ready security measures
Extensible architecture for additional tools
Conclusion
Building this MCP server transformed a basic tutorial into a powerful document intelligence system. What started as debugging configuration issues evolved into a practical tool for extracting insights from scanned legal documents.
The real value isn't just in the technical implementation - it's in democratizing access to document analysis. Now anyone can ask their AI assistant to "summarize my HOA covenants" or "what are the key restrictions in my lease?" and get instant, accurate responses from scanned PDFs.
The journey from setup struggles to document intelligence showcases both the power of the MCP architecture and the practical challenges of real-world AI development. Sometimes the best learning happens when things don't work as expected.
This MCP server demonstrates the potential of combining traditional document processing with modern AI capabilities. By handling the technical complexities behind the scenes, we enable natural language interaction with complex documents - transforming how people access and understand their important paperwork.
This server cannot be installed
local-only server
The server can only run on the client's local machine because it depends on local resources.
Enables Claude to read and analyze PDF documents with automatic OCR processing for scanned files. Features intelligent text extraction, caching for performance, and secure file access with search capabilities.
- The Challenge: Making Claude Desktop Read Your Documents
- Starting Point: Following the Official Tutorial
- Building the Weather Server Foundation
- The Real Goal: Document Intelligence
- Security Considerations
- Real-World Test: HOA Document Analysis
- Final MCP Tools Arsenal
- Lessons Learned
- Performance Impact
- What's Next?
- Code Availability
- Conclusion
Related MCP Servers
- -securityAlicense-qualityA comprehensive code analysis and management tool that integrates with Claude Desktop to analyze code at project and file levels, helping adapt changes to projects intelligently.Last updated -39MIT License
- AsecurityAlicenseAqualityProvides intelligent transcript processing capabilities for Claude, featuring natural formatting, contextual repair, and smart summarization powered by Deep Thinking LLMs.Last updated -416MIT License
- AsecurityAlicenseAqualityEnables Claude to interact with the Readwise Reader API, allowing for saving, listing, updating, and deleting documents with complete metadata and content access through natural language.Last updated -625MIT License
- AsecurityAlicenseAqualityEnables AI assistants to extract and read content from PDF documents using Mistral AI's OCR capabilities. Provides intelligent caching and returns clean markdown text for easy integration with AI workflows.Last updated -21MIT License