Nanonets MCP Server
An MCP (Model Context Protocol) server that exposes Nanonets OCR functionality for converting images, PDFs, and Office documents to structured markdown.
Features
Advanced OCR: Convert documents to structured markdown using Nanonets-OCR-s (3.75B parameter model)
Multi-format Support: Handles images, PDFs, Word documents, and Excel spreadsheets
Images: PNG, JPEG, BMP, TIFF, WEBP
Documents: PDF, DOCX, XLSX
PDF Processing: Complete multi-page PDF document processing with page-by-page OCR
Office Document Processing: Direct text extraction from Word and Excel files
Intelligent Recognition: Detects and converts:
Text and paragraphs
Tables with structure preservation
LaTeX equations
Images with descriptions
Signatures and watermarks
Checkboxes
Complex layouts
Multi-page documents with proper page separation
Word document headings and formatting
Excel worksheets and data tables
Installation
Option 1: Docker (Recommended with GPU)
Prerequisites for GPU support:
NVIDIA GPU with CUDA support
NVIDIA Docker runtime installed
Docker Compose with support for Compose file format 3.8+
Option 2: Local Installation
Usage
Running the Server
With Docker:
Local Installation:
Available Tools
ocr_image_to_markdown
Convert an image to structured markdown format.
Parameters:
image_data (string): Image data as a base64 string, data URL, or file path
image_format (optional string): Format hint (png, jpg, etc.)
Returns: Structured markdown representation of the document
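The three accepted input forms for image_data can be produced from a local file with the standard library alone. The sketch below is illustrative: the helper names to_base64 and to_data_url are hypothetical, and it assumes the server accepts the usual data:<mime>;base64,<payload> layout for data URLs, which this README does not spell out.

```python
import base64
import mimetypes
from pathlib import Path


def to_base64(path: str) -> str:
    """Return the file's bytes as a plain base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def to_data_url(path: str) -> str:
    """Return the file as a data URL of the form data:<mime>;base64,<payload>.

    Assumption: the server parses the conventional data URL layout.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        # Fall back to a generic type when the extension is unknown.
        mime = "application/octet-stream"
    return f"data:{mime};base64,{to_base64(path)}"
```

Either return value, or the file path itself, could then be passed as image_data. The same helpers would apply to pdf_data, docx_data, and excel_data, since those parameters are documented with the same three input forms.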
ocr_pdf_to_markdown
Convert an entire PDF document to structured markdown format.
Parameters:
pdf_data (string): PDF data as a base64 string, data URL, or file path
Returns: Structured markdown representation of the entire PDF document with page separators
process_word_to_markdown
Convert a Word document (.docx) to structured markdown format.
Parameters:
docx_data (string): Word document data as a base64 string, data URL, or file path
Returns: Structured markdown representation of the Word document with headings and tables
process_excel_to_markdown
Convert an Excel file (.xlsx) to structured markdown format.
Parameters:
excel_data (string): Excel file data as a base64 string, data URL, or file path
Returns: Structured markdown representation of all worksheets in the Excel workbook
get_supported_formats
Get information about supported formats and capabilities.
Returns: Dictionary with supported formats, input methods, capabilities, and processing options
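Since each tool handles one family of formats, a client may want to choose the tool from the file extension. The mapping below is a minimal sketch built from the formats and tool names listed in this README. The function name pick_tool is hypothetical, and treating .jpg and .tif as aliases of JPEG and TIFF is an assumption.

```python
from pathlib import Path

# Extension -> tool name, following the formats and tools listed in this README.
# Assumption: ".jpg" and ".tif" are accepted as aliases of JPEG and TIFF.
TOOL_BY_EXTENSION = {
    ".png": "ocr_image_to_markdown",
    ".jpg": "ocr_image_to_markdown",
    ".jpeg": "ocr_image_to_markdown",
    ".bmp": "ocr_image_to_markdown",
    ".tif": "ocr_image_to_markdown",
    ".tiff": "ocr_image_to_markdown",
    ".webp": "ocr_image_to_markdown",
    ".pdf": "ocr_pdf_to_markdown",
    ".docx": "process_word_to_markdown",
    ".xlsx": "process_excel_to_markdown",
}


def pick_tool(path: str) -> str:
    """Return the tool name for a file, based on its extension (case-insensitive)."""
    ext = Path(path).suffix.lower()
    if ext not in TOOL_BY_EXTENSION:
        raise ValueError(f"Unsupported file type: {path!r}")
    return TOOL_BY_EXTENSION[ext]
```

For example, pick_tool("report.PDF") returns "ocr_pdf_to_markdown", while a .txt file raises ValueError. A more robust client could call get_supported_formats at startup instead of hard-coding the table.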
Available Resources
nanonets://model-info
Provides detailed information about the Nanonets OCR model, including capabilities and specifications.
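At the protocol level, MCP clients reach tools and resources through JSON-RPC 2.0 messages: tools/call for tools and resources/read for resources. The sketch below only builds those request payloads to show their shape. The helper names are hypothetical, and in practice an MCP client library would also handle the initialization handshake and the transport.

```python
import json
from itertools import count

# Simple incrementing request IDs for illustration.
_request_ids = count(1)


def jsonrpc_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": params,
    }


def call_tool_request(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for the named tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


def read_resource_request(uri: str) -> dict:
    """Build an MCP resources/read request for a resource URI."""
    return jsonrpc_request("resources/read", {"uri": uri})


if __name__ == "__main__":
    print(json.dumps(read_resource_request("nanonets://model-info"), indent=2))
    # "/path/to/scan.png" is a placeholder path, not a file from this project.
    print(json.dumps(
        call_tool_request("ocr_image_to_markdown", {"image_data": "/path/to/scan.png"}),
        indent=2,
    ))
```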
Examples
Basic OCR Usage
Image Processing
PDF Processing
Word Document Processing
Excel Spreadsheet Processing
Integration with Claude Desktop
Add to your Claude Desktop configuration:
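As a rough sketch, Claude Desktop reads MCP server definitions from the mcpServers key of its claude_desktop_config.json file. The Python below prints an entry of that shape. The server name nanonets, the python command, and the script path are placeholders (assumptions), not values taken from this project, so substitute the command you actually use to start the server.

```python
import json


def claude_desktop_entry(command: str, args: list[str], name: str = "nanonets") -> dict:
    """Build a config fragment with one MCP server under the mcpServers key."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


# Placeholder command and path: replace with the real way you launch the server.
config = claude_desktop_entry("python", ["/path/to/nanonets_mcp_server.py"])

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

If the config file already defines other servers, merge the new entry into the existing mcpServers object rather than overwriting the file.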
Model Information
Model: nanonets/Nanonets-OCR-s
Parameters: 3.75B (based on Qwen2.5-VL-3B-Instruct)
Input: Images up to 2048x2048 pixels (recommended) and PDF documents
Output: Structured markdown with semantic tagging
PDF Processing: 200 DPI conversion, all pages processed sequentially
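To get a feel for what 200 DPI means relative to the recommended 2048x2048 input size, the arithmetic can be worked through for common page sizes. The sketch below is illustrative only: the function names are hypothetical, and this README does not say whether or how the server rescales pages that exceed the recommended size.

```python
def page_pixels(width_in: float, height_in: float, dpi: int = 200) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def downscale_factor(width_px: int, height_px: int, limit: int = 2048) -> float:
    """Scale factor that would fit the longer side within `limit` (1.0 if it already fits).

    Assumption: shown for illustration; the server's actual resizing
    behavior is not documented here.
    """
    return min(1.0, limit / max(width_px, height_px))


letter = page_pixels(8.5, 11.0)   # US Letter at 200 DPI: 1700 x 2200 pixels
a4 = page_pixels(8.27, 11.69)     # A4 at 200 DPI: 1654 x 2338 pixels

if __name__ == "__main__":
    for name, (w, h) in {"Letter": letter, "A4": a4}.items():
        print(f"{name}: {w}x{h} px, scale to fit 2048: {downscale_factor(w, h):.3f}")
```

Both page sizes come out taller than 2048 pixels at 200 DPI, so full pages slightly exceed the recommended image size on the long side.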
Requirements
Core Dependencies
Python ≥3.10
PyTorch ≥2.0.0
Transformers =4.53.0
PIL/Pillow ≥10.0.0
MCP ≥1.0.0
Optional Dependencies
pdf2image ≥1.16.0 (for PDF support)
PyMuPDF ≥1.23.0 (for PDF support)
python-docx ≥0.8.11 (for Word document support)
openpyxl ≥3.1.0 (for Excel support)
pandas ≥2.0.0 (for Excel support)
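Because the PDF, Word, and Excel features depend on optional packages, it can help to check which ones are importable before starting the server. The following is a minimal sketch using only the standard library. The function name is hypothetical, it checks presence only (not the minimum versions listed above), and it relies on the usual import names: fitz for PyMuPDF and docx for python-docx.

```python
from importlib.util import find_spec

# Import name -> feature it enables, following the optional dependency list.
OPTIONAL_MODULES = {
    "pdf2image": "PDF support",
    "fitz": "PDF support (PyMuPDF)",
    "docx": "Word document support (python-docx)",
    "openpyxl": "Excel support",
    "pandas": "Excel support",
}


def missing_optional_modules() -> dict[str, str]:
    """Return the optional modules that cannot be imported, with the feature each enables."""
    return {
        name: feature
        for name, feature in OPTIONAL_MODULES.items()
        if find_spec(name) is None
    }


if __name__ == "__main__":
    missing = missing_optional_modules()
    if not missing:
        print("All optional dependencies are importable.")
    for name, feature in missing.items():
        print(f"Missing {name}: needed for {feature}")
```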
Development
Testing
Docker Testing:
Local Testing:
Docker Management
License
[Add your license information here]