# Nanonets MCP Server
An MCP (Model Context Protocol) server that exposes Nanonets OCR functionality for converting images and documents (PDF, Word, Excel) to structured markdown.
## Features
- **Advanced OCR**: Convert documents to structured markdown using Nanonets-OCR-s (3.75B parameter model)
- **Multi-format Support**: Handles images, PDFs, Word documents, and Excel spreadsheets
  - **Images**: PNG, JPEG, BMP, TIFF, WEBP
  - **Documents**: PDF, DOCX, XLSX
- **PDF Processing**: Complete multi-page PDF document processing with page-by-page OCR
- **Office Document Processing**: Direct text extraction from Word and Excel files
- **Intelligent Recognition**: Detects and converts:
  - Text and paragraphs
  - Tables with structure preservation
  - LaTeX equations
  - Images with descriptions
  - Signatures and watermarks
  - Checkboxes
  - Complex layouts
  - Multi-page documents with proper page separation
  - Word document headings and formatting
  - Excel worksheets and data tables
## Installation
### Option 1: Docker (Recommended with GPU)
```bash
# Clone the repository
git clone <repository-url>
cd nanonets_mcp
# Build and run with Docker Compose (requires NVIDIA Docker runtime)
docker-compose up --build
```
**Prerequisites for GPU support:**
- NVIDIA GPU with CUDA support
- [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker) (formerly nvidia-docker) installed
- Docker Compose with support for Compose file format 3.8 or later
### Option 2: Local Installation
```bash
# Clone the repository
git clone <repository-url>
cd nanonets_mcp
# Install dependencies with uv
uv pip install -e .
```
## Usage
### Running the Server
#### With Docker:
```bash
# Start with Docker Compose
docker-compose up
# Or run directly with Docker
docker run --gpus all -p 8000:8000 nanonets-mcp:latest
```
#### Local Installation:
```bash
# Start the MCP server
nanonets-mcp
# Or run directly
python -m nanonets_mcp.server
```
### Available Tools
#### `ocr_image_to_markdown`
Convert an image to structured markdown format.

**Parameters:**
- `image_data` (string): Image data as base64 string, data URL, or file path
- `image_format` (optional string): Format hint (png, jpg, etc.)

**Returns:** Structured markdown representation of the document
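
All tools accept the same three input forms: a base64 string, a data URL, or a file path. A minimal sketch of how such a value could be normalized into raw bytes, assuming data URLs start with `data:`, anything naming an existing file is read from disk, and everything else is treated as base64. `load_input_bytes` is a hypothetical helper for illustration, not the server's actual code:

```python
import base64
import binascii
import os


def load_input_bytes(value: str) -> bytes:
    """Hypothetical helper: turn a data URL, file path, or base64 string into raw bytes."""
    # Data URL, e.g. "data:image/png;base64,iVBORw0..."
    if value.startswith("data:"):
        header, _, payload = value.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("this sketch only handles base64-encoded data URLs")
        return base64.b64decode(payload, validate=True)

    # A path to an existing file is read from disk
    if os.path.isfile(value):
        with open(value, "rb") as f:
            return f.read()

    # Anything else is assumed to be a bare base64 string
    try:
        return base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError("input is not a data URL, an existing file, or valid base64") from exc
```

With a helper like this, the same bytes come back whichever of the three forms the caller passes, and a mistyped path fails loudly instead of being decoded as garbage.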
#### `ocr_pdf_to_markdown`
Convert an entire PDF document to structured markdown format.

**Parameters:**
- `pdf_data` (string): PDF data as base64 string, data URL, or file path

**Returns:** Structured markdown representation of the entire PDF document with page separators
#### `process_word_to_markdown`
Convert a Word document (.docx) to structured markdown format.

**Parameters:**
- `docx_data` (string): Word document data as base64 string, data URL, or file path

**Returns:** Structured markdown representation of the Word document with headings and tables
#### `process_excel_to_markdown`
Convert an Excel file (.xlsx) to structured markdown format.

**Parameters:**
- `excel_data` (string): Excel file data as base64 string, data URL, or file path

**Returns:** Structured markdown representation of all worksheets in the Excel workbook
#### `get_supported_formats`
Get information about supported formats and capabilities.

**Returns:** Dictionary with supported formats, input methods, capabilities, and processing options
### Available Resources
#### `nanonets://model-info`
Provides detailed information about the Nanonets OCR model, including capabilities and specifications.
## Examples
### Basic OCR Usage
#### Image Processing
```python
import base64

# Assumes the tool function is in scope and this runs inside an async
# function (or an async REPL such as `python -m asyncio`).

# Using a file path
result = await ocr_image_to_markdown("/path/to/document.png")

# Using base64 data
with open("document.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
result = await ocr_image_to_markdown(image_b64, image_format="jpg")

# Using a data URL
data_url = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
result = await ocr_image_to_markdown(data_url)
```
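
The data URL in the example above is truncated. A small standard-library sketch for producing a complete one from a local file; `to_data_url` is a hypothetical helper, and it assumes the MIME type can be guessed from the file extension:

```python
import base64
import mimetypes


def to_data_url(path: str) -> str:
    """Hypothetical helper: read a file and return it as a base64 data URL."""
    # Guess the MIME type (e.g. "image/png") from the file extension
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        raise ValueError(f"cannot infer a MIME type from {path!r}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result can be passed straight to the tool, e.g. `await ocr_image_to_markdown(to_data_url("document.png"))`.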
#### PDF Processing
```python
import base64

# Process an entire PDF document from a file path
result = await ocr_pdf_to_markdown("/path/to/document.pdf")

# Using base64 PDF data
with open("document.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()
result = await ocr_pdf_to_markdown(pdf_b64)

# Result includes all pages with separators
# Example output:
# # PDF Document
# *Total pages: 3*
#
# ---
# # Page 1
# [Content of page 1]
#
# ---
# # Page 2
# [Content of page 2]
# ...
```
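
The page-separated layout in the example output above can be reproduced with a short helper that joins per-page OCR results. This sketch is inferred from that sample output only; `assemble_pdf_markdown` is hypothetical and the server's exact formatting may differ:

```python
def assemble_pdf_markdown(pages: list[str]) -> str:
    """Hypothetical helper: join per-page markdown with page headings and separators."""
    parts = ["# PDF Document", f"*Total pages: {len(pages)}*", ""]
    for number, content in enumerate(pages, start=1):
        # The blank line before "---" keeps it a thematic break rather than
        # a setext heading underline for the previous paragraph
        parts += ["---", f"# Page {number}", content.strip(), ""]
    return "\n".join(parts).rstrip() + "\n"
```

For example, `assemble_pdf_markdown(["A", "B"])` yields a two-page document with `# Page 1` and `# Page 2` sections separated by `---`.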
#### Word Document Processing
```python
import base64

# Process a Word document from a file path
result = await process_word_to_markdown("/path/to/document.docx")

# Using base64 Word document data
with open("document.docx", "rb") as f:
    docx_b64 = base64.b64encode(f.read()).decode()
result = await process_word_to_markdown(docx_b64)

# Result includes text, headings, and tables
# Example output:
# # Word Document
#
# # Main Title
#
# This is a paragraph of text.
#
# ## Section Header
#
# More content here.
#
# | Name | Age | City |
# | --- | --- | --- |
# | John | 30 | NYC |
```
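
One common way to produce heading levels like those in the example output above is to map Word paragraph style names (`Heading 1` through `Heading 9`) to markdown `#` prefixes. A dependency-free sketch under that assumption; with python-docx the inputs would come from `paragraph.style.name` and `paragraph.text`, but `paragraph_to_markdown` itself is hypothetical, and treating the `Title` style as a top-level heading is also an assumption:

```python
import re


def paragraph_to_markdown(style_name: str, text: str) -> str:
    """Hypothetical helper: render one Word paragraph as a markdown line."""
    text = text.strip()
    if not text:
        return ""
    # Assumption: the document title style becomes a top-level heading
    if style_name == "Title":
        return f"# {text}"
    # Built-in heading styles are named "Heading 1" ... "Heading 9";
    # markdown only has six levels, so deeper headings are clamped to ######
    match = re.fullmatch(r"Heading (\d)", style_name)
    if match:
        return f"{'#' * min(int(match.group(1)), 6)} {text}"
    # Everything else is emitted as a plain paragraph
    return text
```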
#### Excel Spreadsheet Processing
```python
import base64

# Process an Excel file from a file path
result = await process_excel_to_markdown("/path/to/spreadsheet.xlsx")

# Using base64 Excel data
with open("spreadsheet.xlsx", "rb") as f:
    excel_b64 = base64.b64encode(f.read()).decode()
result = await process_excel_to_markdown(excel_b64)

# Result includes all worksheets as tables
# Example output:
# # Excel Workbook
#
# ## Sheet: Employee Data
#
# | Name | Department | Salary |
# | --- | --- | --- |
# | Alice | Engineering | 75000 |
# | Bob | Marketing | 65000 |
#
# ## Sheet: Financial Data
#
# | Quarter | Revenue | Expenses |
# | --- | --- | --- |
# | Q1 | 150000 | 120000 |
```
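
The worksheet tables in the example output above use the markdown pipe-table layout: a header row, a `| --- |` separator row, then one row per record. A standard-library sketch that renders rows in that layout; `rows_to_markdown_table` is hypothetical, and with openpyxl the rows could come from `list(worksheet.iter_rows(values_only=True))`:

```python
from collections.abc import Sequence


def rows_to_markdown_table(rows: Sequence[Sequence[object]]) -> str:
    """Hypothetical helper: render rows as a pipe table; the first row is the header."""
    if not rows:
        return ""

    def cell(value: object) -> str:
        # Empty cells (None) become blank; pipes are escaped so they don't split columns
        return "" if value is None else str(value).replace("|", "\\|")

    header, *body = rows
    width = len(header)
    lines = [
        "| " + " | ".join(cell(v) for v in header) + " |",
        "| " + " | ".join("---" for _ in range(width)) + " |",
    ]
    for row in body:
        # Pad short rows with blanks and drop extra cells beyond the header width
        padded = (list(row) + [None] * width)[:width]
        lines.append("| " + " | ".join(cell(v) for v in padded) + " |")
    return "\n".join(lines)
```

Padding to the header width keeps every row at the same column count, which pipe tables require to render correctly.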
### Integration with Claude Desktop
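Add the following to your Claude Desktop configuration file (`claude_desktop_config.json`). The `nanonets-mcp` command must be on the `PATH` that Claude Desktop uses; if it is not, give the absolute path to the executable instead: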
```json
{
  "mcpServers": {
    "nanonets-ocr": {
      "command": "nanonets-mcp"
    }
  }
}
```
## Model Information
- **Model**: nanonets/Nanonets-OCR-s
- **Parameters**: 3.75B (based on Qwen2.5-VL-3B-Instruct)
- **Input**: Images (recommended maximum 2048x2048 pixels) and PDF documents
- **Output**: Structured markdown with semantic tagging
- **PDF Processing**: Each page is converted to an image at 200 DPI; pages are processed sequentially
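
As a worked example of what 200 DPI means in pixels: a US Letter page (8.5 x 11 in) renders to 1700 x 2200 pixels, slightly taller than the recommended 2048-pixel bound. The sketch below computes the rendered size and a proportional downscale that fits the bound. Whether the server actually resizes pages is not stated here, so `rendered_size` and `fit_within` are hypothetical illustrations only:

```python
def rendered_size(width_in: float, height_in: float, dpi: int = 200) -> tuple[int, int]:
    """Pixel dimensions of a page of the given size (in inches) rendered at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def fit_within(width: int, height: int, limit: int = 2048) -> tuple[int, int]:
    """Scale (width, height) down proportionally so neither side exceeds `limit`."""
    # Never upscale: the factor is capped at 1.0
    scale = min(1.0, limit / max(width, height))
    return round(width * scale), round(height * scale)


letter = rendered_size(8.5, 11)  # US Letter at 200 DPI
print(letter, fit_within(*letter))  # (1700, 2200) (1583, 2048)
```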
## Requirements
### Core Dependencies
- Python ≥3.10
- PyTorch ≥2.0.0
- Transformers ==4.53.0 (pinned)
- PIL/Pillow ≥10.0.0
- MCP ≥1.0.0
### Optional Dependencies
- pdf2image ≥1.16.0 (for PDF support)
- PyMuPDF ≥1.23.0 (for PDF support)
- python-docx ≥0.8.11 (for Word document support)
- openpyxl ≥3.1.0 (for Excel support)
- pandas ≥2.0.0 (for Excel support)
## Development
### Testing
#### Docker Testing:
```bash
# Test Docker build
docker-compose build
# Start in the background, then check container status and health
docker-compose up -d
docker-compose ps
# View logs
docker-compose logs -f nanonets-mcp
# Stop services
docker-compose down
```
#### Local Testing:
```bash
# Install for development
uv pip install -e .
# Test with MCP Inspector (the `mcp` CLI requires the `mcp[cli]` extra)
mcp dev nanonets_mcp/server.py
```
### Docker Management
```bash
# Rebuild image after changes
docker-compose build --no-cache
# View resource usage
docker stats nanonets-mcp-server
# Access container shell
docker-compose exec nanonets-mcp bash
# Clean up volumes and images
docker-compose down -v
docker image prune -f
```
## License
[Add your license information here]