# Nanonets MCP Server

An MCP (Model Context Protocol) server that exposes Nanonets OCR functionality for converting images to structured markdown.

## Features

- **Advanced OCR**: Convert documents to structured markdown using Nanonets-OCR-s (3.75B parameter model)
- **Multi-format Support**: Handles images, PDFs, Word documents, and Excel spreadsheets
  - **Images**: PNG, JPEG, BMP, TIFF, WEBP
  - **Documents**: PDF, DOCX, XLSX
- **PDF Processing**: Complete multi-page PDF document processing with page-by-page OCR
- **Office Document Processing**: Direct text extraction from Word and Excel files
- **Intelligent Recognition**: Detects and converts:
  - Text and paragraphs
  - Tables with structure preservation
  - LaTeX equations
  - Images with descriptions
  - Signatures and watermarks
  - Checkboxes
  - Complex layouts
  - Multi-page documents with proper page separation
  - Word document headings and formatting
  - Excel worksheets and data tables

## Installation

### Option 1: Docker (Recommended with GPU)

```bash
# Clone the repository
git clone <repository-url>
cd nanonets_mcp

# Build and run with Docker Compose (requires NVIDIA Docker runtime)
docker-compose up --build
```

**Prerequisites for GPU support:**

- NVIDIA GPU with CUDA support
- [NVIDIA Docker runtime](https://github.com/NVIDIA/nvidia-docker) installed
- Docker Compose v3.8+

### Option 2: Local Installation

```bash
# Clone the repository
git clone <repository-url>
cd nanonets_mcp

# Install dependencies with uv
uv pip install -e .
```

## Usage

### Running the Server

#### With Docker:

```bash
# Start with Docker Compose
docker-compose up

# Or run directly with Docker
docker run --gpus all -p 8000:8000 nanonets-mcp:latest
```

#### Local Installation:

```bash
# Start the MCP server
nanonets-mcp

# Or run directly
python -m nanonets_mcp.server
```

### Available Tools

#### `ocr_image_to_markdown`

Convert an image to structured markdown format.

**Parameters:**

- `image_data` (string): Image data as base64 string, data URL, or file path
- `image_format` (optional string): Format hint (png, jpg, etc.)

**Returns:** Structured markdown representation of the document

#### `ocr_pdf_to_markdown`

Convert an entire PDF document to structured markdown format.

**Parameters:**

- `pdf_data` (string): PDF data as base64 string, data URL, or file path

**Returns:** Structured markdown representation of the entire PDF document with page separators

#### `process_word_to_markdown`

Convert a Word document (.docx) to structured markdown format.

**Parameters:**

- `docx_data` (string): Word document data as base64 string, data URL, or file path

**Returns:** Structured markdown representation of the Word document with headings and tables

#### `process_excel_to_markdown`

Convert an Excel file (.xlsx) to structured markdown format.

**Parameters:**

- `excel_data` (string): Excel file data as base64 string, data URL, or file path

**Returns:** Structured markdown representation of all worksheets in the Excel workbook

#### `get_supported_formats`

Get information about supported formats and capabilities.

**Returns:** Dictionary with supported formats, input methods, capabilities, and processing options

### Available Resources

#### `nanonets://model-info`

Provides detailed information about the Nanonets OCR model, including capabilities and specifications.

## Examples

### Basic OCR Usage

#### Image Processing

```python
import base64

# Using file path
result = await ocr_image_to_markdown("/path/to/document.png")

# Using base64 data
with open("document.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()
result = await ocr_image_to_markdown(image_b64)

# Using data URL
data_url = "..."
result = await ocr_image_to_markdown(data_url)
```

#### PDF Processing

```python
import base64

# Process entire PDF document
result = await ocr_pdf_to_markdown("/path/to/document.pdf")

# Using base64 PDF data
with open("document.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()
result = await ocr_pdf_to_markdown(pdf_b64)

# Result includes all pages with separators
# Example output:
# # PDF Document
# *Total pages: 3*
#
# ---
# # Page 1
# [Content of page 1]
#
# ---
# # Page 2
# [Content of page 2]
# ...
```

#### Word Document Processing

```python
import base64

# Process Word document
result = await process_word_to_markdown("/path/to/document.docx")

# Using base64 Word document data
with open("document.docx", "rb") as f:
    docx_b64 = base64.b64encode(f.read()).decode()
result = await process_word_to_markdown(docx_b64)

# Result includes text, headings, and tables
# Example output:
# # Word Document
#
# # Main Title
#
# This is a paragraph of text.
#
# ## Section Header
#
# More content here.
#
# | Name | Age | City |
# | --- | --- | --- |
# | John | 30 | NYC |
```

#### Excel Spreadsheet Processing

```python
import base64

# Process Excel file
result = await process_excel_to_markdown("/path/to/spreadsheet.xlsx")

# Using base64 Excel data
with open("spreadsheet.xlsx", "rb") as f:
    excel_b64 = base64.b64encode(f.read()).decode()
result = await process_excel_to_markdown(excel_b64)

# Result includes all worksheets as tables
# Example output:
# # Excel Workbook
#
# ## Sheet: Employee Data
#
# | Name | Department | Salary |
# | --- | --- | --- |
# | Alice | Engineering | 75000 |
# | Bob | Marketing | 65000 |
#
# ## Sheet: Financial Data
#
# | Quarter | Revenue | Expenses |
# | --- | --- | --- |
# | Q1 | 150000 | 120000 |
```

### Integration with Claude Desktop

Add to your Claude Desktop configuration:

```json
{
  "mcpServers": {
    "nanonets-ocr": {
      "command": "nanonets-mcp"
    }
  }
}
```

## Model Information

- **Model**: nanonets/Nanonets-OCR-s
- **Parameters**: 3.75B (based on Qwen2.5-VL-3B-Instruct)
- **Input**: Images up to 2048x2048 pixels (recommended) and PDF documents
- **Output**: Structured markdown with semantic tagging
- **PDF Processing**: 200 DPI conversion, all pages processed sequentially

## Requirements

### Core Dependencies

- Python ≥3.10
- PyTorch ≥2.0.0
- Transformers =4.53.0
- PIL/Pillow ≥10.0.0
- MCP ≥1.0.0

### Optional Dependencies

- pdf2image ≥1.16.0 (for PDF support)
- PyMuPDF ≥1.23.0 (for PDF support)
- python-docx ≥0.8.11 (for Word document support)
- openpyxl ≥3.1.0 (for Excel support)
- pandas ≥2.0.0 (for Excel support)

## Development

### Testing

#### Docker Testing:

```bash
# Test Docker build
docker-compose build

# Run health check
docker-compose up -d
docker-compose ps

# View logs
docker-compose logs -f nanonets-mcp

# Stop services
docker-compose down
```

#### Local Testing:

```bash
# Test with MCP Inspector
mcp dev nanonets_mcp/server.py

# Install for development
uv pip install -e .
```

### Docker Management

```bash
# Rebuild image after changes
docker-compose build --no-cache

# View resource usage
docker stats nanonets-mcp-server

# Access container shell
docker-compose exec nanonets-mcp bash

# Clean up volumes and images
docker-compose down -v
docker image prune -f
```

## License

[Add your license information here]
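
## Appendix: Building a Data URL Input

The tool parameters described earlier accept input as a base64 string, a data URL, or a file path, but the README does not spell out the data URL layout. The sketch below assumes the conventional RFC 2397 form `data:<mime>;base64,<payload>`; whether the server expects exactly this shape is an assumption, and `to_data_url` is a hypothetical helper that is not part of this project.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str, default_mime: str = "application/octet-stream") -> str:
    """Encode a local file as a base64 data URL (hypothetical helper).

    Assumes the RFC 2397 layout: data:<mime>;base64,<payload>.
    """
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(path)
    # Read the raw bytes and base64-encode them into an ASCII string.
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime or default_mime};base64,{payload}"
```

The resulting string could then be passed as `image_data`, `pdf_data`, `docx_data`, or `excel_data`, if the server accepts this layout. For large files a plain file path is probably the cheaper option, since base64 inflates the payload by roughly a third.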
