How do I use MCP Mistral OCR Optimized?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@MCP Mistral OCR Optimized extract the tables from invoice.pdf into markdown" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

MCP Mistral OCR Optimized

Optimized MCP server for OCR processing using Mistral AI with batch processing and async connection pooling.

🚀 Key Optimizations

Feature	Benefit
Batch Processing API	Up to 50% cost reduction for large file sets
Async Connection Pooling	20-30% faster processing for multiple files
Token-Efficient Defaults	`include_images=False`, `table_format=markdown` saves 30-40% tokens
Concurrent Processing	Process up to 5 files simultaneously
Cross-Platform Paths	Works on Windows, macOS, Linux, and Docker
Configurable Parameters	Fine-tune OCR output with table_format, headers, footers

📦 Installation

Using UV (Recommended)

# Navigate to project directory
cd D:/dev/mcp_mistral_ocr_opt

# Create and activate virtual environment
uv venv
# Windows
.venv\Scripts\activate
# Unix
source .venv/bin/activate

# Install dependencies
uv pip install .

Using Docker

# Build image
docker build -t mcp-mistral-ocr-opt .

# Run container
docker run -e MISTRAL_API_KEY=your_api_key \
           -v /path/to/your/files:/data/ocr \
           mcp-mistral-ocr-opt:latest

⚙️ Configuration

Environment Variables

Create or edit .env file:

# Required
MISTRAL_API_KEY=your_api_key_here
OCR_DIR=D:/dev/mcp_mistral_ocr_opt/data/ocr

# Optional - Batch Processing
BATCH_MODE=auto                  # auto, always, never
BATCH_MIN_FILES=5                # Use batch processing for 5+ files in auto mode
INLINE_BATCH_THRESHOLD=10        # Use inline batch for <10 files
MAX_CONCURRENT_REQUESTS=5        # Max concurrent API requests

# Optional - OCR Defaults (token optimization)
DEFAULT_TABLE_FORMAT=markdown    # null, markdown, or html
INCLUDE_IMAGES=false             # Default false for token efficiency
EXTRACT_HEADER=false             # Extract document headers
EXTRACT_FOOTER=false             # Extract document footers

Claude Desktop Configuration

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "mistral-ocr-opt": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "D:/dev/mcp_mistral_ocr_opt",
        "-m",
        "src.mcp_mistral_ocr_opt.main"
      ],
      "env": {
        "MISTRAL_API_KEY": "your_api_key_here",
        "OCR_DIR": "D:/dev/mcp_mistral_ocr_opt/data/ocr",
        "BATCH_MODE": "auto"
      }
    }
  }
}

🛠️ Available Tools

1. `process_local_file` - Process a single file

Process a single local file from OCR_DIR.

{
  "name": "process_local_file",
  "arguments": {
    "filename": "document.pdf",
    "table_format": "markdown",
    "extract_header": false,
    "extract_footer": false,
    "include_images": false
  }
}

Parameters:

filename (required): Name of file relative to OCR_DIR
table_format (optional): null, markdown, or html - default: markdown
extract_header (optional): Extract document headers - default: false
extract_footer (optional): Extract document footers - default: false
include_images (optional): Include base64 images - default: false (token efficient)

Supported local file types:

PDFs: .pdf
Images: .jpg, .jpeg, .png, .gif, .webp, .bmp, .avif
Other formats (docx/xlsx/pptx) are not supported

2. `process_batch_local_files` - Process multiple files concurrently

Process multiple files with concurrent or batch processing (auto-selected).

{
  "name": "process_batch_local_files",
  "arguments": {
    "patterns": ["*.pdf", "scanned_*.jpg"],
    "max_files": 100,
    "table_format": "markdown",
    "include_images": false
  }
}

Parameters:

patterns (required): Array of glob patterns (e.g., ["*.pdf", "*.jpg"])
max_files (optional): Maximum files to process
Other parameters same as process_local_file

Auto-selection Logic:

< 5 files: Concurrent processing
5-9 files: Inline batch (if BATCH_MODE=auto)
10+ files: File batch (saves up to 50% cost)

3. `process_url_file` - Process file from URL

Process a file from a public URL.

{
  "name": "process_url_file",
  "arguments": {
    "url": "https://example.com/document.pdf",
    "file_type": "pdf",
    "table_format": "html"
  }
}

4. `create_batch_job` - Create explicit batch job

Create a batch processing job (for large file sets, cost savings up to 50%).

{
  "name": "create_batch_job",
  "arguments": {
    "patterns": ["documents/*.pdf"],
    "use_inline": false,
    "table_format": "markdown"
  }
}

Returns:

{
  "batch_type": "file",
  "job_id": "job_abc123",
  "batch_file_id": "file_xyz789",
  "files_queued": 50,
  "message": "Batch job created with 50 files. Use check_batch_status to monitor progress."
}

5. `check_batch_status` - Monitor batch job

{
  "name": "check_batch_status",
  "arguments": {
    "job_id": "job_abc123"
  }
}

Returns:

{
  "id": "job_abc123",
  "status": "SUCCESS",
  "created_at": "2026-01-22T12:00:00",
  "completed_at": "2026-01-22T12:05:00"
}

6. `download_batch_results` - Download completed results

{
  "name": "download_batch_results",
  "arguments": {
    "job_id": "job_abc123"
  }
}

7. `cancel_batch_job` - Cancel running job

{
  "name": "cancel_batch_job",
  "arguments": {
    "job_id": "job_abc123"
  }
}

8. `list_batch_jobs` - List all batch jobs

{
  "name": "list_batch_jobs",
  "arguments": {
    "status": "RUNNING"
  }
}

📊 Output

OCR results are saved in JSON format in OCR_DIR/output/:

Single files: {filename}_{timestamp}.json
Batch results: batch_results_{job_id}_{timestamp}.jsonl

Result structure:

{
  "pages": [
    {
      "index": 0,
      "markdown": "Extracted text content...",
      "images": [],
      "tables": [],
      "hyperlinks": [],
      "dimensions": {"width": 0, "height": 0}
    }
  ],
  "model": "mistral-ocr-latest",
  "usage_info": {...},
  "_metadata": {
    "source_file": "/path/to/document.pdf",
    "output_file": "/path/to/output.json",
    "file_type": "pdf",
    "processed_at": "2026-01-22T12:00:00",
    "table_format": "markdown",
    "include_images": false
  }
}

🎯 Usage Examples

Example 1: Process a single PDF with tables

{
  "name": "process_local_file",
  "arguments": {
    "filename": "invoice.pdf",
    "table_format": "html",
    "include_images": false
  }
}

Example 2: Process all PDFs in directory with batch

{
  "name": "process_batch_local_files",
  "arguments": {
    "patterns": ["*.pdf"],
    "table_format": "markdown"
  }
}

Example 3: Create explicit batch job for 100+ documents

{
  "name": "create_batch_job",
  "arguments": {
    "patterns": ["documents/**/*.pdf"],
    "use_inline": false,
    "table_format": "html",
    "extract_header": true,
    "extract_footer": true
  }
}

Then monitor:

{
  "name": "check_batch_status",
  "arguments": {
    "job_id": "job_abc123"
  }
}

And download when complete:

{
  "name": "download_batch_results",
  "arguments": {
    "job_id": "job_abc123"
  }
}

🔧 Performance Tips

Token Optimization

Set include_images=false (default) - saves 30-40% tokens
Use table_format="markdown" (default) - more efficient than HTML
Skip extract_header/extract_footer unless needed

Cost Optimization

Use batch processing for 10+ files (up to 50% cost savings)
Set BATCH_MODE=always for large recurring batches
Use max_files to limit processing if needed

Speed Optimization

Increase MAX_CONCURRENT_REQUESTS (default: 5, max: 10)
Use inline batch for 5-9 files (faster startup)
Enable BATCH_MODE=auto (default) for auto-selection

📈 Performance Benchmarks

Scenario	Old Version	Optimized	Improvement
10 files concurrent	45s	12s	4x faster
100 files batch	$5.00	$2.50	50% cheaper
With images (tokens)	100%	60%	40% fewer tokens
PDF processing (API calls)	300	100	3x fewer calls

▶️ Run via UV

uv run pytest
uv run pytest --cov=src --cov-report=term-missing
uv run python -m src.mcp_mistral_ocr_opt.main

🐳 Docker Support

Build Image

docker build -t mcp-mistral-ocr-opt .

Run Container

docker run -e MISTRAL_API_KEY=your_key \
           -e OCR_DIR=/data/ocr \
           -v $(pwd)/data/ocr:/data/ocr \
           mcp-mistral-ocr-opt:latest

Docker Compose

version: '3.8'
services:
  mistral-ocr:
    image: mcp-mistral-ocr-opt:latest
    environment:
      MISTRAL_API_KEY: ${MISTRAL_API_KEY}
      OCR_DIR: /data/ocr
      BATCH_MODE: auto
      MAX_CONCURRENT_REQUESTS: 5
    volumes:
      - ./data/ocr:/data/ocr
    restart: unless-stopped

🤝 Migration from Original

If migrating from the original mcp-mistral-ocr:

API Key: Same key works
Tools: All original tools still work
New Tools: Batch tools added (optional to use)
Defaults: More token-efficient by default

No code changes required for basic usage!

📝 Troubleshooting

Issue: "Configuration error: MISTRAL_API_KEY is required"

Solution: Add MISTRAL_API_KEY=your_key to .env file

Issue: "File not found"

Solution: Check OCR_DIR path in .env and ensure files are in that directory

Issue: "Batch job stuck in QUEUED"

Solution: Check Mistral dashboard or try cancel_batch_job and retry

Issue: Connection errors

Solution: Verify internet connection and API key is valid

📄 License

Based on the original mcp-mistral-ocr project.