MCP Mistral OCR Optimized
Optimized MCP server for OCR processing using Mistral AI with batch processing and async connection pooling.
🚀 Key Optimizations
| Feature | Benefit |
|---|---|
| Batch Processing API | Up to 50% cost reduction for large file sets |
| Async Connection Pooling | 20-30% faster processing for multiple files |
| Token-Efficient Defaults | |
| Concurrent Processing | Process up to 5 files simultaneously |
| Cross-Platform Paths | Works on Windows, macOS, Linux, and Docker |
| Configurable Parameters | Fine-tune OCR output with table_format, headers, footers |
📦 Installation
Using UV (Recommended)
# Navigate to project directory
cd D:/dev/mcp_mistral_ocr_opt
# Create and activate virtual environment
uv venv
# Windows
.venv\Scripts\activate
# Unix
source .venv/bin/activate
# Install dependencies
uv pip install .
Using Docker
# Build image
docker build -t mcp-mistral-ocr-opt .
# Run container
docker run -e MISTRAL_API_KEY=your_api_key \
-v /path/to/your/files:/data/ocr \
mcp-mistral-ocr-opt:latest
⚙️ Configuration
Environment Variables
Create or edit the .env file:
# Required
MISTRAL_API_KEY=your_api_key_here
OCR_DIR=D:/dev/mcp_mistral_ocr_opt/data/ocr
# Optional - Batch Processing
BATCH_MODE=auto # auto, always, never
BATCH_MIN_FILES=5 # Use batch processing for 5+ files in auto mode
INLINE_BATCH_THRESHOLD=10 # Use inline batch for <10 files
MAX_CONCURRENT_REQUESTS=5 # Max concurrent API requests
# Optional - OCR Defaults (token optimization)
DEFAULT_TABLE_FORMAT=markdown # null, markdown, or html
INCLUDE_IMAGES=false # Default false for token efficiency
EXTRACT_HEADER=false # Extract document headers
EXTRACT_FOOTER=false # Extract document footers
Claude Desktop Configuration
Add to claude_desktop_config.json:
{
"mcpServers": {
"mistral-ocr-opt": {
"command": "uv",
"args": [
"run",
"--directory",
"D:/dev/mcp_mistral_ocr_opt",
"-m",
"src.mcp_mistral_ocr_opt.main"
],
"env": {
"MISTRAL_API_KEY": "your_api_key_here",
"OCR_DIR": "D:/dev/mcp_mistral_ocr_opt/data/ocr",
"BATCH_MODE": "auto"
}
}
}
}
🛠️ Available Tools
1. process_local_file - Process a single file
Process a single local file from OCR_DIR.
{
"name": "process_local_file",
"arguments": {
"filename": "document.pdf",
"table_format": "markdown",
"extract_header": false,
"extract_footer": false,
"include_images": false
}
}
Parameters:
filename (required): Name of the file, relative to OCR_DIR
table_format (optional): null, markdown, or html (default: markdown)
extract_header (optional): Extract document headers (default: false)
extract_footer (optional): Extract document footers (default: false)
include_images (optional): Include base64-encoded images (default: false, for token efficiency)
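The parameter contract above can be sketched in Python, the project's language, as a small validation step that applies the documented defaults. This is a minimal sketch, not the server's code: the helper name `normalize_local_file_args` and the rule that rejects filenames resolving outside OCR_DIR are assumptions for illustration.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Accepted table_format values from the parameter list (None corresponds to JSON null).
ALLOWED_TABLE_FORMATS = {None, "markdown", "html"}


def normalize_local_file_args(ocr_dir: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: validate process_local_file arguments and fill defaults."""
    filename = arguments.get("filename")
    if not filename:
        raise ValueError("filename is required")

    base = Path(ocr_dir).resolve()
    # Interpret filename relative to OCR_DIR and reject paths that escape it
    # (an assumed safety rule, e.g. for "../secret.pdf").
    path = (base / filename).resolve()
    if path != base and base not in path.parents:
        raise ValueError(f"{filename!r} resolves outside OCR_DIR")

    table_format: Optional[str] = arguments.get("table_format", "markdown")
    if table_format not in ALLOWED_TABLE_FORMATS:
        raise ValueError(f"unsupported table_format: {table_format!r}")

    return {
        "path": path,
        "table_format": table_format,
        # The boolean flags all default to false, as documented above.
        "extract_header": bool(arguments.get("extract_header", False)),
        "extract_footer": bool(arguments.get("extract_footer", False)),
        "include_images": bool(arguments.get("include_images", False)),
    }
```

For example, `normalize_local_file_args("/data/ocr", {"filename": "document.pdf"})` fills in `table_format="markdown"` and `False` for the three boolean flags.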
Supported local file types:
PDFs: .pdf
Images: .jpg, .jpeg, .png, .gif, .webp, .bmp, .avif
Other formats (docx/xlsx/pptx) are not supported.
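Because these are local files, their bytes have to reach the Mistral OCR API somehow. One approach described in Mistral's OCR documentation is a base64 data URL wrapped in a `document_url` object (PDF) or an `image_url` object (image). The sketch below assumes that approach; `build_document_payload` is a hypothetical helper, and the payload shape should be checked against the current API reference before use.

```python
import base64
from pathlib import Path
from typing import Dict

# Image extensions listed above, mapped to MIME types explicitly
# (mimetypes coverage of .webp/.avif differs between Python versions).
IMAGE_MIME_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".bmp": "image/bmp",
    ".avif": "image/avif",
}


def build_document_payload(path: Path) -> Dict[str, str]:
    """Hypothetical helper: encode a local PDF or image as an OCR document payload."""
    suffix = path.suffix.lower()
    # Reject unsupported types (docx/xlsx/pptx, etc.) before reading any bytes.
    if suffix != ".pdf" and suffix not in IMAGE_MIME_TYPES:
        raise ValueError(f"unsupported file type: {suffix or '(no extension)'}")

    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    # Assumed payload shapes, following Mistral's OCR docs; verify before use.
    if suffix == ".pdf":
        return {"type": "document_url", "document_url": f"data:application/pdf;base64,{encoded}"}
    return {"type": "image_url", "image_url": f"data:{IMAGE_MIME_TYPES[suffix]};base64,{encoded}"}
```

Matching on the lower-cased suffix means a file named `scan.PNG` is treated the same as `scan.png`.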
2. process_batch_local_files - Process multiple files concurrently
Process multiple files with concurrent or batch processing (auto-selected).
{
"name": "process_batch_local_files",
"arguments": {
"patterns": ["*.pdf", "scanned_*.jpg"],
"max_files": 100,
"table_format": "markdown",
"include_images": false
}
}
Parameters:
patterns (required): Array of glob patterns (e.g., ["*.pdf", "*.jpg"])
max_files (optional): Maximum number of files to process
Other parameters: same as process_local_file
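Glob expansion for `process_batch_local_files` might look like the sketch below. How `patterns` and `max_files` interact is an assumption here (de-duplicate overlapping patterns, keep only the supported extensions listed for local files, then cap the list), and `expand_patterns` is a hypothetical name.

```python
from pathlib import Path
from typing import List, Optional, Sequence, Set

# Extensions accepted for local files (see "Supported local file types").
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".avif"}


def expand_patterns(ocr_dir: str, patterns: Sequence[str], max_files: Optional[int] = None) -> List[Path]:
    """Hypothetical helper: turn glob patterns under OCR_DIR into an ordered file list."""
    base = Path(ocr_dir)
    seen: Set[Path] = set()
    matches: List[Path] = []
    for pattern in patterns:
        # Path.glob supports "**" for recursion, e.g. "documents/**/*.pdf".
        for path in sorted(base.glob(pattern)):
            if path in seen or not path.is_file():
                continue
            if path.suffix.lower() in SUPPORTED_SUFFIXES:
                seen.add(path)
                matches.append(path)
    # Cap only after de-duplicating overlapping patterns such as "*.pdf" and "a.*".
    return matches if max_files is None else matches[:max_files]
```

Applying `max_files` last keeps the cap meaningful when several patterns match the same file.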
Auto-selection logic (BATCH_MODE=auto, with BATCH_MIN_FILES=5 and INLINE_BATCH_THRESHOLD=10):
Fewer than 5 files: concurrent processing
5-9 files: inline batch
10+ files: file batch (up to 50% cost savings)
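The selection rules can be expressed as a small pure function. This sketch reads the thresholds from the environment variables documented in the Configuration section; the `always` and `never` branches are assumptions, since this README only describes `auto` in detail, and `select_processing_mode` is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def select_processing_mode(file_count: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical sketch: return "concurrent", "inline_batch", or "file_batch"."""
    env = os.environ if env is None else env
    mode = env.get("BATCH_MODE", "auto").strip().lower()
    min_files = int(env.get("BATCH_MIN_FILES", "5"))
    inline_threshold = int(env.get("INLINE_BATCH_THRESHOLD", "10"))

    if mode not in {"auto", "always", "never"}:
        raise ValueError(f"invalid BATCH_MODE: {mode!r}")
    if mode == "never":
        return "concurrent"
    # Small sets stay concurrent in auto mode; "always" skips this step (assumption).
    if mode == "auto" and file_count < min_files:
        return "concurrent"
    return "inline_batch" if file_count < inline_threshold else "file_batch"
```

Passing an explicit mapping instead of relying on `os.environ` keeps the rules easy to test.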
3. process_url_file - Process file from URL
Process a file from a public URL.
{
"name": "process_url_file",
"arguments": {
"url": "https://example.com/document.pdf",
"file_type": "pdf",
"table_format": "html"
}
}
4. create_batch_job - Create explicit batch job
Create a batch processing job (for large file sets, cost savings up to 50%).
{
"name": "create_batch_job",
"arguments": {
"patterns": ["documents/*.pdf"],
"use_inline": false,
"table_format": "markdown"
}
}
Returns:
{
"batch_type": "file",
"job_id": "job_abc123",
"batch_file_id": "file_xyz789",
"files_queued": 50,
"message": "Batch job created with 50 files. Use check_batch_status to monitor progress."
}
5. check_batch_status - Monitor batch job
{
"name": "check_batch_status",
"arguments": {
"job_id": "job_abc123"
}
}
Returns:
{
"id": "job_abc123",
"status": "SUCCESS",
"created_at": "2026-01-22T12:00:00",
"completed_at": "2026-01-22T12:05:00"
}
6. download_batch_results - Download completed results
{
"name": "download_batch_results",
"arguments": {
"job_id": "job_abc123"
}
}
7. cancel_batch_job - Cancel a running job
{
"name": "cancel_batch_job",
"arguments": {
"job_id": "job_abc123"
}
}
8. list_batch_jobs - List all batch jobs
{
"name": "list_batch_jobs",
"arguments": {
"status": "RUNNING"
}
}
📊 Output
OCR results are saved in OCR_DIR/output/:
Single files (JSON): {filename}_{timestamp}.json
Batch results (JSON Lines): batch_results_{job_id}_{timestamp}.jsonl
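Building these output paths is straightforward with `pathlib`. The timestamp layout (`%Y%m%d_%H%M%S`) and the use of the file stem for `{filename}` are assumptions, since the README specifies neither; the helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed timestamp layout; the README does not specify one.
TIMESTAMP_FORMAT = "%Y%m%d_%H%M%S"


def _stamp(now: Optional[datetime]) -> str:
    return (now or datetime.now()).strftime(TIMESTAMP_FORMAT)


def single_result_path(ocr_dir: str, filename: str, now: Optional[datetime] = None) -> Path:
    """Hypothetical: {filename}_{timestamp}.json in OCR_DIR/output, using the file stem."""
    return Path(ocr_dir) / "output" / f"{Path(filename).stem}_{_stamp(now)}.json"


def batch_result_path(ocr_dir: str, job_id: str, now: Optional[datetime] = None) -> Path:
    """Hypothetical: batch_results_{job_id}_{timestamp}.jsonl in OCR_DIR/output."""
    return Path(ocr_dir) / "output" / f"batch_results_{job_id}_{_stamp(now)}.jsonl"
```

Under these assumptions, `single_result_path("/data/ocr", "invoice.pdf", datetime(2026, 1, 22, 12, 0))` yields `/data/ocr/output/invoice_20260122_120000.json`.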
Result structure:
{
"pages": [
{
"index": 0,
"markdown": "Extracted text content...",
"images": [],
"tables": [],
"hyperlinks": [],
"dimensions": {"width": 0, "height": 0}
}
],
"model": "mistral-ocr-latest",
"usage_info": {...},
"_metadata": {
"source_file": "/path/to/document.pdf",
"output_file": "/path/to/output.json",
"file_type": "pdf",
"processed_at": "2026-01-22T12:00:00",
"table_format": "markdown",
"include_images": false
}
}
🎯 Usage Examples
Example 1: Process a single PDF with tables
{
"name": "process_local_file",
"arguments": {
"filename": "invoice.pdf",
"table_format": "html",
"include_images": false
}
}
Example 2: Process all PDFs in the directory with batch processing
{
"name": "process_batch_local_files",
"arguments": {
"patterns": ["*.pdf"],
"table_format": "markdown"
}
}
Example 3: Create an explicit batch job for 100+ documents
{
"name": "create_batch_job",
"arguments": {
"patterns": ["documents/**/*.pdf"],
"use_inline": false,
"table_format": "html",
"extract_header": true,
"extract_footer": true
}
}
Then monitor its status:
{
"name": "check_batch_status",
"arguments": {
"job_id": "job_abc123"
}
}
And download the results when complete:
{
"name": "download_batch_results",
"arguments": {
"job_id": "job_abc123"
}
}
🔧 Performance Tips
Token Optimization
Set include_images=false (default): saves 30-40% of tokens
Use table_format="markdown" (default): more efficient than HTML
Skip extract_header/extract_footer unless needed
Cost Optimization
Use batch processing for 10+ files (up to 50% cost savings)
Set BATCH_MODE=always for large recurring batches
Use max_files to limit processing if needed
Speed Optimization
Increase MAX_CONCURRENT_REQUESTS (default: 5, max: 10)
Use inline batch for 5-9 files (faster startup)
Keep BATCH_MODE=auto (default) for automatic selection
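Raising MAX_CONCURRENT_REQUESTS widens the cap on in-flight OCR calls. A common way to enforce such a cap in asyncio is a semaphore, sketched below under the assumption that each file is handled by an async worker; `run_bounded` and `concurrency_limit` are hypothetical names, and the clamp to 1..10 mirrors the documented default and maximum. Connection pooling (reusing one HTTP client across requests) is a separate concern and is not shown.

```python
import asyncio
import os
from typing import Awaitable, Callable, Iterable, List, Mapping, TypeVar

T = TypeVar("T")
R = TypeVar("R")

DEFAULT_LIMIT = 5  # documented default for MAX_CONCURRENT_REQUESTS
MAX_LIMIT = 10     # documented maximum


def concurrency_limit(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical: read MAX_CONCURRENT_REQUESTS and clamp it to the range 1..10."""
    value = int(env.get("MAX_CONCURRENT_REQUESTS", str(DEFAULT_LIMIT)))
    return max(1, min(value, MAX_LIMIT))


async def run_bounded(items: Iterable[T], worker: Callable[[T], Awaitable[R]], limit: int) -> List[R]:
    """Hypothetical: run worker(item) for every item, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item: T) -> R:
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

A call such as `asyncio.run(run_bounded(files, process_one, concurrency_limit()))`, where `process_one` is a hypothetical async OCR function, would keep at most five requests in flight by default.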
📈 Performance Benchmarks
| Scenario | Old Version | Optimized | Improvement |
|---|---|---|---|
| 10 files concurrent | 45s | 12s | 4x faster |
| 100 files batch | $5.00 | $2.50 | 50% cheaper |
| With images (tokens) | 100% | 60% | 40% fewer tokens |
| PDF processing (API calls) | 300 | 100 | 3x fewer calls |
▶️ Run via UV
uv run pytest
uv run pytest --cov=src --cov-report=term-missing
uv run python -m src.mcp_mistral_ocr_opt.main
🐳 Docker Support
Build Image
docker build -t mcp-mistral-ocr-opt .
Run Container
docker run -e MISTRAL_API_KEY=your_key \
-e OCR_DIR=/data/ocr \
-v $(pwd)/data/ocr:/data/ocr \
mcp-mistral-ocr-opt:latest
Docker Compose
version: '3.8'
services:
mistral-ocr:
image: mcp-mistral-ocr-opt:latest
environment:
MISTRAL_API_KEY: ${MISTRAL_API_KEY}
OCR_DIR: /data/ocr
BATCH_MODE: auto
MAX_CONCURRENT_REQUESTS: 5
volumes:
- ./data/ocr:/data/ocr
restart: unless-stopped
🤝 Migration from Original
If migrating from the original mcp-mistral-ocr:
API Key: Same key works
Tools: All original tools still work
New Tools: Batch tools added (optional to use)
Defaults: More token-efficient by default
No code changes required for basic usage!
📝 Troubleshooting
Issue: "Configuration error: MISTRAL_API_KEY is required"
Solution: Add MISTRAL_API_KEY=your_key to the .env file
Issue: "File not found"
Solution: Check the OCR_DIR path in .env and make sure the files are in that directory
Issue: "Batch job stuck in QUEUED"
Solution: Check the Mistral dashboard, or cancel the job with cancel_batch_job and retry
Issue: Connection errors
Solution: Verify your internet connection and that the API key is valid
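For the "stuck in QUEUED" case, a client-side polling loop with a deadline turns an indefinitely queued job into an explicit error, after which cancel_batch_job and a retry are an option. This is a sketch: `get_status` is a hypothetical stand-in for a call to check_batch_status, and the terminal status set is an assumption based on Mistral's batch job statuses (this README itself only shows SUCCESS, RUNNING, and QUEUED).

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states, based on Mistral's batch job statuses.
TERMINAL_STATUSES = {"SUCCESS", "FAILED", "TIMEOUT_EXCEEDED", "CANCELLED"}


def wait_for_batch(
    get_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll get_status(job_id) (hypothetical check_batch_status wrapper) until done or timed out."""
    deadline = clock() + timeout
    while True:
        status = get_status(job_id)
        if status["status"] in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            # Still QUEUED/RUNNING: surface it so the caller can cancel and retry.
            raise TimeoutError(f"{job_id} still {status['status']} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.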
📄 License
Based on the original mcp-mistral-ocr project.