The MCP Image Recognition Server allows you to analyze images using various AI vision models, with these capabilities:
- Describe images from base64-encoded data, local file paths, or public URLs
- Support for multiple image formats, including JPEG, PNG, GIF, and WebP
- Choice of AI providers: Anthropic, OpenAI, and Cloudflare Workers AI, with configurable primary and fallback options
- Optional Tesseract OCR integration for text extraction from images
- Customizable prompts to guide the image description process
- Flexible deployment options: local, Docker, or via uvx/pip installation
- Integration with MCP-compatible clients such as Claude Desktop and Cursor
MCP Image Recognition Server
An MCP server that provides image recognition capabilities using Anthropic and OpenAI vision APIs. Version 0.1.2.
Features
- Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
- Support for multiple image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Base64 and file-based image input support
- Optional text extraction using Tesseract OCR
Requirements
- Python 3.8 or higher
- Tesseract OCR (optional) - required for the text extraction feature
  - Windows: Download and install from UB-Mannheim/tesseract
  - Linux: `sudo apt-get install tesseract-ocr`
  - macOS: `brew install tesseract`
Installation
Clone the repository:
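For example (the repository URL is a placeholder; substitute the project's actual GitHub path):

```bash
git clone https://github.com/<owner>/mcp-image-recognition.git
cd mcp-image-recognition
```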
Create and configure your environment file:
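A typical approach, assuming the repository ships a `.env.example` template:

```bash
# Copy the template, then fill in your API keys
# (see Environment Configuration below).
cp .env.example .env
```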
Build the project:
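A minimal sketch, assuming a standard Python package layout (`pyproject.toml` or `setup.py`):

```bash
# Install the package and its dependencies into the current environment.
pip install -e .
```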
Usage
Running the Server
Start the server using Python:
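For example, assuming the package exposes a runnable server module (the module name here is illustrative; adjust it to match the actual package):

```bash
python -m image_recognition_server
```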
Alternatively, start the server using the batch file (Windows):
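Assuming a batch script at the repository root (the filename is a placeholder):

```bat
run.bat
```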
Start the server in development mode with the MCP Inspector:
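The MCP Inspector wraps the server's launch command; for example (again using the illustrative module name):

```bash
npx @modelcontextprotocol/inspector python -m image_recognition_server
```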
Available Tools
`describe_image`
- Input: Base64-encoded image data and MIME type
- Output: Detailed description of the image

`describe_image_from_file`
- Input: Path to an image file
- Output: Detailed description of the image
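For instance, here is a minimal sketch of preparing input for `describe_image` from a local file (the tool itself is invoked through your MCP client; the file name is illustrative):

```python
import base64

# Read an image and base64-encode it, as describe_image expects.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Pass image_b64 together with its MIME type (here "image/jpeg")
# as the tool's input via your MCP client.
print(image_b64[:40], "...")
```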
Environment Configuration
- `ANTHROPIC_API_KEY`: Your Anthropic API key.
- `OPENAI_API_KEY`: Your OpenAI API key.
- `VISION_PROVIDER`: Primary vision provider (`anthropic` or `openai`).
- `FALLBACK_PROVIDER`: Optional fallback provider.
- `LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR).
- `ENABLE_OCR`: Enable Tesseract OCR text extraction (`true` or `false`).
- `TESSERACT_CMD`: Optional custom path to the Tesseract executable.
- `OPENAI_MODEL`: OpenAI model (default: `gpt-4o-mini`). Can use OpenRouter format for other models (e.g., `anthropic/claude-3.5-sonnet:beta`).
- `OPENAI_BASE_URL`: Optional custom base URL for the OpenAI API. Set to `https://openrouter.ai/api/v1` for OpenRouter.
- `OPENAI_TIMEOUT`: Optional custom timeout (in seconds) for the OpenAI API.
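For example, a minimal `.env` using Anthropic as the primary provider with OpenAI as fallback (key values are placeholders):

```env
ANTHROPIC_API_KEY=your-anthropic-key
OPENAI_API_KEY=your-openai-key
VISION_PROVIDER=anthropic
FALLBACK_PROVIDER=openai
LOG_LEVEL=INFO
ENABLE_OCR=false
```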
Using OpenRouter
OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:
1. Obtain an API key from OpenRouter.
2. Set `OPENAI_API_KEY` in your `.env` file to your OpenRouter API key.
3. Set `OPENAI_BASE_URL` to `https://openrouter.ai/api/v1`.
4. Set `OPENAI_MODEL` to the desired model using the OpenRouter format (e.g., `anthropic/claude-3.5-sonnet:beta`).
5. Set `VISION_PROVIDER` to `openai`.
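Put together, the relevant `.env` entries would look like this (the key value is a placeholder):

```env
OPENAI_API_KEY=your-openrouter-key
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-3.5-sonnet:beta
VISION_PROVIDER=openai
```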
Default Models
- Anthropic: `claude-3.5-sonnet-beta`
- OpenAI: `gpt-4o-mini`
- OpenRouter: use the `anthropic/claude-3.5-sonnet:beta` format in `OPENAI_MODEL`.
Development
Running Tests
Run all tests:
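For example, assuming the project uses pytest:

```bash
pytest
```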
Run specific test suite:
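For instance, to run just the OCR tests (the path is illustrative):

```bash
pytest tests/test_ocr.py
```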
Docker Support
Build the Docker image:
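For example (the image tag is a placeholder):

```bash
docker build -t mcp-image-recognition .
```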
Run the container:
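A typical invocation, reusing the tag from the build step; `-i` keeps stdin open for MCP's stdio transport:

```bash
docker run -i --env-file .env mcp-image-recognition
```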
License
MIT License - see LICENSE file for details.
Release History
- 0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
- 0.1.1 (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
- 0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support