MCP Image Recognition Server
An MCP server that provides image recognition capabilities using Anthropic and OpenAI vision APIs. Version 0.1.2.
Features
- Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
- Support for multiple image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Base64 and file-based image input support
- Optional text extraction using Tesseract OCR
Requirements
- Python 3.8 or higher
- Tesseract OCR (optional): required for the text extraction feature
  - Windows: Download and install from UB-Mannheim/tesseract
  - Linux: sudo apt-get install tesseract-ocr
  - macOS: brew install tesseract
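OCR is optional, so it helps to check that Tesseract can actually be found before enabling it. The sketch below is an assumption about how such a check could work, not the server's own logic. It honours the TESSERACT_CMD variable documented under Environment Configuration and otherwise searches PATH with the standard library's shutil.which. The function name find_tesseract is hypothetical.

```python
import os
import shutil
from typing import Mapping, Optional


def find_tesseract(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a usable Tesseract executable path, or None if OCR is unavailable.

    Assumption: an explicit TESSERACT_CMD takes precedence over a PATH lookup.
    """
    custom = env.get("TESSERACT_CMD")
    if custom:
        # Honour the override only if it points at an existing executable file.
        if os.path.isfile(custom) and os.access(custom, os.X_OK):
            return custom
        return None
    # Otherwise search PATH for "tesseract" (on Windows, PATHEXT extensions are tried too).
    return shutil.which("tesseract")
```

If find_tesseract() returns None, leaving ENABLE_OCR set to false avoids OCR errors at runtime.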
Installation
- Clone the repository:
- Create and configure your environment file:
- Build the project:
Usage
Running the Server
Start the server with Python:
Alternatively, start the server with the batch script:
Start the server in development mode with the MCP Inspector:
Available Tools
describe_image
- Input: Base64-encoded image data and MIME type
- Output: Detailed description of the image
describe_image_from_file
- Input: Path to an image file
- Output: Detailed description of the image
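The describe_image tool expects base64-encoded image data plus a MIME type. The following is a minimal client-side sketch for preparing those two values. It detects the MIME type from the file header (a choice made here, not necessarily what the server does) and accepts either raw bytes or a path. The argument keys image and mime_type and both function names are hypothetical, so check the tool's published input schema before relying on them.

```python
import base64
from pathlib import Path
from typing import Dict, Union

# Leading "magic bytes" for the formats listed under Features.
_SIGNATURES = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_mime_type(data: bytes) -> str:
    """Infer the MIME type from the file header instead of trusting the extension."""
    for signature, mime in _SIGNATURES:
        if data.startswith(signature):
            return mime
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unsupported image format (expected JPEG, PNG, GIF or WebP)")


def describe_image_args(source: Union[bytes, str, Path]) -> Dict[str, str]:
    """Build hypothetical describe_image arguments from raw bytes or a file path."""
    data = source if isinstance(source, bytes) else Path(source).read_bytes()
    return {
        "image": base64.b64encode(data).decode("ascii"),  # hypothetical key name
        "mime_type": sniff_mime_type(data),  # hypothetical key name
    }
```

For describe_image_from_file none of this is needed, since that tool takes the path directly.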
Environment Configuration
- ANTHROPIC_API_KEY: Your Anthropic API key.
- OPENAI_API_KEY: Your OpenAI API key.
- VISION_PROVIDER: Primary vision provider (anthropic or openai).
- FALLBACK_PROVIDER: Optional fallback provider.
- LOG_LEVEL: Logging level (DEBUG, INFO, WARNING, ERROR).
- ENABLE_OCR: Enable Tesseract OCR text extraction (true or false).
- TESSERACT_CMD: Optional custom path to the Tesseract executable.
- OPENAI_MODEL: OpenAI model (default: gpt-4o-mini). Can use the OpenRouter format for other models (e.g., anthropic/claude-3.5-sonnet:beta).
- OPENAI_BASE_URL: Optional custom base URL for the OpenAI API. Set to https://openrouter.ai/api/v1 for OpenRouter.
- OPENAI_TIMEOUT: Optional custom timeout (in seconds) for the OpenAI API.
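The environment variables can be read and validated in one place. This is a sketch under stated assumptions, not the server's own loader: the defaults for VISION_PROVIDER (anthropic) and LOG_LEVEL (INFO) are guesses, API keys are left out, and ServerConfig and load_config are hypothetical names. Only the gpt-4o-mini default for OPENAI_MODEL comes from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PROVIDERS = {"anthropic", "openai"}
LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


@dataclass(frozen=True)
class ServerConfig:
    vision_provider: str
    fallback_provider: Optional[str]
    log_level: str
    enable_ocr: bool
    tesseract_cmd: Optional[str]
    openai_model: str
    openai_base_url: Optional[str]
    openai_timeout: Optional[float]


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read and validate the documented variables (illustrative sketch)."""
    provider = env.get("VISION_PROVIDER", "anthropic").lower()  # default is an assumption
    if provider not in PROVIDERS:
        raise ValueError(f"VISION_PROVIDER must be one of {sorted(PROVIDERS)}, got {provider!r}")

    fallback = env.get("FALLBACK_PROVIDER") or None
    if fallback is not None:
        fallback = fallback.lower()
        if fallback not in PROVIDERS:
            raise ValueError(f"FALLBACK_PROVIDER must be one of {sorted(PROVIDERS)}, got {fallback!r}")

    log_level = env.get("LOG_LEVEL", "INFO").upper()  # default is an assumption
    if log_level not in LOG_LEVELS:
        raise ValueError(f"LOG_LEVEL must be one of {sorted(LOG_LEVELS)}, got {log_level!r}")

    timeout = env.get("OPENAI_TIMEOUT")
    return ServerConfig(
        vision_provider=provider,
        fallback_provider=fallback,
        log_level=log_level,
        # Only the literal string "true" (any case) enables OCR.
        enable_ocr=env.get("ENABLE_OCR", "false").strip().lower() == "true",
        tesseract_cmd=env.get("TESSERACT_CMD") or None,
        openai_model=env.get("OPENAI_MODEL") or "gpt-4o-mini",  # documented default
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
        openai_timeout=float(timeout) if timeout else None,
    )
```

Taking env as a parameter instead of reading os.environ directly keeps the function easy to test with a plain dict.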
Using OpenRouter
OpenRouter lets you access a range of models through the OpenAI API format. To use OpenRouter:
- Obtain an API key from OpenRouter.
- Set OPENAI_API_KEY in your .env file to your OpenRouter API key.
- Set OPENAI_BASE_URL to https://openrouter.ai/api/v1.
- Set OPENAI_MODEL to the desired model in OpenRouter format (e.g., anthropic/claude-3.5-sonnet:beta).
- Set VISION_PROVIDER to openai.
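The OpenRouter steps amount to four .env entries. As an illustration (the helper name openrouter_env is hypothetical), the following renders them from an OpenRouter key and model ID, using the base URL and example model given in this README.

```python
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_env(api_key: str, model: str = "anthropic/claude-3.5-sonnet:beta") -> str:
    """Render .env lines that route the openai provider through OpenRouter."""
    if "/" not in model:
        # OpenRouter model IDs are namespaced as "<vendor>/<model>".
        raise ValueError(f"expected an OpenRouter model ID like 'vendor/model', got {model!r}")
    settings = {
        "OPENAI_API_KEY": api_key,  # your OpenRouter key, not an OpenAI key
        "OPENAI_BASE_URL": OPENROUTER_BASE_URL,
        "OPENAI_MODEL": model,
        "VISION_PROVIDER": "openai",
    }
    return "\n".join(f"{name}={value}" for name, value in settings.items()) + "\n"
```

Writing the result with Path(".env").write_text(openrouter_env("<your-openrouter-key>")) would overwrite an existing .env, so merge the lines by hand if you have already set other variables such as ENABLE_OCR.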
Default Models
- Anthropic: claude-3.5-sonnet-beta
- OpenAI: gpt-4o-mini
- OpenRouter: Use the anthropic/claude-3.5-sonnet:beta format in OPENAI_MODEL.
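The default models combine with VISION_PROVIDER, FALLBACK_PROVIDER and OPENAI_MODEL to decide which provider and model handle a request. The sketch below assumes the fallback is tried whenever the primary provider raises an error; that policy, the backends mapping, its (image, mime_type, model) call signature, and the function names are all hypothetical. DEFAULT_MODELS simply copies the defaults listed above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Copied from the Default Models list in this README.
DEFAULT_MODELS = {
    "anthropic": "claude-3.5-sonnet-beta",
    "openai": "gpt-4o-mini",
}


def provider_chain(primary: str, fallback: Optional[str] = None,
                   openai_model: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (provider, model) pairs in the order they should be tried."""
    order = [primary] + ([fallback] if fallback and fallback != primary else [])
    chain = []
    for provider in order:
        model = DEFAULT_MODELS[provider]
        if provider == "openai" and openai_model:
            model = openai_model  # OPENAI_MODEL overrides the default, incl. OpenRouter IDs
        chain.append((provider, model))
    return chain


def describe_with_fallback(image_b64: str, mime_type: str,
                           chain: List[Tuple[str, str]],
                           backends: Dict[str, Callable[[str, str, str], str]]) -> str:
    """Try each provider in turn; raise with the last error attached if all fail."""
    last_error: Optional[Exception] = None
    for provider, model in chain:
        try:
            return backends[provider](image_b64, mime_type, model)
        except Exception as exc:  # a real server would narrow this to API/network errors
            last_error = exc
    raise RuntimeError("all vision providers failed") from last_error
```

Keeping the chain as plain data makes the fallback order easy to log and to test without calling either API.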
Development
Running Tests
Run all tests:
Run a specific test suite:
Docker Support
Build the Docker image:
Run the container:
License
MIT License - see LICENSE file for details.
Release History
- 0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
- 0.1.1 (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
- 0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support
Provides image recognition using the Anthropic Claude Vision and OpenAI GPT-4 Vision APIs, supports multiple image formats, and offers optional text extraction through Tesseract OCR.