MCP Image Recognition Server
by mario-andreschak
# MCP Image Recognition Server
An MCP server that provides image recognition capabilities using Anthropic and OpenAI vision APIs. Version 0.1.2.
## Features
- Image description using Anthropic Claude Vision or OpenAI GPT-4 Vision
- Support for multiple image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Base64 and file-based image input support
- Optional text extraction using Tesseract OCR
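
The primary/fallback behavior can be pictured with a minimal sketch. This is not the server's actual code: the `Provider` signature and the `describe_with_fallback` helper are hypothetical names chosen for illustration, and the real implementation may handle errors differently.

```python
from typing import Callable, Optional

# Hypothetical provider signature (an assumption, not the server's API):
# takes base64 image data and a MIME type, returns a text description.
Provider = Callable[[str, str], str]


def describe_with_fallback(
    image_b64: str,
    mime_type: str,
    primary: Provider,
    fallback: Optional[Provider] = None,
) -> str:
    """Try the primary provider; on any error, try the fallback if one is set."""
    try:
        return primary(image_b64, mime_type)
    except Exception as primary_error:
        if fallback is None:
            # No fallback configured: re-raise the original error unchanged.
            raise
        try:
            return fallback(image_b64, mime_type)
        except Exception as fallback_error:
            # Report both failures so neither is silently lost.
            raise RuntimeError(
                f"primary failed ({primary_error}); "
                f"fallback failed ({fallback_error})"
            ) from fallback_error


if __name__ == "__main__":
    def broken(_data: str, _mime: str) -> str:
        raise ConnectionError("provider unavailable")

    def working(_data: str, mime: str) -> str:
        return f"a description of a {mime} image"

    print(describe_with_fallback("aGVsbG8=", "image/png", broken, working))
```

Passing the providers in as plain callables keeps the sketch independent of any SDK, so it can be exercised without API keys.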
## Requirements
- Python 3.8 or higher
- Tesseract OCR (optional) - Required for text extraction feature
  - Windows: Download and install from [UB-Mannheim/tesseract](https://github.com/UB-Mannheim/tesseract/wiki)
  - Linux: `sudo apt-get install tesseract-ocr`
  - macOS: `brew install tesseract`
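
To confirm these requirements before installing, a short check along the following lines may help. It is a sketch only: `python_version_ok` and `find_tesseract` are hypothetical helper names, not part of this project.

```python
import shutil
import sys
from typing import Optional, Tuple


def python_version_ok(minimum: Tuple[int, int] = (3, 8)) -> bool:
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


def find_tesseract(custom_cmd: Optional[str] = None) -> Optional[str]:
    """Return the resolved path to the Tesseract executable, or None.

    `custom_cmd` plays the role of a custom executable path; when it is
    not given, the default `tesseract` command is looked up on PATH.
    """
    return shutil.which(custom_cmd or "tesseract")


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Tesseract found at:", find_tesseract())
```

Because Tesseract is optional, a `None` result only means the text extraction feature will be unavailable.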
## Installation
1. Clone the repository:
```bash
git clone https://github.com/mario-andreschak/mcp-image-recognition.git
cd mcp-image-recognition
```
2. Create and configure your environment file:
```bash
cp .env.example .env
# Edit .env with your API keys and preferences
```
3. Build the project (`build.bat` is a Windows batch script):
```bash
build.bat
```
## Usage
### Running the Server
Start the server with Python:
```bash
python -m image_recognition_server.server
```
Alternatively, start the server with the batch script:
```bash
run.bat server
```
Start the server in development mode with the MCP Inspector:
```bash
run.bat debug
```
### Available Tools
1. `describe_image`
   - Input: Base64-encoded image data and MIME type
   - Output: Detailed description of the image
2. `describe_image_from_file`
   - Input: Path to an image file
   - Output: Detailed description of the image
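
As an illustration of the input `describe_image` expects, the sketch below reads a file, Base64-encodes it, and derives the MIME type from the file extension. The argument names `image` and `mime_type` are assumptions made for this example; check the tool schema reported by the server for the real names.

```python
import base64
from pathlib import Path
from typing import Dict

# Extensions for the formats listed under Features.
MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def build_describe_image_args(path: str) -> Dict[str, str]:
    """Build a Base64 payload plus MIME type for an image file.

    The keys "image" and "mime_type" are hypothetical argument names.
    """
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix not in MIME_BY_SUFFIX:
        raise ValueError(f"unsupported image extension: {suffix!r}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {"image": encoded, "mime_type": MIME_BY_SUFFIX[suffix]}
```

With `describe_image_from_file`, none of this is needed, since the server reads the file itself from the given path.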
### Environment Configuration
- `ANTHROPIC_API_KEY`: Your Anthropic API key.
- `OPENAI_API_KEY`: Your OpenAI API key.
- `VISION_PROVIDER`: Primary vision provider (`anthropic` or `openai`).
- `FALLBACK_PROVIDER`: Optional fallback provider.
- `LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR).
- `ENABLE_OCR`: Enable Tesseract OCR text extraction (`true` or `false`).
- `TESSERACT_CMD`: Optional custom path to Tesseract executable.
- `OPENAI_MODEL`: OpenAI model (default: `gpt-4o-mini`). Accepts the OpenRouter format for other models (e.g., `anthropic/claude-3.5-sonnet:beta`).
- `OPENAI_BASE_URL`: Optional custom base URL for the OpenAI API. Set to `https://openrouter.ai/api/v1` for OpenRouter.
- `OPENAI_TIMEOUT`: Optional custom timeout (in seconds) for the OpenAI API.
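
The variables above could be read into a typed settings object roughly as follows. This is a sketch, not the server's own loader: the `Settings` class and `load_settings` function are hypothetical, and the defaults for `LOG_LEVEL` (`INFO`) and `ENABLE_OCR` (`false`) are assumptions. Only the `gpt-4o-mini` default comes from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    vision_provider: str
    fallback_provider: Optional[str]
    openai_model: str
    openai_base_url: Optional[str]
    openai_timeout: Optional[float]
    enable_ocr: bool
    log_level: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Parse the documented environment variables into a Settings object."""
    provider = env.get("VISION_PROVIDER", "").strip().lower()
    if provider not in ("anthropic", "openai"):
        raise ValueError("VISION_PROVIDER must be 'anthropic' or 'openai'")

    fallback = env.get("FALLBACK_PROVIDER", "").strip().lower() or None
    timeout = env.get("OPENAI_TIMEOUT", "").strip()

    return Settings(
        vision_provider=provider,
        fallback_provider=fallback,
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
        openai_base_url=env.get("OPENAI_BASE_URL") or None,
        openai_timeout=float(timeout) if timeout else None,
        # Assumed default: OCR stays off unless explicitly enabled.
        enable_ocr=env.get("ENABLE_OCR", "false").strip().lower() == "true",
        # Assumed default log level.
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Accepting any mapping rather than reading `os.environ` directly makes the parser easy to test with a plain dictionary.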
### Using OpenRouter
OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:
1. Obtain an API key from OpenRouter.
2. Set `OPENAI_API_KEY` in your `.env` file to your OpenRouter API key.
3. Set `OPENAI_BASE_URL` to `https://openrouter.ai/api/v1`.
4. Set `OPENAI_MODEL` to the desired model using the OpenRouter format (e.g., `anthropic/claude-3.5-sonnet:beta`).
5. Set `VISION_PROVIDER` to `openai`.
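
The steps above can be checked mechanically. The following sketch inspects a set of environment values and reports which OpenRouter steps look incomplete; `openrouter_config_problems` is a hypothetical helper, not something the server provides.

```python
from typing import List, Mapping

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_config_problems(env: Mapping[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty means all steps pass."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if env.get("OPENAI_BASE_URL", "").rstrip("/") != OPENROUTER_BASE_URL:
        problems.append(f"OPENAI_BASE_URL should be {OPENROUTER_BASE_URL}")
    if "/" not in env.get("OPENAI_MODEL", ""):
        # OpenRouter model names look like "anthropic/claude-3.5-sonnet:beta".
        problems.append("OPENAI_MODEL should use the OpenRouter format")
    if env.get("VISION_PROVIDER", "").strip().lower() != "openai":
        problems.append("VISION_PROVIDER should be 'openai'")
    return problems


if __name__ == "__main__":
    example = {
        "OPENAI_API_KEY": "your-openrouter-key",  # placeholder value
        "OPENAI_BASE_URL": "https://openrouter.ai/api/v1",
        "OPENAI_MODEL": "anthropic/claude-3.5-sonnet:beta",
        "VISION_PROVIDER": "openai",
    }
    print(openrouter_config_problems(example))
```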
### Default Models
- Anthropic: `claude-3.5-sonnet-beta`
- OpenAI: `gpt-4o-mini`
- OpenRouter: Use the `anthropic/claude-3.5-sonnet:beta` format in `OPENAI_MODEL`.
## Development
### Running Tests
Run all tests:
```bash
run.bat test
```
Run a specific test suite:
```bash
run.bat test server
run.bat test anthropic
run.bat test openai
```
### Docker Support
Build the Docker image:
```bash
docker build -t mcp-image-recognition .
```
Run the container:
```bash
docker run -it --env-file .env mcp-image-recognition
```
## License
MIT License - see LICENSE file for details.
## Release History
- **0.1.2** (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
- **0.1.1** (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
- **0.1.0** (2025-02-19): Initial release with Anthropic and OpenAI vision support