MCP Image Recognition Server
An MCP server that provides image recognition capabilities using Anthropic, OpenAI, and Cloudflare Workers AI vision APIs. Version 1.2.1.
Authors
This project was originally created by @mario-andreschak. Thank you!
It is currently maintained by @zudsniper.
Features
- Image description using Anthropic Claude Vision, OpenAI GPT-4 Vision, or Cloudflare Workers AI llava-1.5-7b-hf
- Easy integration with Claude Desktop, Cursor, and other MCP-compatible clients
- Support for Docker deployment
- Support for uvx installation
- Support for multiple image formats (JPEG, PNG, GIF, WebP)
- Configurable primary and fallback providers
- Base64 and file-based image input support
- Optional text extraction using Tesseract OCR
Requirements
- Python 3.8 or higher
- Tesseract OCR (optional) - required for the text extraction feature
  - Windows: download and install from UB-Mannheim/tesseract
  - Linux: `sudo apt-get install tesseract-ocr`
  - macOS: `brew install tesseract`
Installation
Option 1: Using uvx (Recommended for Claude Desktop and Cursor)
- Install the uv package manager
- Install the package with uvx (commands for both steps below)
- Create and configure your environment file as described in the Configuration section
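A minimal sketch of the first two steps, assuming the package is published on PyPI as `mcp-image-recognition` (the exact name is an assumption; check the repository):

```bash
# install the uv package manager (official installer)
curl -LsSf https://astral.sh/uv/install.sh | sh

# run the server once via uvx to verify the install
uvx mcp-image-recognition
```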
Option 2: Using Docker
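A sketch of the Docker route, assuming a prebuilt image is published as `zudsniper/mcp-image-recognition` (the image name is an assumption):

```bash
# pull the prebuilt image
docker pull zudsniper/mcp-image-recognition

# run the server over stdio, passing configuration via an env file
docker run -i --rm --env-file .env zudsniper/mcp-image-recognition
```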
Option 3: From Source
- Clone the repository
- Create and configure your environment file
- Build the project
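Concretely, these steps might look like this (the repository URL and the presence of a `.env.example` file are assumptions):

```bash
git clone https://github.com/zudsniper/mcp-image-recognition.git
cd mcp-image-recognition

# create and configure your environment file
cp .env.example .env

# install the project into the current environment
pip install -e .
```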
Integration
Claude Desktop Integration
- Go to Claude > Settings > Developer > Edit Config > `claude_desktop_config.json`
- Add configuration with inline environment variables:
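For example (a sketch assuming a uvx install; the `image-recognition` key is arbitrary and the package name is an assumption):

```json
{
  "mcpServers": {
    "image-recognition": {
      "command": "uvx",
      "args": ["mcp-image-recognition"],
      "env": {
        "VISION_PROVIDER": "anthropic",
        "ANTHROPIC_API_KEY": "your-anthropic-api-key",
        "FALLBACK_PROVIDER": "openai",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
```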
Cursor Integration
Go to Cursor Settings > MCP and paste with env variables:
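The entry has the same shape as the Claude Desktop example above, for instance:

```json
{
  "mcpServers": {
    "image-recognition": {
      "command": "uvx",
      "args": ["mcp-image-recognition"],
      "env": {
        "VISION_PROVIDER": "openai",
        "OPENAI_API_KEY": "your-openai-api-key"
      }
    }
  }
}
```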
Docker Integration
Option 1: Using DockerHub Image
Add this to your Claude Desktop config with inline environment:
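A sketch (the DockerHub image name is an assumption; the `-e` flags forward the variables from the `env` block into the container):

```json
{
  "mcpServers": {
    "image-recognition": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "VISION_PROVIDER",
        "-e", "ANTHROPIC_API_KEY",
        "zudsniper/mcp-image-recognition"
      ],
      "env": {
        "VISION_PROVIDER": "anthropic",
        "ANTHROPIC_API_KEY": "your-anthropic-api-key"
      }
    }
  }
}
```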
For Cloudflare configuration:
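The Cloudflare variant swaps the `env` block (and the matching `-e` flags), e.g.:

```json
{
  "env": {
    "VISION_PROVIDER": "cloudflare",
    "CLOUDFLARE_API_KEY": "your-cloudflare-api-token",
    "CLOUDFLARE_ACCOUNT_ID": "your-account-id"
  }
}
```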
Usage
Running the Server Directly
If installed with pip/uvx:
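Assuming the console script is installed as `mcp-image-recognition` (an assumption; check the package metadata):

```bash
mcp-image-recognition
```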
From source directory:
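For instance, from the repository root using uv (the entry-point name is an assumption):

```bash
uv run mcp-image-recognition
```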
Using Docker:
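For example (image name is an assumption):

```bash
docker run -i --rm --env-file .env zudsniper/mcp-image-recognition
```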
Start in development mode with the MCP Inspector:
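The MCP Inspector can wrap the server command, e.g. (package name is an assumption):

```bash
npx @modelcontextprotocol/inspector uvx mcp-image-recognition
```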
Available Tools
describe_image
- Purpose: Analyze images directly uploaded to chat
- Input: Base64-encoded image data
- Output: Detailed description of the image
- Best for: Images uploaded directly to Claude, Cursor, or other chat interfaces
describe_image_from_file
- Purpose: Process local image files from filesystem
- Input: Path to an image file
- Output: Detailed description of the image
- Best for: Local development with filesystem access
- Note: When running in Docker, requires volume mapping (see Docker File Access section)
describe_image_from_url
- Purpose: Analyze images from web URLs without downloading manually
- Input: URL of a publicly accessible image
- Output: Detailed description of the image
- Best for: Web images, screenshots, or anything with a public URL
- Note: Uses browser-like headers to avoid rate limiting
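Under the hood, an MCP client invokes these tools with a standard `tools/call` JSON-RPC request; a sketch (the `image_path` argument name is an assumption):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "describe_image_from_file",
    "arguments": {
      "image_path": "/app/images/photo.jpg"
    }
  }
}
```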
Environment Configuration
- `ANTHROPIC_API_KEY`: Your Anthropic API key.
- `OPENAI_API_KEY`: Your OpenAI API key.
- `CLOUDFLARE_API_KEY`: Your Cloudflare API key.
- `CLOUDFLARE_ACCOUNT_ID`: Your Cloudflare Account ID.
- `VISION_PROVIDER`: Primary vision provider (`anthropic`, `openai`, or `cloudflare`).
- `FALLBACK_PROVIDER`: Optional fallback provider.
- `LOG_LEVEL`: Logging level (DEBUG, INFO, WARNING, ERROR).
- `ENABLE_OCR`: Enable Tesseract OCR text extraction (`true` or `false`).
- `TESSERACT_CMD`: Optional custom path to the Tesseract executable.
- `OPENAI_MODEL`: OpenAI model (default: `gpt-4o-mini`). Can use OpenRouter format for other models (e.g., `anthropic/claude-3.5-sonnet:beta`).
- `OPENAI_BASE_URL`: Optional custom base URL for the OpenAI API. Set to `https://openrouter.ai/api/v1` for OpenRouter.
- `OPENAI_TIMEOUT`: Optional custom timeout (in seconds) for the OpenAI API.
- `CLOUDFLARE_MODEL`: Cloudflare Workers AI model (default: `@cf/llava-hf/llava-1.5-7b-hf`).
- `CLOUDFLARE_MAX_TOKENS`: Maximum number of tokens to generate (default: `512`).
- `CLOUDFLARE_TIMEOUT`: Timeout for Cloudflare API requests in seconds (default: `60`).
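An illustrative `.env` putting these together (all values are placeholders):

```
VISION_PROVIDER=anthropic
FALLBACK_PROVIDER=openai
LOG_LEVEL=INFO

ANTHROPIC_API_KEY=your-anthropic-api-key
OPENAI_API_KEY=your-openai-api-key

ENABLE_OCR=false
# TESSERACT_CMD=/usr/local/bin/tesseract

OPENAI_MODEL=gpt-4o-mini
CLOUDFLARE_MODEL=@cf/llava-hf/llava-1.5-7b-hf
CLOUDFLARE_MAX_TOKENS=512
CLOUDFLARE_TIMEOUT=60
```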
Using OpenRouter
OpenRouter allows you to access various models using the OpenAI API format. To use OpenRouter, follow these steps:
- Obtain an API key from OpenRouter.
- Set `OPENAI_API_KEY` in your `.env` file to your OpenRouter API key.
- Set `OPENAI_BASE_URL` to `https://openrouter.ai/api/v1`.
- Set `OPENAI_MODEL` to the desired model using the OpenRouter format (e.g., `anthropic/claude-3.5-sonnet:beta`).
- Set `VISION_PROVIDER` to `openai`.
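The corresponding `.env` lines:

```
OPENAI_API_KEY=your-openrouter-api-key
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_MODEL=anthropic/claude-3.5-sonnet:beta
VISION_PROVIDER=openai
```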
Default Models
- Anthropic: `claude-3.5-sonnet-beta`
- OpenAI: `gpt-4o-mini`
- Cloudflare Workers AI: `@cf/llava-hf/llava-1.5-7b-hf`
- OpenRouter: use the `anthropic/claude-3.5-sonnet:beta` format in `OPENAI_MODEL`.
Development
Development Setup Guide
Setting Up Development Environment
- Clone the repository
- Set up the environment with uv (recommended) or, alternatively, with pip
- Configure your environment file
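A sketch of these steps (the repository URL, the `dev` extra, and the `.env.example` file are assumptions):

```bash
git clone https://github.com/zudsniper/mcp-image-recognition.git
cd mcp-image-recognition

# set up with uv (recommended)
uv venv
uv pip install -e ".[dev]"

# alternative setup with pip
# python -m venv .venv && source .venv/bin/activate
# pip install -e ".[dev]"

# configure environment
cp .env.example .env
```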
VS Code / DevContainer Development
- Install VS Code with the Remote Containers extension
- Open the project folder in VS Code
- Click "Reopen in Container" when prompted
- The devcontainer will build and open with all dependencies installed
Using Development Container with Claude Desktop
- Pass your environment file to docker compose
- Add the server to your Claude Desktop config (see the sketch below)
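A sketch of this wiring; the compose service/container name and the in-container command are assumptions:

```bash
# start the dev container with your environment file
docker compose --env-file .env up -d --build
```

```json
{
  "mcpServers": {
    "image-recognition-dev": {
      "command": "docker",
      "args": ["exec", "-i", "mcp-image-recognition-dev", "mcp-image-recognition"]
    }
  }
}
```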
Testing Your Changes Locally
- Run the MCP server in development mode, using the MCP Inspector command shown under Usage above
- The Inspector provides a web interface (usually at http://localhost:3000) where you can:
- Send requests to your tools
- View request/response logs
- Debug issues with your implementation
- Test specific tools:
- For `describe_image`: provide a base64-encoded image
- For `describe_image_from_file`: provide a path to a local image file
- For `describe_image_from_url`: provide a URL to an image
Integrating with Claude Desktop for Testing
- Temporarily modify your Claude Desktop configuration to use your development version (see the sketch after this list)
- Restart Claude Desktop to apply the changes
- Test by uploading images or providing image URLs in your conversations
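For example (the local path and entry-point name are assumptions):

```json
{
  "mcpServers": {
    "image-recognition-dev": {
      "command": "uv",
      "args": [
        "--directory", "/path/to/your/mcp-image-recognition",
        "run", "mcp-image-recognition"
      ],
      "env": {
        "VISION_PROVIDER": "anthropic",
        "ANTHROPIC_API_KEY": "your-anthropic-api-key"
      }
    }
  }
}
```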
Running Tests
Run all tests:
Run specific test suite:
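For example, with pytest (the specific test-file name is illustrative):

```bash
# run all tests
python -m pytest

# run a specific test suite
python -m pytest tests/test_ocr.py
```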
Docker Support
Build the Docker image:
Run the container:
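For example (the image tag is your choice):

```bash
# build the Docker image
docker build -t mcp-image-recognition .

# run the container with your environment file
docker run -i --rm --env-file .env mcp-image-recognition
```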
Docker File Access Limitations
When running the MCP server in Docker, the `describe_image_from_file` tool can only access files inside the container. By default, the container has no access to files on your host system. To enable access to local files, you must explicitly map directories when configuring the MCP server.
Important Note: When using Claude Desktop, Cursor, or other platforms where images are uploaded to chats, those images are stored on Anthropic's servers and not directly accessible to the MCP server via a filesystem path. In these cases, you should:
- Use the `describe_image` tool (which works with base64-encoded images) for images uploaded directly to the chat
- Use the `describe_image_from_url` tool for images hosted online
- For local files, ensure the directory is properly mapped to the Docker container
Mapping Local Directories to Docker
To give the Docker container access to specific folders on your system, modify your MCP server configuration to include volume mapping:
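For example, by adding a `-v` flag to the Docker args in your client config (the image name is an assumption):

```json
{
  "mcpServers": {
    "image-recognition": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-v", "/Users/YourName/Downloads:/app/images",
        "-e", "VISION_PROVIDER",
        "-e", "ANTHROPIC_API_KEY",
        "zudsniper/mcp-image-recognition"
      ],
      "env": {
        "VISION_PROVIDER": "anthropic",
        "ANTHROPIC_API_KEY": "your-anthropic-api-key"
      }
    }
  }
}
```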
For example, to map your Downloads folder:
- Windows: `-v "C:\\Users\\YourName\\Downloads:/app/images"`
- macOS/Linux: `-v "/Users/YourName/Downloads:/app/images"`
Then access files using the container path: `/app/images/your_image.jpg`
Using Cloudflare Workers AI
To use Cloudflare Workers AI for image recognition:
- Log in to the Cloudflare dashboard and select your account.
- Go to AI > Workers AI.
- Select Use REST API and create an API token with Workers AI permissions.
- Set the following in your `.env` file:
  - `CLOUDFLARE_API_KEY`: Your Cloudflare API token
  - `CLOUDFLARE_ACCOUNT_ID`: Your Cloudflare account ID
  - `VISION_PROVIDER`: Set to `cloudflare`
  - `CLOUDFLARE_MODEL`: Optional, defaults to `@cf/llava-hf/llava-1.5-7b-hf`
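The resulting `.env` lines:

```
CLOUDFLARE_API_KEY=your-cloudflare-api-token
CLOUDFLARE_ACCOUNT_ID=your-account-id
VISION_PROVIDER=cloudflare
# CLOUDFLARE_MODEL=@cf/llava-hf/llava-1.5-7b-hf
```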
Using with AI Assistants
Once configured, your AI assistant (Claude, for example) can analyze images:
- Upload an image directly in the chat
- The assistant will automatically use the MCP server to analyze the image
- The assistant will describe the image in detail based on the vision API output
Example prompt after uploading an image:
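For instance (an illustrative prompt):

```
Describe this image in detail.
```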
You can also customize the prompt for specific needs, for example:
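```
What text appears in this image?
```

or

```
Identify the main objects in this image and describe their colors.
```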
Release History
- 1.2.1 (2025-03-28): Reorganized documentation and improved devcontainer workflow
- 1.2.0 (2025-03-28): Fixed URL image fetching with httpx & browser headers, added devcontainer support
- 1.1.0 (2025-03-28): Enhanced tool descriptions for better selection, updated OpenAI SDK to latest version
- 1.0.1 (2025-03-28): Added URL-based image recognition, improved Docker documentation, and fixed filesystem limitations
- 1.0.0 (2025-03-28): Added Cloudflare Workers AI support with llava-1.5-7b-hf model, Docker support, and uvx compatibility
- 0.1.2 (2025-02-20): Improved OCR error handling and added comprehensive test coverage for OCR functionality
- 0.1.1 (2025-02-19): Added Tesseract OCR support for text extraction from images (optional feature)
- 0.1.0 (2025-02-19): Initial release with Anthropic and OpenAI vision support
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Releasing New Versions
To release a new version:
- Update the version in `pyproject.toml` and `setup.py`
- Push changes to the `release` branch
- GitHub Actions will automatically:
  - Run tests
  - Build and push Docker images
  - Publish to PyPI
  - Create a GitHub Release
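A minimal sketch of the manual steps (branch workflow per the list above):

```bash
# after bumping the version in pyproject.toml and setup.py
git commit -am "Bump version to X.Y.Z"
git push origin HEAD:release
```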
Required repository secrets for CI/CD:
- `DOCKERHUB_USERNAME` - Docker Hub username
- `DOCKERHUB_TOKEN` - Docker Hub access token
- `PYPI_API_TOKEN` - PyPI API token