Integrates with Alibaba Cloud's DashScope platform to provide image recognition using Qwen-VL models (qwen-vl-max and others) through the Tongyi Qianwen service.
Provides image recognition and analysis capabilities using Google Gemini models (gemini-1.5-flash and others), accepting image URLs or Base64 data with customizable prompts.
Enables image recognition and visual question answering using OpenAI's GPT-4o and other vision models, supporting both URL and Base64 image inputs.
MCP Image Recognition Server (Python)
An MCP server implementation in Python providing image recognition capabilities using various LLM providers (Gemini, OpenAI, Qwen/Tongyi, Doubao, etc.).
Features
Image Recognition: Describe images or answer questions about them.
Multi-Model Support: Dynamically switch between Gemini, GPT-4o, Qwen-VL, Doubao, etc.
Flexible: Accepts image URLs or Base64 data.
Quick Setup (Recommended)
We provide automated scripts to set up the environment and dependencies in one click.
Linux / macOS
Windows
Clone or download this repository.
Double-click setup.bat.
After the script finishes, simply edit the .env file with your API keys.
Installation & Usage (Manual)
If you prefer manual installation or want to use uv:
Prerequisites
Python 3.10 or higher
An API Key for your preferred model provider (Google Gemini, OpenAI, Aliyun DashScope, etc.)
Method 1: Using uv (Recommended)
uv is an extremely fast Python package manager.
1. Run directly with uv run
You don't need to manually create a virtual environment.
2. Using uvx (for ephemeral execution)
If you want to run it without cloning the repo explicitly (experimental support via git):
Method 2: Standard Python (pip)
Linux / macOS
Clone and Setup:
git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Configure:
cp .env.example .env
# Edit .env and add your API keys
Run:
python server.py
Windows
Clone and Setup:
git clone https://github.com/glasses666/mcp-image-recognition-py.git
cd mcp-image-recognition-py
python -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
Configure:
copy .env.example .env
# Edit .env and add your API keys
Run:
python server.py
Configuration
Create a .env file in the project root based on .env.example:
1. For Google Gemini (Recommended for speed/cost)
Get an API key from Google AI Studio.
2. For Tongyi Qianwen (Qwen - Alibaba Cloud)
Get an API key from Aliyun DashScope.
3. For Doubao (Volcengine)
Get an API key from Volcengine Ark.
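Because the .env file is created from .env.example, a quick check for keys you forgot to copy can save a confusing startup error. The following is a minimal sketch, not part of this project: the helper names env_keys and missing_keys are hypothetical, and the key names in the demo strings are placeholders rather than the variables this server actually reads.

```python
def env_keys(text: str) -> set[str]:
    """Collect the variable names from dotenv-style text (KEY=value lines)."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not a KEY=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, actual_text: str) -> list[str]:
    """Names defined in the example file but absent from the real one."""
    return sorted(env_keys(example_text) - env_keys(actual_text))


# Placeholder content; in practice you would read both files from disk.
demo_example = "# provider settings\nPLACEHOLDER_API_KEY=\nPLACEHOLDER_MODEL=\n"
demo_actual = "PLACEHOLDER_API_KEY=abc123\n"
print(missing_keys(demo_example, demo_actual))
```

To run it against the real files, pass in the contents of .env.example and .env, for example with pathlib.Path(".env").read_text().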
Agent AI Configuration (Claude Desktop, etc.)
To use this server with an MCP client (like Claude Desktop), add it to your configuration file.
Configuration File Paths
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json (if available)
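If you script your setup, the three locations above can be resolved programmatically. This is a small sketch under the assumption that the paths listed here are the only ones you need; claude_config_path is a hypothetical helper name, not something this repository provides.

```python
import os
import sys
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(platform: str = sys.platform, env=None) -> Path:
    """Return the expected Claude Desktop config path for a platform string."""
    env = os.environ if env is None else env
    if platform == "darwin":  # macOS
        return Path.home() / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if platform.startswith("win"):  # Windows: relies on %APPDATA% being set
        return Path(env["APPDATA"]) / "Claude" / CONFIG_NAME
    # Linux and other Unix-like systems (the file may not exist there)
    return Path.home() / ".config" / "Claude" / CONFIG_NAME


print(claude_config_path())
```

The function only builds the path; check path.exists() before reading or editing the file.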
Configuration JSON
Option A: Using uv
If you have uv installed, you can let it handle the environment.
Option B: Standard Python Venv
Ensure you provide the absolute path to the Python executable in your virtual environment.
Windows Note: For paths, use double backslashes \\ (e.g., C:\\Users\\Name\\...).
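One way to avoid hand-escaping backslashes is to generate the JSON entry with Python, since json.dumps escapes them for you. The sketch below covers the standard venv case and assumes the usual mcpServers layout used by Claude Desktop (a command plus an args list). The server name "image-recognition" and the example paths are illustrative assumptions; substitute your own absolute paths.

```python
import json


def build_server_entry(python_exe: str, server_script: str) -> dict:
    """Build a Claude Desktop config fragment for a venv-based launch."""
    return {
        "mcpServers": {
            # "image-recognition" is an arbitrary label chosen for this example.
            "image-recognition": {
                "command": python_exe,    # absolute path to the venv's Python
                "args": [server_script],  # absolute path to server.py
            }
        }
    }


# Hypothetical Windows paths; raw strings keep single backslashes in Python.
config = build_server_entry(
    r"C:\Users\Name\mcp-image-recognition-py\venv\Scripts\python.exe",
    r"C:\Users\Name\mcp-image-recognition-py\server.py",
)
# json.dumps writes each backslash as \\ so the output is valid JSON.
print(json.dumps(config, indent=2))
```

Merge the printed fragment into your existing configuration file rather than overwriting it, so that any other servers under mcpServers are preserved.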
Usage
Tool: recognize_image
Analyzes an image and returns a text description.
Parameters:
image (string, required): The image to analyze. Supports:
  HTTP/HTTPS URLs (e.g., https://example.com/cat.jpg)
  Base64 encoded strings (with or without the data:image/...;base64, prefix)
prompt (string, optional): Specific instruction. Default: "Describe this image".
model (string, optional): Override the default model for this specific request.
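The three accepted forms of the image parameter can be told apart with a few string checks. The following sketch illustrates one plausible way to do that on the caller's side. It is not taken from server.py, and classify_image_input is a hypothetical name.

```python
import base64
import binascii


def classify_image_input(image: str) -> tuple[str, str]:
    """Return ("url", url) or ("base64", payload) with any data: prefix removed."""
    value = image.strip()
    if not value:
        raise ValueError("image must not be empty")
    if value.lower().startswith(("http://", "https://")):
        return ("url", value)
    if value.startswith("data:image/"):
        # Split "data:image/png;base64,<payload>" at the base64 marker.
        _header, marker, payload = value.partition(";base64,")
        if not marker:
            raise ValueError("data URI is missing the ';base64,' marker")
        value = payload
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError("image is neither an HTTP(S) URL nor valid Base64") from exc
    return ("base64", value)


print(classify_image_input("https://example.com/cat.jpg")[0])
```

Validating before the request is sent means a malformed string fails locally with a clear message instead of producing a provider-side error.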
License
MIT