MCP OpenVision is an image analysis server powered by OpenRouter vision models via the Model Context Protocol (MCP), enabling detailed image interpretation and insights.
Analyze images: Supports Base64-encoded strings, URLs, and local file paths for image input
Customizable queries: Guide analysis with specific text instructions for contextual interpretation
System prompts: Define the model's role and behavior for specialized tasks
Model selection: Works with various OpenRouter-supported vision models (default: qwen/qwen2.5-vl-32b-instruct:free)
Advanced parameters: Control temperature, max_tokens, top_p, presence and frequency penalties
Relative path handling: Resolve image paths relative to a specified project root
Specialized use cases: Examples include product design analysis, medical scan interpretation, and chart data extraction
MCP OpenVision
Overview
MCP OpenVision is a Model Context Protocol (MCP) server that provides image analysis capabilities powered by OpenRouter vision models. It enables AI assistants to analyze images via a simple interface within the MCP ecosystem.
Installation
Installing via Smithery
To install mcp-openvision for Claude Desktop automatically via Smithery:
Using pip
Using uv (recommended)
Configuration
MCP OpenVision requires an OpenRouter API key and can be configured through environment variables:
OPENROUTER_API_KEY (required): Your OpenRouter API key
OPENROUTER_DEFAULT_MODEL (optional): The vision model to use
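The two variables above can be read as in the following minimal sketch. It is illustrative only and is not the server's actual startup code: the `OpenVisionConfig` and `load_config` names are hypothetical. The assumption is that a missing key should fail fast and that an unset model falls back to the default named in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default model as documented in this README; the server's own fallback may differ.
DEFAULT_MODEL = "qwen/qwen2.5-vl-32b-instruct:free"


@dataclass(frozen=True)
class OpenVisionConfig:
    """Hypothetical container for the two documented settings."""
    api_key: str
    default_model: str


def load_config(env: Optional[Mapping[str, str]] = None) -> OpenVisionConfig:
    """Read OPENROUTER_API_KEY (required) and OPENROUTER_DEFAULT_MODEL (optional)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY", "").strip()
    if not api_key:
        # Fail early instead of sending unauthenticated requests to OpenRouter.
        raise RuntimeError("OPENROUTER_API_KEY is required")
    # An empty or missing model variable falls back to the documented default.
    model = env.get("OPENROUTER_DEFAULT_MODEL", "").strip() or DEFAULT_MODEL
    return OpenVisionConfig(api_key=api_key, default_model=model)
```

Passing an explicit mapping (for example `load_config({"OPENROUTER_API_KEY": "your-key"})`) makes the behavior easy to check without touching the real environment.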
OpenRouter Vision Models
MCP OpenVision works with any OpenRouter model that supports vision capabilities. The default model is qwen/qwen2.5-vl-32b-instruct:free, but you can specify any other compatible model.
Some popular vision models available through OpenRouter include:
qwen/qwen2.5-vl-32b-instruct:free (default)
anthropic/claude-3-5-sonnet
anthropic/claude-3-opus
anthropic/claude-3-sonnet
openai/gpt-4o
You can specify a custom model by setting the OPENROUTER_DEFAULT_MODEL environment variable or by passing the model parameter directly to the image_analysis tool.
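The section above names two ways to choose a model but does not state which wins when both are set. The sketch below assumes a per-call `model` argument overrides the environment variable, which in turn overrides the built-in default. That precedence and the `pick_model` helper are assumptions for illustration, not confirmed server behavior.

```python
from typing import Mapping, Optional

# Default documented in this README.
DEFAULT_MODEL = "qwen/qwen2.5-vl-32b-instruct:free"


def pick_model(requested: Optional[str], env: Mapping[str, str]) -> str:
    """Assumed precedence: per-call model > OPENROUTER_DEFAULT_MODEL > default."""
    if requested:
        return requested
    return env.get("OPENROUTER_DEFAULT_MODEL") or DEFAULT_MODEL
```

With this ordering, a deployment can pin a house model through the environment while individual requests still opt into, say, `openai/gpt-4o`.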
Usage
Testing with MCP Inspector
The easiest way to test MCP OpenVision is with the MCP Inspector tool:
Integration with Claude Desktop or Cursor
Edit your MCP configuration file:
Windows (Cursor): %USERPROFILE%\.cursor\mcp.json
macOS (Cursor): ~/.cursor/mcp.json
macOS (Claude Desktop): ~/Library/Application Support/Claude/claude_desktop_config.json
Add the following configuration:
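Claude Desktop and Cursor both read MCP servers from a top-level `mcpServers` object, where each entry has `command`, `args`, and `env` keys. The Python sketch below merges such an entry into an existing config file without disturbing other servers. The entry name `openvision` and the launch command `uvx mcp-openvision` are assumptions for illustration; use the command the project itself documents.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def add_openvision_server(
    config_path: Path, api_key: str, model: Optional[str] = None
) -> Dict[str, Any]:
    """Merge a hypothetical "openvision" entry into an MCP client config file."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config: Dict[str, Any] = json.loads(text) if text.strip() else {}

    env = {"OPENROUTER_API_KEY": api_key}
    if model:
        # Optional override of the default vision model.
        env["OPENROUTER_DEFAULT_MODEL"] = model

    # Assumed launch command; existing entries under "mcpServers" are preserved.
    config.setdefault("mcpServers", {})["openvision"] = {
        "command": "uvx",
        "args": ["mcp-openvision"],
        "env": env,
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For Cursor on macOS, a call might look like `add_openvision_server(Path.home() / ".cursor" / "mcp.json", "your-openrouter-key")`. Restart the client afterwards so it picks up the new server.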
Running Locally for Development
Features
MCP OpenVision provides the following core tool:
image_analysis: Analyze images with vision models, supporting various parameters:
image: Can be provided as Base64-encoded image data, an image URL (http/https), or a local file path
query: User instruction for the image analysis task
system_prompt: Instructions that define the model's role and behavior (optional)
model: Vision model to use
temperature: Controls randomness (0.0-1.0)
max_tokens: Maximum response length
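The parameter list above maps naturally onto an arguments dictionary for a tool call. The sketch below builds one and checks the documented `temperature` range. The helper name is hypothetical, and treating `image` and `query` as required is this sketch's own choice; the server's actual validation may differ.

```python
from typing import Any, Dict, Optional


def build_image_analysis_args(
    image: str,
    query: str,
    system_prompt: Optional[str] = None,
    model: Optional[str] = None,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble arguments for an image_analysis call, omitting unset options."""
    if not image:
        raise ValueError("image is required")
    if not query:
        raise ValueError("query is required")

    args: Dict[str, Any] = {"image": image, "query": query}
    if system_prompt is not None:
        args["system_prompt"] = system_prompt
    if model is not None:
        args["model"] = model
    if temperature is not None:
        # The README documents temperature as ranging from 0.0 to 1.0.
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        args["temperature"] = temperature
    if max_tokens is not None:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        args["max_tokens"] = max_tokens
    return args
```

Leaving unset options out lets the server apply its own defaults. With the official MCP Python SDK, a dictionary like this would be passed as the `arguments` of `ClientSession.call_tool("image_analysis", ...)`.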
Crafting Effective Queries
The query parameter is crucial for getting useful results from the image analysis. A well-crafted query provides context about:
Purpose: Why you're analyzing this image
Focus areas: Specific elements or details to pay attention to
Required information: The type of information you need to extract
Format preferences: How you want the results structured
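One way to make sure a query covers these four elements is to assemble it from separate pieces, as in the sketch below. The `compose_query` helper and its sentence templates are hypothetical; any phrasing that conveys the same context works equally well.

```python
from typing import Optional, Sequence


def _sentence(text: str) -> str:
    """Trim whitespace and ensure exactly one trailing period."""
    return text.strip().rstrip(".") + "."


def compose_query(
    purpose: str,
    focus_areas: Sequence[str] = (),
    required_info: Optional[str] = None,
    output_format: Optional[str] = None,
) -> str:
    """Join purpose, focus areas, required information, and format into one query."""
    parts = [_sentence(purpose)]
    if focus_areas:
        parts.append(_sentence("Focus on: " + ", ".join(focus_areas)))
    if required_info:
        parts.append(_sentence("Extract: " + required_info))
    if output_format:
        parts.append(_sentence("Format the answer as " + output_format))
    return " ".join(parts)
```

For example, `compose_query("Extract the numerical data from this bar chart showing quarterly sales", ["the 2022-2023 bars"], "quarterly totals and key trends", "a table")` yields a single instruction that states why, where to look, what to extract, and how to present it.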
Examples of Effective Queries
| Basic Query | Enhanced Query |
|---|---|
| "Describe this image" | "Identify all retail products visible in this store shelf image and estimate their price range" |
| "What's in this image?" | "Analyze this medical scan for abnormalities, focusing on the highlighted area and providing possible diagnoses" |
| "Analyze this chart" | "Extract the numerical data from this bar chart showing quarterly sales, and identify the key trends from 2022-2023" |
| "Read the text" | "Transcribe all visible text in this restaurant menu, preserving the item names, descriptions, and prices" |
By providing context about why you need the analysis and what specific information you're seeking, you help the model focus on relevant details and produce more valuable insights.
Example Usage
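A common first step when sending a local image as Base64 data is encoding the file. The sketch below does this with Python's standard library. The helper name is hypothetical, and it sends the bare Base64 payload; whether the server also accepts a `data:image/...;base64,` prefix is not covered by this README.

```python
import base64
from pathlib import Path


def encode_image_file(path: str) -> str:
    """Return a file's bytes as an ASCII Base64 string for the image argument."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")
```

The result can then be used as the image value, for example `{"image": encode_image_file("examples/image.jpg"), "query": "Describe the layout of this product page"}`.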
Image Input Types
The image_analysis tool accepts several types of image inputs:
Base64-encoded strings
Image URLs, which must start with http:// or https://
File paths:
Absolute paths: full paths starting with / (Unix) or a drive letter (Windows)
Relative paths: paths relative to the current working directory
Relative paths with project_root: use the project_root parameter to specify a base directory
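This README does not describe how the server tells these input types apart, so the classifier below is a heuristic sketch, not the server's logic. One subtlety it handles: JPEG data encoded as Base64 begins with `/9j/`, so a naive "starts with /" test would mistake it for a Unix path. The sketch therefore checks for strict Base64 before checking for paths. The 64-character minimum is an assumed threshold.

```python
import base64
import re

MIN_BASE64_LENGTH = 64  # assumed threshold; shorter strings are treated as paths


def _looks_like_base64(value: str) -> bool:
    """True if value is long enough and decodes as strict Base64."""
    if len(value) < MIN_BASE64_LENGTH or len(value) % 4:
        return False
    try:
        base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return True


def classify_image_input(value: str) -> str:
    """Label an image argument as url, base64, absolute_path, or relative_path."""
    if value.startswith(("http://", "https://")):
        return "url"
    if _looks_like_base64(value):
        return "base64"
    # Unix absolute path, or a Windows drive letter such as C:\ or C:/
    if value.startswith("/") or re.match(r"^[A-Za-z]:[\\/]", value):
        return "absolute_path"
    return "relative_path"
```

File names containing a dot, as most image paths do, can never pass the strict Base64 check, because `.` is outside the Base64 alphabet. This keeps the heuristic reasonably safe in practice.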
Using Relative Paths
When using relative file paths (like "examples/image.jpg"), you have two options:
Make the path relative to the current working directory where the server is running.
Or specify a project_root parameter, and the path is resolved relative to that directory.
This is particularly useful in applications where the current working directory may not be predictable or when you want to reference files using paths relative to a specific directory.
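The two options above amount to choosing a base directory before joining it with the relative path. The following is a minimal sketch of that resolution rule using `pathlib`. The function name is hypothetical, and it assumes absolute paths are returned unchanged even when project_root is supplied.

```python
from pathlib import Path
from typing import Optional


def resolve_image_path(path: str, project_root: Optional[str] = None) -> Path:
    """Join a relative path to project_root if given, else to the working directory."""
    candidate = Path(path)
    if candidate.is_absolute():
        # Assumption: absolute paths ignore project_root entirely.
        return candidate
    base = Path(project_root) if project_root else Path.cwd()
    return base / candidate
```

For instance, `resolve_image_path("examples/image.jpg", "/srv/app")` gives `/srv/app/examples/image.jpg` no matter where the server process was started, which is the predictability the project_root option is meant to provide.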
Development
Setup Development Environment
Code Formatting
This project uses Black for automatic code formatting. The formatting is enforced through GitHub Actions:
All code pushed to the repository is automatically formatted with Black
For pull requests from repository collaborators, Black formats the code and commits directly to the PR branch
For pull requests from forks, Black creates a new PR with the formatted code that can be merged into the original PR
You can also run Black locally to format your code before committing:
Run Tests
Release Process
This project uses an automated release process:
1. Update the version in pyproject.toml following Semantic Versioning principles. You can use the helper script:
   python scripts/bump_version.py [major|minor|patch]
2. Update CHANGELOG.md with details about the new version. The script also creates a template entry in CHANGELOG.md that you can fill in.
3. Commit and push these changes to the main branch. The GitHub Actions workflow will:
   - Detect the version change
   - Automatically create a new GitHub release
   - Trigger the publishing workflow that publishes to PyPI
This automation helps maintain a consistent release process and ensures that every release is properly versioned and documented.
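The Semantic Versioning bump and the "detect the version change" check can be sketched as follows. This is not the contents of scripts/bump_version.py, and the workflow's real detection logic is not shown in this README. The helper names are hypothetical, and the sketch only handles plain MAJOR.MINOR.PATCH strings without pre-release or build suffixes.

```python
import re
from typing import Tuple

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into a comparable tuple."""
    match = _SEMVER.match(version.strip())
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Return the next version; lower components reset to zero per SemVer."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def is_release_bump(old: str, new: str) -> bool:
    """A new release is warranted only when the version strictly increases."""
    return parse_version(new) > parse_version(old)
```

Comparing integer tuples rather than raw strings matters here: `"0.1.10" > "0.1.9"` is false as a string comparison, but `(0, 1, 10) > (0, 1, 9)` is true.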
Support
If you find this project helpful, consider buying me a coffee to support ongoing development and maintenance.
License
This project is licensed under the MIT License - see the LICENSE file for details.