open-vision-mcp
Enables local image analysis using Ollama's vision models (e.g., llama3.2-vision), allowing AI agents to process images without external API dependencies.
Provides image analysis capabilities using OpenAI's vision models such as GPT-4o, enabling AI agents to analyze and understand images through a single MCP server.
Provider-neutral Vision MCP Server — image analysis via 10 inference providers through a single MCP server.
⚠️ Validation status: This server is engineering-validated (163/163 unit/integration tests pass) but live provider validation is pending. No provider has been tested end-to-end with real API credentials yet. Cerebras vision support is specifically unverified. See the Validation Status section below.
What It Does
A Model Context Protocol (MCP) server that gives AI agents the ability to see and understand images using any of the 10 supported inference providers. All providers are accessed through a single code path with per-provider configuration.
Perfect for screenshots, photos, diagrams, webpage analysis, and mobile app UI review.
Supported Providers
Provider | | Default | Suggested vision model | | Notes |
OpenRouter | | | | No (default: | Sends |
OpenAI | | | | No | — |
Together | | | (check Together's model catalog) | Yes | Multi-model aggregator |
DeepInfra | | | (check DeepInfra's catalog) | Yes | Multi-model aggregator |
Fireworks | | | (check Fireworks' catalog) | Yes | Multi-model aggregator |
Groq | | | | No | Fast inference |
Chutes | | | (check Chutes' | Yes | Per-model capability preflight |
Cerebras | | | | No | ⚠️ Vision support unverified |
Azure OpenAI | | (user-supplied deployment URL) | (deployment-configured) | No (ignored) | Requires |
Ollama | | | | No | Local (default) or Cloud ( |
Quick Start
Prerequisites
Node.js 18+
An API key for at least one provider above
Option 1: Use with npx (recommended)
# Set your provider, API key, and model
export PROVIDER=openrouter
export API_KEY=sk-or-v1-your-api-key-here
export MODEL=google/gemini-2.0-flash-exp:free
# Run the server
npx open-vision-mcp
Option 2: Install globally
npm install -g open-vision-mcp
open-vision-mcp
Option 3: Clone and build
git clone https://github.com/abyssbugg/open-vision-mcp.git
cd open-vision-mcp
npm install
npm run build
node dist/index.js
Configuration
All configuration is via environment variables.
Environment Variables
Variable | Required? | Default | Purpose |
| No | | Provider discriminator (one of the 10 above) |
| Yes | — | API key for the selected provider |
| Depends | Per-provider default | Model id (required for |
| No (yes for | Per-provider default | Full-prefix base URL (must include |
| No | Per-provider default | JSON object string of extra HTTP headers (e.g., OpenRouter's |
| No | | Log level: |
| No | | Maximum image size in bytes |
| No | | (Reserved — not yet implemented) |
| No | | (Reserved — not used by stdio transport) |
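The table describes one variable as a JSON object string of extra HTTP headers. As a rough sketch of how such a value could be parsed and checked, the helper below is hypothetical: the function name, the error messages, and the example header are illustrative and not taken from the server's source.

```typescript
// Hypothetical helper: parses a JSON object string of extra HTTP headers.
// The name parseExtraHeaders and the error messages are assumptions for illustration.
function parseExtraHeaders(raw: string | undefined): Record<string, string> {
  // Treat an unset or blank value as "no extra headers".
  if (raw === undefined || raw.trim() === "") return {};

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    throw new Error("extra headers must be valid JSON");
  }

  // Only a plain JSON object is accepted (not an array, string, number, or null).
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("extra headers must be a JSON object");
  }

  const source = parsed as Record<string, unknown>;
  const headers: Record<string, string> = {};
  for (const name of Object.keys(source)) {
    const value = source[name];
    if (typeof value !== "string") {
      throw new Error(`header '${name}' must be a string`);
    }
    headers[name] = value;
  }
  return headers;
}

// Illustrative input; "X-Example-Header" is a made-up header name.
console.log(parseExtraHeaders('{"X-Example-Header": "demo"}'));
```

Rejecting non-string values early keeps a malformed setting from surfacing later as an opaque HTTP error.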
Legacy Environment Variables (backwards compatible)
If the new variables above are unset, the server falls back to the legacy OpenRouter variables:
Legacy variable | Maps to |
| |
| |
| |
Resolution precedence: new variable > legacy variable > per-provider default.
Existing OpenRouter users with only OPENROUTER_API_KEY set continue to work with zero config changes. The server defaults to PROVIDER=openrouter and lifts the legacy variables into the new fields.
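The precedence rule (new variable, then legacy variable, then per-provider default) can be sketched as a small lookup. This is an illustration only: `resolveSetting` and `LEGACY_NAMES` are hypothetical names, only the `OPENROUTER_API_KEY` to `API_KEY` pairing is taken from this README, and treating an empty string as unset is an assumption.

```typescript
// Hypothetical sketch of "new variable > legacy variable > per-provider default".
type Env = Record<string, string | undefined>;

// Only this pairing is stated in the README; other legacy names are not shown here.
const LEGACY_NAMES: Record<string, string> = {
  API_KEY: "OPENROUTER_API_KEY",
};

function resolveSetting(env: Env, name: string, providerDefault?: string): string | undefined {
  // 1. The new variable wins when it is set (empty string counts as unset: an assumption).
  const current = env[name];
  if (current !== undefined && current !== "") return current;

  // 2. Otherwise fall back to the legacy variable, if one is mapped.
  const legacyName: string | undefined = LEGACY_NAMES[name];
  const legacy = legacyName !== undefined ? env[legacyName] : undefined;
  if (legacy !== undefined && legacy !== "") return legacy;

  // 3. Otherwise use the per-provider default (may be undefined).
  return providerDefault;
}

const legacyOnly: Env = { OPENROUTER_API_KEY: "sk-or-v1-legacy" };
const both: Env = { API_KEY: "sk-new", OPENROUTER_API_KEY: "sk-or-v1-legacy" };

console.log(resolveSetting(legacyOnly, "API_KEY")); // legacy value is used
console.log(resolveSetting(both, "API_KEY")); // new variable takes precedence
console.log(resolveSetting({}, "PROVIDER", "openrouter")); // falls through to the default
```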
Azure OpenAI Configuration
Azure is the only provider that requires BASE_URL (no per-provider default). The BASE_URL must be the full deployment URL including the api-version query parameter:
export PROVIDER=azure
export API_KEY=your-azure-resource-key
export BASE_URL="https://your-resource.openai.azure.com/openai/deployments/your-deployment?api-version=2024-02-15-preview"
For Azure, MODEL is ignored because the deployment name is part of BASE_URL. Azure uses the api-key header (not Authorization: Bearer). Azure has no /models endpoint; testConnection and validateModel return true without making HTTP calls, so the first analyze_image call is the real health check.
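The header difference can be pictured with a short sketch. It assumes every non-Azure provider takes a bearer token, which is what the contrast above implies but is not spelled out per provider; `buildAuthHeaders` is a hypothetical name, not the server's actual function.

```typescript
// Hypothetical sketch: choose the auth header by provider id.
function buildAuthHeaders(provider: string, apiKey: string): Record<string, string> {
  if (provider === "azure") {
    // Azure OpenAI expects the key in an "api-key" header.
    return { "api-key": apiKey };
  }
  // Assumption: all other providers accept "Authorization: Bearer <key>".
  return { Authorization: `Bearer ${apiKey}` };
}

console.log(buildAuthHeaders("azure", "your-azure-resource-key"));
console.log(buildAuthHeaders("openai", "sk-your-key-here"));
```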
MCP Client Configuration
Claude Code
Add to ~/.claude.json:
{
  "mcpServers": {
    "vision": {
      "command": "npx",
      "args": ["open-vision-mcp"],
      "env": {
        "PROVIDER": "openrouter",
        "API_KEY": "sk-or-v1-your-key-here",
        "MODEL": "google/gemini-2.0-flash-exp:free"
      }
    }
  }
}
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "vision": {
      "command": "npx",
      "args": ["open-vision-mcp"],
      "env": {
        "PROVIDER": "openai",
        "API_KEY": "sk-your-key-here",
        "MODEL": "gpt-4o"
      }
    }
  }
}
Other MCP Clients
Cursor: ~/.cursor/mcp.json
Cline: ~/.cline/mcp.json
Windsurf: MCP settings file
Check your agent's MCP documentation
Switching Providers
Change only the env vars — no code changes:
# Switch from OpenRouter to Groq
export PROVIDER=groq
export API_KEY=gsk-your-groq-key
export MODEL=llama-3.2-90b-vision-preview
npx open-vision-mcp
Tools
The server exposes 3 MCP tools. Tool names and schemas are identical for all providers.
analyze_image
General image analysis. Supports base64, file paths, and URLs.
Parameters:
type: base64 | file | url
data: image data (base64 string, file path, or URL)
mimeType: MIME type (required for base64)
prompt: custom analysis prompt (optional)
format: text | json (default: text)
maxTokens: max response tokens (default: 4000)
temperature: sampling temperature 0–2 (default: 0.1)
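For reference, an MCP client invokes the tool with a JSON-RPC `tools/call` request. The sketch below builds one with the defaults listed above; MCP clients normally construct this message for you, and `buildAnalyzeImageCall` and its validation messages are hypothetical, not part of the server.

```typescript
// Hypothetical sketch: build a tools/call request for analyze_image.
interface AnalyzeImageArgs {
  type: "base64" | "file" | "url";
  data: string;
  mimeType?: string;
  prompt?: string;
  format?: "text" | "json";
  maxTokens?: number;
  temperature?: number;
}

function buildAnalyzeImageCall(id: number, args: AnalyzeImageArgs) {
  // mimeType is documented as required for base64 input.
  if (args.type === "base64" && !args.mimeType) {
    throw new Error("mimeType is required when type is 'base64'");
  }
  // Documented range is 0 to 2, default 0.1.
  const temperature = args.temperature ?? 0.1;
  if (temperature < 0 || temperature > 2) {
    throw new Error("temperature must be between 0 and 2");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "analyze_image",
      // Documented defaults first, then caller-supplied values override them.
      arguments: { format: "text", maxTokens: 4000, ...args, temperature },
    },
  };
}

const call = buildAnalyzeImageCall(1, {
  type: "url",
  data: "https://example.com/diagram.png",
  prompt: "Describe this diagram",
});
console.log(JSON.stringify(call, null, 2));
```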
analyze_webpage_screenshot
Webpage screenshot specialist. Extracts content, layout, navigation, forms, and accessibility info.
Additional parameters:
focusArea: layout | content | navigation | forms | interactive | accessibility
includeAccessibility: include accessibility analysis (default: true)
format: text | json (default: json)
analyze_mobile_app_screenshot
Mobile app screenshot specialist. UI design, UX, platform conventions, accessibility.
Additional parameters:
platform: ios | android | auto-detect (default: auto-detect)
focusArea: ui-design | user-experience | navigation | accessibility | performance | onboarding
includeUXHeuristics: include Nielsen's 10 heuristics (default: true)
format: text | json (default: json)
Validation Status
Engineering Validation: COMPLETE
163/163 unit + integration tests pass (JSON reporter; npm test)
npm run build exit 0: TypeScript strict mode, zero errors
npm run lint exit 0: ESLint, zero errors
Sentinel 8/8: black-box MCP protocol test over stdio
Phase 2B.5 Stage 1: 10/10 mandatory gates passed in keyless validation (configuration, request construction, auth headers, error handling, image processing, timeouts, capabilities, base URLs, logging)
MCP contract byte-identical to baseline: tool names, schemas, and output format unchanged from the original upstream
Backwards compatibility verified: legacy OPENROUTER_API_KEY env var fallback works end-to-end
Operational Validation: PENDING
Live provider validation (Stage 2): paused pending valid API credentials. No provider has been tested end-to-end through analyze_image with a real key. Stage 2 may be executed at any future time when credentials become available.
Cerebras vision support: UNVERIFIED. The llama-4-scout-17b-16e-instruct model is configured as the default, but whether it accepts multimodal (image) input has not been confirmed. If it doesn't support vision, the user gets a clear provider-aware error at runtime.
Provider response shapes: unit tests mock axios and assert the request/response shape. Live APIs have not been confirmed to match the mocked shapes, though all providers are OpenAI-compatible per their official documentation.
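For readers unfamiliar with the OpenAI-compatible shape mentioned above, a vision request in the Chat Completions format carries the image as an `image_url` content part, here a base64 data URL. The sketch below follows that public format; it is not confirmed to match the exact body this server sends, and `buildVisionRequest` is a hypothetical name.

```typescript
// Sketch of an OpenAI-compatible chat completions body with one image.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
  max_tokens: number;
  temperature: number;
}

function buildVisionRequest(
  model: string,
  prompt: string,
  mimeType: string,
  base64Data: string,
  maxTokens = 4000,
  temperature = 0.1,
): VisionRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          // The image travels inline as a data URL.
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64Data}` } },
        ],
      },
    ],
    max_tokens: maxTokens,
    temperature,
  };
}

console.log(JSON.stringify(buildVisionRequest("gpt-4o", "Describe this image", "image/png", "AAAA"), null, 2));
```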
What This Means for Users
The server is architecturally sound and thoroughly unit-tested. The provider abstraction, configuration resolution, request construction, error handling, and MCP protocol are all verified. However, no live API call has been made against any provider. Users should be aware that:
The request body shape is correct per provider documentation (verified in Phase 2A research), but not confirmed against a live 200 response.
Cerebras may not support vision input — use a different provider if vision is critical.
If a provider's API has changed since the Phase 2A research (conducted June 2026), the server may need a config update.
Supported Image Formats
Format | Extension | MIME Type | Status |
JPEG | .jpg, .jpeg | image/jpeg | ✅ |
PNG | .png | image/png | ✅ |
WebP | .webp | image/webp | ✅ |
GIF | .gif | image/gif | ✅ |
Max size | — | — | 10MB (configurable via |
MIME type detection is signature-based (no native dependencies).
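Signature-based detection means inspecting the first bytes of the file rather than trusting the extension. A minimal sketch for the four supported formats, using their standard magic numbers, might look like this; `detectImageMime` is a hypothetical name and the server's actual implementation may differ.

```typescript
// Sketch: detect an image MIME type from its leading bytes (magic numbers).
function detectImageMime(bytes: Uint8Array): string | null {
  // True when the bytes at `offset` match the signature exactly.
  // Out-of-range reads yield undefined, so short inputs simply fail to match.
  const startsWith = (signature: number[], offset = 0): boolean =>
    signature.every((value, i) => bytes[offset + i] === value);

  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  // "GIF8" covers both GIF87a and GIF89a.
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "image/gif";
  // WebP is a RIFF container: "RIFF" at offset 0 and "WEBP" at offset 8.
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  return null;
}

const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(detectImageMime(pngHeader)); // image/png
```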
Development
# Install dependencies
npm install
# Build
npm run build
# Run in development mode
npm run dev
# Run tests
npm test # all tests (JSON reporter for counts)
npx vitest run test/unit # unit tests only
npx vitest run test/integration # integration tests only
# Lint and format
npm run lint
npm run format
# Clean
npm run clean
Troubleshooting
"API_KEY environment variable is required"
Set API_KEY (or the legacy OPENROUTER_API_KEY) to your provider's API key.
"MODEL environment variable is required for provider 'together'"
The together, deepinfra, fireworks, and chutes providers are multi-model aggregators with no default model. Set MODEL to a vision-capable model id from the provider's catalog.
"Unknown PROVIDER 'xyz'"
The PROVIDER value must be one of: openrouter, openai, together, deepinfra, fireworks, groq, chutes, cerebras, azure.
"BASE_URL is required for provider 'azure'"
Azure requires BASE_URL to be the full deployment URL including ?api-version=. There is no per-provider default.
"<provider> API Error: ..."
The error message includes the provider id (e.g., openai API Error: Invalid API key). This confirms the provider-aware error handling is working. Check your API key and model id.
Ollama: connection refused (local)
Ensure Ollama is running locally (ollama serve) and that the model has been pulled (ollama pull <model>). The default BASE_URL is http://localhost:11434. For Ollama Cloud, set BASE_URL=https://api.ollama.com and API_KEY=<cloud-token>.
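The configuration errors in this section follow a simple set of checks, which can be sketched as below. This is an illustration under assumptions: `checkConfig` is a hypothetical name, the order of the checks is a guess, and the provider list contains only the nine ids quoted in the "Unknown PROVIDER" entry. The error strings are the ones quoted above.

```typescript
// Hypothetical sketch of the startup checks behind the configuration errors above.
type Env = Record<string, string | undefined>;

// The nine ids quoted in the "Unknown PROVIDER" entry.
const PROVIDERS = [
  "openrouter", "openai", "together", "deepinfra", "fireworks",
  "groq", "chutes", "cerebras", "azure",
];
// Multi-model aggregators with no default model.
const MODEL_REQUIRED = ["together", "deepinfra", "fireworks", "chutes"];

// Returns the first error message, or null when the configuration looks valid.
function checkConfig(env: Env): string | null {
  const provider = env["PROVIDER"] ?? "openrouter";
  if (PROVIDERS.indexOf(provider) === -1) {
    return `Unknown PROVIDER '${provider}'`;
  }
  // The legacy OPENROUTER_API_KEY is accepted as a fallback.
  if (!env["API_KEY"] && !env["OPENROUTER_API_KEY"]) {
    return "API_KEY environment variable is required";
  }
  if (MODEL_REQUIRED.indexOf(provider) !== -1 && !env["MODEL"]) {
    return `MODEL environment variable is required for provider '${provider}'`;
  }
  if (provider === "azure" && !env["BASE_URL"]) {
    return "BASE_URL is required for provider 'azure'";
  }
  return null;
}

console.log(checkConfig({ PROVIDER: "together", API_KEY: "key" }));
console.log(checkConfig({ PROVIDER: "groq", API_KEY: "gsk-your-groq-key" }));
```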
Attribution
This project is a fork of JonathanJude/openrouter-image-mcp (MIT license), refactored to be provider-neutral with support for 10 inference providers.
License
MIT — see LICENSE.