vision-mcp
The vision-mcp server enables text-only AI coding CLIs to "see" and reason about images by proxying them to Groq's vision model and returning text descriptions. It offers the following tools, each available in both a file-path-based and a clipboard-based variant:
- Analyze images (`analyze_image` / `analyze_image_from_clipboard`): Get a general description of any image or ask a specific question about it.
- OCR (`extract_text_from_screenshot`): Extract text verbatim from screenshots of code editors, terminals, or documents.
- Error diagnosis (`diagnose_error_screenshot`): Analyze error messages, stack traces, or crash dialogs and suggest causes and fixes.
- Technical diagram understanding (`understand_technical_diagram`): Interpret architecture diagrams, flowcharts, UML, or ER diagrams and explain their components and relationships.
- Data visualization analysis (`analyze_data_visualization`): Extract key values, trends, and insights from charts, graphs, or dashboards.
- UI description (`describe_ui`): Describe a UI screenshot's layout, components, and style, with an option to generate a JSX/HTML+CSS sketch.
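Supports common image formats (`.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`). In practice, an image must be under about 3MB to be analyzed, whether it comes from a file or the clipboard, because of Groq's cap on inline base64 images. Path-based tools refuse files over 20MB before reading them, clipboard reads stop at 25MB, and anything over ~3MB is rejected before reaching the API.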
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type `@` followed by the MCP server name and your instructions, e.g., "@vision-mcp diagnose this error screenshot: ./error.png"
That's it! The server will respond to your query, and you can continue using it as needed.
An MCP (Model Context Protocol) stdio server that gives text-only AI coding CLIs — Claude Code, Cline, OpenCode, Cursor, or anything else that speaks MCP — the ability to "see" images. It proxies image files to Groq's free vision model (Llama 4 Scout) and returns a text description, so the calling assistant can reason about screenshots, diagrams, error dialogs, and UI mockups even when its own model has no multimodal capability.
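To make "proxying" concrete, here is a minimal TypeScript sketch of the kind of request body such a server might send. It assumes Groq's OpenAI-compatible chat-completions format with the image inlined as a base64 data URL; the helper name `buildVisionRequest` and the model ID are illustrative assumptions, not taken from this project's source.

```typescript
// Sketch only: builds a request body for Groq's OpenAI-compatible
// chat-completions API with the image inlined as a base64 data URL.
// ASSUMPTION: this model ID for Llama 4 Scout; check Groq's model list.
const ASSUMED_MODEL_ID = "meta-llama/llama-4-scout-17b-16e-instruct";

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequestBody {
  model: string;
  messages: { role: "user"; content: ContentPart[] }[];
}

// Hypothetical helper: the bytes stay in memory and are only base64-encoded.
function buildVisionRequest(
  imageBytes: Uint8Array,
  mimeType: string,
  prompt: string,
): VisionRequestBody {
  const base64 = Buffer.from(imageBytes).toString("base64");
  return {
    model: ASSUMED_MODEL_ID,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64}` } },
        ],
      },
    ],
  };
}

// Example: the three bytes 1, 2, 3 encode to the base64 string "AQID".
const body = buildVisionRequest(new Uint8Array([1, 2, 3]), "image/png", "Describe this image.");
console.log(JSON.stringify(body, null, 2));
```

Such a body would be POSTed to `https://api.groq.com/openai/v1/chat/completions` with an `Authorization: Bearer` header carrying the API key, and the text description comes back in `choices[0].message.content`. Inlining the image rather than hosting it is what subjects these tools to the ~4MB base64 cap described under Limits.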
Security & privacy warning
Images you pass to these tools are sent to Groq's API (a third-party service) for processing. Do not use this tool on images containing secrets, credentials, API keys, or other sensitive personal data unless you accept that risk. This server does not add its own sandboxing beyond validating that the target file is a genuine image (see Limits below) — it runs with the same filesystem permissions as the process that launches it (your AI CLI), which already has full filesystem access on your machine. The server never logs image bytes, base64 payloads, or your API key. The *_from_clipboard tools read whatever image is currently on the OS clipboard and never write it to disk — the bytes stay in memory only, on their way to the same Groq API call.
Prerequisites
Node.js 20 or later
A free Groq API key from console.groq.com/keys
Clipboard tool prerequisites
Only needed if you want to use the *_from_clipboard tools — the path-based tools (analyze_image, etc.) need none of this.
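| OS | Requirement | Install |
|---|---|---|
| macOS | `pngpaste` | `brew install pngpaste` |
| Linux (Wayland) | `wl-paste` (from the `wl-clipboard` package) | e.g. `sudo apt install wl-clipboard` |
| Linux (X11) | `xclip` | e.g. `sudo apt install xclip` |
| Windows | none (uses built-in PowerShell) | n/a |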
Install & build
```
npm install
npm run build
```

Configure your MCP client
Add an entry to your MCP client's server config (the exact file location varies by client; see your CLI's docs):

```json
{
  "mcpServers": {
    "vision-mcp": {
      "command": "node",
      "args": ["/absolute/path/to/vision-mcp/dist/index.js"],
      "env": {
        "GROQ_API_KEY": "gsk_your_key_here"
      }
    }
  }
}
```

Restart your AI CLI after adding this config.
OpenCode
OpenCode uses a different config schema (mcp instead of mcpServers, and environment instead of env). Add this to opencode.json (project root) or ~/.config/opencode/opencode.json (global):
```json
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "vision-mcp": {
      "type": "local",
      "command": ["node", "/absolute/path/to/vision-mcp/dist/index.js"],
      "enabled": true,
      "environment": {
        "GROQ_API_KEY": "gsk_your_key_here"
      }
    }
  }
}
```

Non-multimodal models in OpenCode (e.g. Build/Big Pickle) will say they "can't read images" when you paste a screenshot, even with vision-mcp installed, because they don't know to reach for the tool on their own. To make the agent use it automatically, append a rule to your project's AGENTS.md (or to ~/.config/opencode/AGENTS.md for a global rule covering every project):
```markdown
## Images
The current model cannot read images natively. Whenever an image is pasted,
attached, or referenced (including clipboard screenshots), immediately use
the matching vision-mcp tool (e.g. `analyze_image_from_clipboard`,
`diagnose_error_screenshot_from_clipboard`) instead of saying you can't see
images. Do not ask the user for permission first.
```

Tools
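| Tool | Purpose | Params |
|---|---|---|
| `analyze_image` | General description of an image, or answer a specific question about it | `image_path` |
| `extract_text_from_screenshot` | OCR: verbatim text extraction from code/terminal/document screenshots | `image_path` |
| `diagnose_error_screenshot` | Analyze a screenshot of an error/stack trace/crash dialog → likely cause & fix | `image_path`, `context` |
| `understand_technical_diagram` | Read architecture diagrams, flowcharts, UML, or ER diagrams | `image_path` |
| `analyze_data_visualization` | Read charts/dashboards, extract key values and insights | `image_path` |
| `describe_ui` | Describe a UI screenshot's layout/components/style; ask for code to get a JSX/HTML+CSS sketch | `image_path` |
| `analyze_image_from_clipboard` | Same as `analyze_image`, but reads the image from the OS clipboard | |
| `extract_text_from_screenshot_from_clipboard` | Same as `extract_text_from_screenshot`, but reads the image from the OS clipboard | |
| `diagnose_error_screenshot_from_clipboard` | Same as `diagnose_error_screenshot`, but reads the image from the OS clipboard | `context` |
| `understand_technical_diagram_from_clipboard` | Same as `understand_technical_diagram`, but reads the image from the OS clipboard | |
| `analyze_data_visualization_from_clipboard` | Same as `analyze_data_visualization`, but reads the image from the OS clipboard | |
| `describe_ui_from_clipboard` | Same as `describe_ui`, but reads the image from the OS clipboard | |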
Example: ask your AI CLI "use diagnose_error_screenshot on ./error.png, I was running npm test" and it will call the tool with image_path: "./error.png" and context: "running npm test". Or, after copying a screenshot to the clipboard: "use diagnose_error_screenshot_from_clipboard, I was running npm test".
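Under the hood, the example request above becomes an MCP `tools/call` message. The sketch below builds that JSON-RPC 2.0 message in TypeScript; the helper `makeToolCall` is hypothetical, and the exact arguments and `id` your CLI sends may differ.

```typescript
// Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
// "tools/call" with { name, arguments } is the MCP method for tool calls;
// the id value is arbitrary and chosen by the client.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, string> };
}

// Hypothetical helper, for illustration only.
function makeToolCall(id: number, name: string, args: Record<string, string>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The path-based example from the text.
const fromFile = makeToolCall(1, "diagnose_error_screenshot", {
  image_path: "./error.png",
  context: "running npm test",
});

// Over the stdio transport, each message travels as one line of JSON.
console.log(JSON.stringify(fromFile));
```

Because stdio messages are newline-delimited, the single line printed here is roughly what the server reads from its stdin for this request.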
image_path may be absolute or relative — relative paths are resolved against the server process's working directory (normally your project root, as launched by your MCP client).
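Relative-path handling can be pictured as a single `path.resolve` call against the server's working directory. This is a minimal sketch assuming Node's `path` module; `resolveImagePath` is a hypothetical name. Note that resolution alone does not confine paths: `../` segments can point outside the project, consistent with the security note that the server adds no sandboxing beyond image validation.

```typescript
import * as path from "node:path";

// Hypothetical helper: absolute paths pass through (normalized); relative
// paths are resolved against the server process's working directory.
function resolveImagePath(imagePath: string, cwd: string = process.cwd()): string {
  return path.resolve(cwd, imagePath);
}

console.log(resolveImagePath("./error.png"));
```

On a POSIX system, `resolveImagePath("../shots/a.png", "/home/me/project")` returns `"/home/me/shots/a.png"`, while an absolute path such as `"/tmp/x.png"` is returned unchanged.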
Limits
- Allowed formats: `.png` `.jpg` `.jpeg` `.gif` `.webp` `.bmp`, verified by file content (magic bytes), not just the extension. This applies equally to clipboard images.
- Max local file size: 20MB (path-based tools only, checked via `stat()` before reading).
- Clipboard images can't be `stat()`-ed before reading, so instead they're bounded by a 25MB raw ceiling enforced on the underlying OS command's output.
- Effective size for the vision model: Groq's API only allows ~4MB for inline base64 images (its 20MB limit applies only to hosted image URLs, which this server does not use). In practice, keep images under ~3MB raw so they encode under that 4MB base64 cap; larger images (whether from a file or the clipboard) are rejected with a clear message before any API call is made.
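The checks above can be sketched in TypeScript as follows. This is a minimal illustration, not the server's actual code: the signature table uses the standard magic bytes for the six allowed formats, the ~4MB cap is assumed to mean 4 MiB, and `sniffFormat`, `validateImage`, and the other helpers are hypothetical names. The returned codes reuse the error names listed under Troubleshooting. Clipboard images have no extension, so for them only the content and size steps would apply.

```typescript
// Sketch of the validation order described under Limits (hypothetical names).
type ImageFormat = "png" | "jpeg" | "gif" | "webp" | "bmp";
type ValidationError = "UNSUPPORTED_EXTENSION" | "MAGIC_BYTE_MISMATCH" | "IMAGE_TOO_LARGE_FOR_MODEL";

// ASSUMPTION: "~4MB" of base64 is read as 4 MiB.
const MAX_BASE64_CHARS = 4 * 1024 * 1024;

const EXT_TO_FORMAT = new Map<string, ImageFormat>([
  [".png", "png"], [".jpg", "jpeg"], [".jpeg", "jpeg"],
  [".gif", "gif"], [".webp", "webp"], [".bmp", "bmp"],
]);

// True if `bytes` contains `sig` starting at `offset`.
function hasBytes(bytes: Uint8Array, sig: number[], offset = 0): boolean {
  return bytes.length >= offset + sig.length && sig.every((b, i) => bytes[offset + i] === b);
}

// Identify the format from the file's leading "magic" bytes.
function sniffFormat(bytes: Uint8Array): ImageFormat | null {
  if (hasBytes(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (hasBytes(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  // "GIF87a" or "GIF89a"
  if (hasBytes(bytes, [0x47, 0x49, 0x46, 0x38, 0x37, 0x61]) ||
      hasBytes(bytes, [0x47, 0x49, 0x46, 0x38, 0x39, 0x61])) return "gif";
  // "RIFF" + 4-byte size + "WEBP"
  if (hasBytes(bytes, [0x52, 0x49, 0x46, 0x46]) && hasBytes(bytes, [0x57, 0x45, 0x42, 0x50], 8)) return "webp";
  if (hasBytes(bytes, [0x42, 0x4d])) return "bmp"; // "BM"
  return null;
}

// Base64 emits 4 characters for every 3 input bytes (last group padded).
function base64Length(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

function validateImage(ext: string, bytes: Uint8Array): ValidationError | null {
  const expected = EXT_TO_FORMAT.get(ext.toLowerCase());
  if (expected === undefined) return "UNSUPPORTED_EXTENSION";
  if (sniffFormat(bytes) !== expected) return "MAGIC_BYTE_MISMATCH";
  // Rejected here, before any API call is made.
  if (base64Length(bytes.length) > MAX_BASE64_CHARS) return "IMAGE_TOO_LARGE_FOR_MODEL";
  return null;
}

const pngHeader = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
console.log(validateImage(".png", pngHeader)); // prints null
console.log(validateImage(".jpg", pngHeader)); // prints MAGIC_BYTE_MISMATCH
```

As a worked check of the ~3MB guideline, `base64Length(3 * 1024 * 1024)` evaluates to 4194304, exactly 4 MiB, so one more raw byte pushes the encoded size over the cap.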
Troubleshooting
- "GROQ_API_KEY environment variable is not set": add `GROQ_API_KEY` to the `env` block in your MCP client config (the `environment` block in OpenCode) and restart the client.
- "IMAGE_TOO_LARGE_FOR_MODEL": resize or compress the image below ~3MB and try again.
- "MAGIC_BYTE_MISMATCH": the file's actual content doesn't match a supported image format (e.g. a non-image file with an image-like extension); this is a deliberate safety check, not a bug.
- "UNSUPPORTED_EXTENSION": convert the file to one of the allowed formats.
- "NO_TOOL" (clipboard tools only): the required OS clipboard utility isn't installed; see Clipboard tool prerequisites.
- "NO_IMAGE" (clipboard tools only): the clipboard doesn't currently hold an image; copy a screenshot/image first and try again.
- "CLIPBOARD_IMAGE_TOO_LARGE" (clipboard tools only): the clipboard image exceeds the 25MB raw ceiling; copy a smaller image.
- "NO_DISPLAY" (Linux clipboard tools only): no graphical session was detected (`WAYLAND_DISPLAY` and `DISPLAY` are both unset); clipboard access needs a desktop session.
- "UNSUPPORTED_PLATFORM" (clipboard tools only): the clipboard tools support only macOS, Linux, and Windows.
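The missing-key error above is the kind of check a server can make once at startup. A minimal sketch, with `requireGroqApiKey` as a hypothetical helper: it reads the key from the environment it is given and throws the message quoted above if the variable is unset or blank.

```typescript
// Hypothetical startup check. The key is returned to the caller but never
// placed in an error message or log line.
function requireGroqApiKey(env: Record<string, string | undefined>): string {
  const key = env["GROQ_API_KEY"]?.trim();
  if (!key) {
    throw new Error("GROQ_API_KEY environment variable is not set");
  }
  return key;
}

try {
  requireGroqApiKey({}); // simulate a config with no env block
} catch (err) {
  console.log((err as Error).message);
}
```

In a real server this would be called as `requireGroqApiKey(process.env)`; keeping the key out of the error text matches the stated policy of never logging it.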
Platform coverage notes
Clipboard support is implemented for macOS (`pngpaste`), Linux (`wl-paste` for Wayland and `xclip` for X11, with automatic fallback between them), and Windows (built-in PowerShell). It has been tested end-to-end only on Linux/Wayland. The X11, macOS, and Windows code paths are written defensively against each tool's documented behavior but have not yet been exercised on real machines, so treat all three as needing a manual smoke test before relying on them in production.
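The per-platform selection and Linux fallback described above might look like the sketch below. It only decides which command(s) to try and spawns nothing. The flags shown are the standard ones for `pngpaste`, `wl-paste`, and `xclip`, but whether vision-mcp uses exactly these commands, the Wayland-then-X11 fallback order, and the Windows PowerShell script are all assumptions; `selectClipboardCommands` is a hypothetical name.

```typescript
// Sketch: choose which clipboard command(s) to try for the current platform.
// Nothing is spawned here. The exact commands vision-mcp runs are an ASSUMPTION.
interface ClipboardCommand {
  cmd: string;
  args: string[];
}

type Selection =
  | { ok: true; candidates: ClipboardCommand[] } // try in order until one succeeds
  | { ok: false; code: "NO_DISPLAY" | "UNSUPPORTED_PLATFORM" };

// ASSUMPTION: one way to read a clipboard image from Windows PowerShell.
// It prints the PNG as base64 text, which the caller would then decode.
const WINDOWS_SCRIPT = [
  "Add-Type -AssemblyName System.Windows.Forms, System.Drawing",
  "$img = [System.Windows.Forms.Clipboard]::GetImage()",
  "if ($null -eq $img) { exit 1 }",
  "$ms = New-Object System.IO.MemoryStream",
  "$img.Save($ms, [System.Drawing.Imaging.ImageFormat]::Png)",
  "[Convert]::ToBase64String($ms.ToArray())",
].join("; ");

function selectClipboardCommands(
  platform: string, // e.g. process.platform
  env: Record<string, string | undefined>,
): Selection {
  switch (platform) {
    case "darwin":
      // `pngpaste -` writes the clipboard image to stdout.
      return { ok: true, candidates: [{ cmd: "pngpaste", args: ["-"] }] };
    case "win32":
      return { ok: true, candidates: [{ cmd: "powershell", args: ["-NoProfile", "-Command", WINDOWS_SCRIPT] }] };
    case "linux": {
      const candidates: ClipboardCommand[] = [];
      // ASSUMPTION: prefer Wayland when available, then fall back to X11.
      if (env["WAYLAND_DISPLAY"]) candidates.push({ cmd: "wl-paste", args: ["--type", "image/png"] });
      if (env["DISPLAY"]) candidates.push({ cmd: "xclip", args: ["-selection", "clipboard", "-t", "image/png", "-o"] });
      // Neither variable set: no graphical session to read from.
      return candidates.length > 0 ? { ok: true, candidates } : { ok: false, code: "NO_DISPLAY" };
    }
    default:
      return { ok: false, code: "UNSUPPORTED_PLATFORM" };
  }
}

console.log(selectClipboardCommands("linux", { WAYLAND_DISPLAY: "wayland-0", DISPLAY: ":0" }));
```

To actually run a candidate, Node's `child_process.execFile` accepts a `maxBuffer` option that terminates the child and reports an error once its output exceeds the given size, which is one plausible way to enforce a raw ceiling like the 25MB one described under Limits.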
MCP directory API
We provide all the information about MCP servers via our MCP API.
```
curl -X GET 'https://glama.ai/api/mcp/v1/servers/konan-1947/vision-mcp'
```