What can you do with this server?

The vision-mcp server lets text-only AI coding CLIs "see" and reason about images by proxying them to Groq's vision model and returning text descriptions. It offers the following tools, each available in a file-path variant and a clipboard variant:

* Analyze images (analyze_image / analyze_image_from_clipboard): Get a general description of any image or ask a specific question about it.
* OCR (extract_text_from_screenshot): Extract text verbatim from screenshots of code editors, terminals, or documents.
* Error diagnosis (diagnose_error_screenshot): Analyze error messages, stack traces, or crash dialogs and suggest causes and fixes.
* Technical diagram understanding (understand_technical_diagram): Interpret architecture diagrams, flowcharts, UML, or ER diagrams and explain their components and relationships.
* Data visualization analysis (analyze_data_visualization): Extract key values, trends, and insights from charts, graphs, or dashboards.
* UI description (describe_ui): Describe a UI screenshot's layout, components, and style, with an option to generate a JSX/HTML+CSS sketch.

Supported image formats are .png, .jpg, .jpeg, .gif, .webp, and .bmp. Keep images from either source under ~3MB raw: anything larger is rejected before the API call, even though the hard read ceilings are higher (20MB for files, 25MB for clipboard images).

How do I use vision-mcp?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@vision-mcp diagnose this error screenshot: ./error.png"

That's it! The server will respond to your query, and you can continue using it as needed.

vision-mcp

by konan-1947

TypeScript

Local

An MCP (Model Context Protocol) stdio server that gives text-only AI coding CLIs — Claude Code, Cline, OpenCode, Cursor, or anything else that speaks MCP — the ability to "see" images. It proxies image files to Groq's free vision model (Llama 4 Scout) and returns a text description, so the calling assistant can reason about screenshots, diagrams, error dialogs, and UI mockups even when its own model has no multimodal capability.
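The proxying step described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: it assumes Groq's OpenAI-compatible chat completions endpoint and a Llama 4 Scout model id, and the names `buildVisionRequest` and `describeImage` are hypothetical.

```typescript
import { Buffer } from "node:buffer";

// Assumptions for this sketch (not taken from the vision-mcp source):
// the endpoint URL and the model id below.
const GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";
const MODEL = "meta-llama/llama-4-scout-17b-16e-instruct";

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: Array<{ role: "user"; content: ContentPart[] }>;
}

// Build a request body that carries the image inline as a base64 data URL.
function buildVisionRequest(image: Uint8Array, mimeType: string, prompt: string): VisionRequest {
  const base64 = Buffer.from(image).toString("base64");
  return {
    model: MODEL,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64}` } },
        ],
      },
    ],
  };
}

// Send the request and return the model's text description (hypothetical helper).
async function describeImage(
  image: Uint8Array,
  mimeType: string,
  prompt: string,
  apiKey: string,
): Promise<string> {
  const response = await fetch(GROQ_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildVisionRequest(image, mimeType, prompt)),
  });
  if (!response.ok) {
    throw new Error(`Groq API returned HTTP ${response.status}`);
  }
  const data = (await response.json()) as { choices: Array<{ message: { content: string } }> };
  return data.choices[0]?.message.content ?? "";
}
```

The calling assistant only ever sees the returned string, which is why a model without multimodal support can still work with the result.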

Security & privacy warning

Images you pass to these tools are sent to Groq's API (a third-party service) for processing. Do not use this tool on images containing secrets, credentials, API keys, or other sensitive personal data unless you accept that risk. This server does not add its own sandboxing beyond validating that the target file is a genuine image (see Limits below) — it runs with the same filesystem permissions as the process that launches it (your AI CLI), which already has full filesystem access on your machine. The server never logs image bytes, base64 payloads, or your API key. The *_from_clipboard tools read whatever image is currently on the OS clipboard and never write it to disk — the bytes stay in memory only, on their way to the same Groq API call.

Prerequisites

Node.js 20 or later
A free Groq API key from console.groq.com/keys

Clipboard tool prerequisites

Only needed if you want to use the *_from_clipboard tools — the path-based tools (analyze_image, etc.) need none of this.

OS	Requirement	Install
macOS	`pngpaste`	`brew install pngpaste`
Linux (Wayland)	`wl-paste` (from `wl-clipboard`)	`sudo apt install wl-clipboard` / `sudo dnf install wl-clipboard` / `sudo pacman -S wl-clipboard`
Linux (X11)	`xclip`	`sudo apt install xclip` / `sudo dnf install xclip` / `sudo pacman -S xclip`
Windows	none — uses built-in PowerShell	—
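As a rough illustration of how the table maps onto a running system, the sketch below picks a clipboard utility from the platform and environment. The function name `pickClipboardTool` and the result shape are hypothetical; the Wayland-before-X11 ordering is an assumption, and the real server may probe differently (for example, by falling back when a tool is missing).

```typescript
type ClipboardTool = "pngpaste" | "wl-paste" | "xclip" | "powershell";

type ToolChoice =
  | { ok: true; tool: ClipboardTool }
  | { ok: false; code: "NO_DISPLAY" | "UNSUPPORTED_PLATFORM" };

// Choose the clipboard utility for a platform string such as process.platform.
function pickClipboardTool(
  platform: string,
  env: Record<string, string | undefined>,
): ToolChoice {
  switch (platform) {
    case "darwin":
      return { ok: true, tool: "pngpaste" };
    case "win32":
      return { ok: true, tool: "powershell" };
    case "linux":
      // Assumed order: prefer Wayland when its display variable is set.
      if (env.WAYLAND_DISPLAY) return { ok: true, tool: "wl-paste" };
      if (env.DISPLAY) return { ok: true, tool: "xclip" };
      return { ok: false, code: "NO_DISPLAY" };
    default:
      return { ok: false, code: "UNSUPPORTED_PLATFORM" };
  }
}
```

In Node.js this would typically be called as `pickClipboardTool(process.platform, process.env)`.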

Install & build

npm install
npm run build

Configure your MCP client

Add an entry to your MCP client's server config (exact file location varies by client — see your CLI's docs):

{
  "mcpServers": {
    "vision-mcp": {
      "command": "node",
      "args": ["/absolute/path/to/vision-mcp/dist/index.js"],
      "env": {
        "GROQ_API_KEY": "gsk_your_key_here"
      }
    }
  }
}

Restart your AI CLI after adding this config.

OpenCode

OpenCode uses a different config schema (mcp instead of mcpServers, and environment instead of env). Add this to opencode.json (project root) or ~/.config/opencode/opencode.json (global):

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "vision-mcp": {
      "type": "local",
      "command": ["node", "/absolute/path/to/vision-mcp/dist/index.js"],
      "enabled": true,
      "environment": {
        "GROQ_API_KEY": "gsk_your_key_here"
      }
    }
  }
}

Non-multimodal models in OpenCode (e.g. Build/Big Pickle) will say they "can't read images" when you paste a screenshot, even with vision-mcp installed — they don't know to reach for the tool on their own. Add a rule so the agent does this automatically: append to your project's AGENTS.md (or ~/.config/opencode/AGENTS.md for a global rule covering every project):

## Images

The current model cannot read images natively. Whenever an image is pasted,
attached, or referenced (including clipboard screenshots), immediately use
the matching vision-mcp tool (e.g. `analyze_image_from_clipboard`,
`diagnose_error_screenshot_from_clipboard`) instead of saying you can't see
images. Do not ask the user for permission first.

Tools

Tool	Purpose	Params
`analyze_image`	General description of an image, or answer a specific question about it	`image_path`, `question?`
`extract_text_from_screenshot`	OCR — verbatim text extraction from code/terminal/document screenshots	`image_path`, `context?`
`diagnose_error_screenshot`	Analyze a screenshot of an error/stack trace/crash dialog → likely cause & fix	`image_path`, `context?`
`understand_technical_diagram`	Read architecture diagrams, flowcharts, UML, or ER diagrams	`image_path`, `question?`
`analyze_data_visualization`	Read charts/dashboards, extract key values and insights	`image_path`, `question?`
`describe_ui`	Describe a UI screenshot's layout/components/style; ask for code to get a JSX/HTML+CSS sketch	`image_path`, `question?`
`analyze_image_from_clipboard`	Same as `analyze_image`, reading from the OS clipboard instead of a file	`question?`
`extract_text_from_screenshot_from_clipboard`	Same as `extract_text_from_screenshot`, reading from the clipboard	`context?`
`diagnose_error_screenshot_from_clipboard`	Same as `diagnose_error_screenshot`, reading from the clipboard	`context?`
`understand_technical_diagram_from_clipboard`	Same as `understand_technical_diagram`, reading from the clipboard	`question?`
`analyze_data_visualization_from_clipboard`	Same as `analyze_data_visualization`, reading from the clipboard	`question?`
`describe_ui_from_clipboard`	Same as `describe_ui`, reading from the clipboard	`question?`

Example: ask your AI CLI "use diagnose_error_screenshot on ./error.png, I was running npm test" and it will call the tool with image_path: "./error.png" and context: "running npm test". Or, after copying a screenshot to the clipboard: "use diagnose_error_screenshot_from_clipboard, I was running npm test".
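Under the hood, the client turns that sentence into an MCP `tools/call` request over stdio. The sketch below builds the JSON-RPC message for the example above; the helper name `buildToolCall` is hypothetical, and the request id is arbitrary.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, string> };
}

// Build an MCP tools/call request for a named tool and its arguments.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, string>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const request = buildToolCall(1, "diagnose_error_screenshot", {
  image_path: "./error.png",
  context: "running npm test",
});

// Print the message roughly as it would travel over the stdio transport.
console.log(JSON.stringify(request, null, 2));
```

For the clipboard variant, the tool name changes to `diagnose_error_screenshot_from_clipboard` and `image_path` is dropped from the arguments.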

image_path may be absolute or relative — relative paths are resolved against the server process's working directory (normally your project root, as launched by your MCP client).
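The resolution rule can be sketched with Node's `path.resolve`, which leaves absolute paths untouched and joins relative ones onto a base directory. The helper name `resolveImagePath` is hypothetical; the server's own implementation may differ.

```typescript
import * as path from "node:path";

// Resolve image_path against the server's working directory.
// Absolute inputs are returned unchanged (apart from normalization).
function resolveImagePath(imagePath: string, cwd: string = process.cwd()): string {
  return path.resolve(cwd, imagePath);
}
```

If your MCP client launches the server from somewhere other than the project root, relative paths will resolve against that directory instead, so an absolute path is the safer choice.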

Limits

Allowed formats: .png .jpg .jpeg .gif .webp .bmp, verified by file content (magic bytes), not just the extension. This applies equally to clipboard images.
Max local file size: 20MB (path-based tools only, checked via stat() before reading).
Clipboard images can't be stat()-ed before reading, so instead they're bounded by a 25MB raw ceiling enforced on the underlying OS command's output.
Effective size for the vision model: Groq's API only allows ~4MB for inline base64 images (its 20MB limit applies only to hosted image URLs, which this server does not use). In practice, keep images under ~3MB raw so they encode under that 4MB base64 cap — larger images (whether from a file or the clipboard) are rejected before any API call is made, with a clear message.
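The two content checks above (magic bytes and the base64 size cap) can be sketched as below. This is an illustration under stated assumptions, not the server's code: the function names are hypothetical, the signatures are the standard leading bytes for each format, and "~4MB" is taken to mean exactly 4 * 1024 * 1024 bytes.

```typescript
type ImageFormat = "png" | "jpeg" | "gif" | "webp" | "bmp";

// Assumed exact value for the "~4MB" inline base64 cap.
const BASE64_LIMIT_BYTES = 4 * 1024 * 1024;

// True when `bytes` contains `signature` starting at `offset`.
function hasSignature(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((value, i) => bytes[offset + i] === value);
}

// Identify the image format from leading bytes, ignoring the file extension.
function sniffImageFormat(bytes: Uint8Array): ImageFormat | null {
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  if (hasSignature(bytes, [0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
  if (
    hasSignature(bytes, [0x52, 0x49, 0x46, 0x46]) && // "RIFF"
    hasSignature(bytes, [0x57, 0x45, 0x42, 0x50], 8) // "WEBP" at byte 8
  ) {
    return "webp";
  }
  if (hasSignature(bytes, [0x42, 0x4d])) return "bmp"; // "BM"
  return null;
}

// Base64 emits 4 output characters for every 3 input bytes, rounded up.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// True when the encoded image would fit under the assumed inline cap.
function fitsInlineLimit(rawBytes: number): boolean {
  return base64Length(rawBytes) <= BASE64_LIMIT_BYTES;
}
```

The 4/3 expansion is where the ~3MB guidance comes from: a raw image of 3 * 1024 * 1024 bytes encodes to exactly 4 * 1024 * 1024 base64 characters, so anything larger crosses the cap.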

Troubleshooting

"GROQ_API_KEY environment variable is not set" — add GROQ_API_KEY to the env block in your MCP client config and restart the client.
"IMAGE_TOO_LARGE_FOR_MODEL" — resize or compress the image below ~3MB and try again.
"MAGIC_BYTE_MISMATCH" — the file's actual content doesn't match a supported image format (e.g. a non-image file with an image-like extension); this is a deliberate safety check, not a bug.
"UNSUPPORTED_EXTENSION" — convert the file to one of the allowed formats.
"NO_TOOL" (clipboard tools only) — the required OS clipboard utility isn't installed; see Clipboard tool prerequisites.
"NO_IMAGE" (clipboard tools only) — the clipboard doesn't currently hold an image; copy a screenshot/image first and try again.
"CLIPBOARD_IMAGE_TOO_LARGE" (clipboard tools only) — the clipboard image exceeds the 25MB raw ceiling; copy a smaller image.
"NO_DISPLAY" (Linux clipboard tools only) — no graphical session was detected (WAYLAND_DISPLAY/DISPLAY both unset); clipboard access needs a desktop session.
"UNSUPPORTED_PLATFORM" (clipboard tools only) — the clipboard tools only support macOS, Linux, and Windows.

Platform coverage notes

Clipboard support is implemented for macOS (pngpaste), Linux (wl-paste for Wayland, xclip for X11, with automatic fallback between them), and Windows (built-in PowerShell). It has been live-tested end-to-end on Linux/Wayland; the X11 and macOS/Windows code paths are written defensively per each tool's documented behavior but have not been exercised on real X11/macOS/Windows machines yet — treat those three as needing a follow-up manual smoke test before relying on them in production.
