
vision-mcp

An MCP (Model Context Protocol) stdio server that gives text-only AI coding CLIs — Claude Code, Cline, OpenCode, Cursor, or anything else that speaks MCP — the ability to "see" images. It proxies image files to Groq's free vision model (Llama 4 Scout) and returns a text description, so the calling assistant can reason about screenshots, diagrams, error dialogs, and UI mockups even when its own model has no multimodal capability.
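As a rough illustration of the proxying step, the sketch below builds a request in the shape of Groq's OpenAI-compatible chat completions API, with the image embedded as a base64 data URL. The endpoint URL, the model id, and the helper names `buildVisionRequest` and `describeImage` are assumptions made for this example, not code taken from the server.

```typescript
// Sketch only. The endpoint and model id are assumptions; check Groq's docs.
const ASSUMED_GROQ_URL = "https://api.groq.com/openai/v1/chat/completions";
const ASSUMED_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct";

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: Array<{ role: "user"; content: ContentPart[] }>;
}

// Build the JSON body: one user message holding the prompt and the image.
function buildVisionRequest(
  imageBytes: Uint8Array,
  mimeType: string,
  prompt: string,
): VisionRequest {
  const base64 = Buffer.from(imageBytes).toString("base64");
  return {
    model: ASSUMED_MODEL,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: prompt },
          { type: "image_url", image_url: { url: `data:${mimeType};base64,${base64}` } },
        ],
      },
    ],
  };
}

// Send the request and return the model's text description.
async function describeImage(apiKey: string, body: VisionRequest): Promise<string> {
  const res = await fetch(ASSUMED_GROQ_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Groq API returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as {
    choices: Array<{ message: { content: string } }>;
  };
  return data.choices[0]?.message.content ?? "";
}
```

The calling assistant only ever sees the returned string, which is why a text-only model can work with the result.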

Security & privacy warning

Images you pass to these tools are sent to Groq's API (a third-party service) for processing. Do not use this tool on images containing secrets, credentials, API keys, or other sensitive personal data unless you accept that risk.

This server does not add its own sandboxing beyond validating that the target file is a genuine image (see Limits below). It runs with the same filesystem permissions as the process that launches it (your AI CLI), which already has full filesystem access on your machine.

The server never logs image bytes, base64 payloads, or your API key. The *_from_clipboard tools read whatever image is currently on the OS clipboard and never write it to disk: the bytes stay in memory only, on their way to the same Groq API call.


Prerequisites

Clipboard tool prerequisites

Only needed if you want to use the *_from_clipboard tools — the path-based tools (analyze_image, etc.) need none of this.

| OS | Requirement | Install |
| --- | --- | --- |
| macOS | pngpaste | brew install pngpaste |
| Linux (Wayland) | wl-paste (from wl-clipboard) | sudo apt install wl-clipboard / sudo dnf install wl-clipboard / sudo pacman -S wl-clipboard |
| Linux (X11) | xclip | sudo apt install xclip / sudo dnf install xclip / sudo pacman -S xclip |
| Windows | none (uses built-in PowerShell) | |

Install & build

npm install
npm run build

Configure your MCP client

Add an entry to your MCP client's server config (exact file location varies by client — see your CLI's docs):

{
  "mcpServers": {
    "vision-mcp": {
      "command": "node",
      "args": ["/absolute/path/to/vision-mcp/dist/index.js"],
      "env": {
        "GROQ_API_KEY": "gsk_your_key_here"
      }
    }
  }
}

Restart your AI CLI after adding this config.

OpenCode

OpenCode uses a different config schema (mcp instead of mcpServers, and environment instead of env). Add this to opencode.json (project root) or ~/.config/opencode/opencode.json (global):

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "vision-mcp": {
      "type": "local",
      "command": ["node", "/absolute/path/to/vision-mcp/dist/index.js"],
      "enabled": true,
      "environment": {
        "GROQ_API_KEY": "gsk_your_key_here"
      }
    }
  }
}
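The difference between the two schemas can be sketched as a small conversion: `command` and `args` merge into a single `command` array, and `env` is renamed to `environment`. The interface and function names here are hypothetical and only mirror the fields shown in the two config examples above.

```typescript
// Hypothetical types mirroring the fields used in the two config examples.
interface StandardEntry {
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

interface OpenCodeEntry {
  type: "local";
  command: string[];
  enabled: boolean;
  environment?: Record<string, string>;
}

// Convert an "mcpServers"-style entry into an OpenCode "mcp"-style entry.
function toOpenCodeEntry(entry: StandardEntry): OpenCodeEntry {
  return {
    type: "local",
    // OpenCode takes the executable and its arguments as one array.
    command: [entry.command, ...(entry.args ?? [])],
    enabled: true,
    // Only emit "environment" when the source entry had an "env" block.
    ...(entry.env ? { environment: entry.env } : {}),
  };
}

const converted = toOpenCodeEntry({
  command: "node",
  args: ["/absolute/path/to/vision-mcp/dist/index.js"],
  env: { GROQ_API_KEY: "gsk_your_key_here" },
});
console.log(JSON.stringify({ mcp: { "vision-mcp": converted } }, null, 2));
```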

Non-multimodal models in OpenCode (e.g. Build/Big Pickle) will say they "can't read images" when you paste a screenshot, even with vision-mcp installed, because they don't know to reach for the tool on their own. To make the agent do this automatically, append the following rule to your project's AGENTS.md (or to ~/.config/opencode/AGENTS.md for a global rule covering every project):

## Images

The current model cannot read images natively. Whenever an image is pasted,
attached, or referenced (including clipboard screenshots), immediately use
the matching vision-mcp tool (e.g. `analyze_image_from_clipboard`,
`diagnose_error_screenshot_from_clipboard`) instead of saying you can't see
images. Do not ask the user for permission first.

Tools

| Tool | Purpose | Params |
| --- | --- | --- |
| analyze_image | General description of an image, or answer a specific question about it | image_path, question? |
| extract_text_from_screenshot | OCR: verbatim text extraction from code/terminal/document screenshots | image_path, context? |
| diagnose_error_screenshot | Analyze a screenshot of an error/stack trace/crash dialog → likely cause & fix | image_path, context? |
| understand_technical_diagram | Read architecture diagrams, flowcharts, UML, or ER diagrams | image_path, question? |
| analyze_data_visualization | Read charts/dashboards, extract key values and insights | image_path, question? |
| describe_ui | Describe a UI screenshot's layout/components/style; ask for code to get a JSX/HTML+CSS sketch | image_path, question? |
| analyze_image_from_clipboard | Same as analyze_image, reading from the OS clipboard instead of a file | question? |
| extract_text_from_screenshot_from_clipboard | Same as extract_text_from_screenshot, reading from the clipboard | context? |
| diagnose_error_screenshot_from_clipboard | Same as diagnose_error_screenshot, reading from the clipboard | context? |
| understand_technical_diagram_from_clipboard | Same as understand_technical_diagram, reading from the clipboard | question? |
| analyze_data_visualization_from_clipboard | Same as analyze_data_visualization, reading from the clipboard | question? |
| describe_ui_from_clipboard | Same as describe_ui, reading from the clipboard | question? |

Example: ask your AI CLI "use diagnose_error_screenshot on ./error.png, I was running npm test" and it will call the tool with image_path: "./error.png" and context: "running npm test". Or, after copying a screenshot to the clipboard: "use diagnose_error_screenshot_from_clipboard, I was running npm test".
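Under the hood, the MCP client turns that sentence into a JSON-RPC `tools/call` request sent over stdio. The sketch below shows roughly what that message looks like for the example above; `buildToolCall` is a hypothetical helper, and the request id is arbitrary.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, string> };
}

// Hypothetical helper: wrap a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, string>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const call = buildToolCall(1, "diagnose_error_screenshot", {
  image_path: "./error.png",
  context: "running npm test",
});
console.log(JSON.stringify(call, null, 2));
```

For the clipboard variant the `arguments` object would carry only `context`, since there is no `image_path` to pass.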

image_path may be absolute or relative — relative paths are resolved against the server process's working directory (normally your project root, as launched by your MCP client).
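A minimal sketch of that resolution rule, assuming the server uses Node's standard `path.resolve` (the function name `resolveImagePath` is hypothetical):

```typescript
import * as path from "node:path";

// Absolute paths pass through unchanged; relative paths are resolved
// against the working directory of the server process.
function resolveImagePath(imagePath: string, cwd: string = process.cwd()): string {
  return path.resolve(cwd, imagePath);
}

// With a project root of /home/me/project on a POSIX system:
console.log(resolveImagePath("./error.png", "/home/me/project"));
console.log(resolveImagePath("/tmp/shot.png", "/home/me/project"));
```

If your MCP client launches the server from a different directory than your project root, prefer absolute paths to avoid surprises.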

Limits

  • Allowed formats: .png .jpg .jpeg .gif .webp .bmp, verified by file content (magic bytes), not just the extension. This applies equally to clipboard images.

  • Max local file size: 20MB (path-based tools only, checked via stat() before reading).

  • Clipboard images can't be stat()-ed before reading, so instead they're bounded by a 25MB raw ceiling enforced on the underlying OS command's output.

  • Effective size for the vision model: Groq's API only allows ~4MB for inline base64 images (its 20MB limit applies only to hosted image URLs, which this server does not use). In practice, keep images under ~3MB raw so they encode under that 4MB base64 cap — larger images (whether from a file or the clipboard) are rejected before any API call is made, with a clear message.
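To make the limits above concrete, here is a sketch of a magic-byte check for the six allowed formats, plus the arithmetic behind the "~3MB raw" advice. Base64 turns every 3 raw bytes into 4 characters, so 3MB of raw data encodes to about 4MB. The function names and the exact cap value (4 MiB) are assumptions for illustration; the server's real thresholds may differ slightly.

```typescript
type ImageFormat = "png" | "jpeg" | "gif" | "webp" | "bmp";

// True if `bytes` contains `sig` starting at `offset`.
function hasSignature(bytes: Uint8Array, sig: number[], offset = 0): boolean {
  if (bytes.length < offset + sig.length) return false;
  return sig.every((b, i) => bytes[offset + i] === b);
}

// Identify the format from the leading bytes, ignoring the file extension.
function detectImageFormat(bytes: Uint8Array): ImageFormat | null {
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return "jpeg";
  if (hasSignature(bytes, [0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
  if (
    hasSignature(bytes, [0x52, 0x49, 0x46, 0x46]) && // "RIFF"
    hasSignature(bytes, [0x57, 0x45, 0x42, 0x50], 8) // "WEBP" at offset 8
  ) {
    return "webp";
  }
  if (hasSignature(bytes, [0x42, 0x4d])) return "bmp"; // "BM"
  return null;
}

// Length of the base64 encoding (with padding) of `rawBytes` bytes.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Assumed cap of 4 MiB on the encoded payload.
const ASSUMED_BASE64_CAP = 4 * 1024 * 1024;

function fitsInlineCap(rawBytes: number, cap: number = ASSUMED_BASE64_CAP): boolean {
  return base64Length(rawBytes) <= cap;
}
```

Under these assumptions a file of exactly 3 MiB just fits, and anything larger is rejected, which matches the guidance to stay under ~3MB raw.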

Troubleshooting

  • "GROQ_API_KEY environment variable is not set" — add GROQ_API_KEY to the env block in your MCP client config and restart the client.

  • "IMAGE_TOO_LARGE_FOR_MODEL" — resize or compress the image below ~3MB and try again.

  • "MAGIC_BYTE_MISMATCH" — the file's actual content doesn't match a supported image format (e.g. a non-image file with an image-like extension); this is a deliberate safety check, not a bug.

  • "UNSUPPORTED_EXTENSION" — convert the file to one of the allowed formats.

  • "NO_TOOL" (clipboard tools only) — the required OS clipboard utility isn't installed; see Clipboard tool prerequisites.

  • "NO_IMAGE" (clipboard tools only) — the clipboard doesn't currently hold an image; copy a screenshot/image first and try again.

  • "CLIPBOARD_IMAGE_TOO_LARGE" (clipboard tools only) — the clipboard image exceeds the 25MB raw ceiling; copy a smaller image.

  • "NO_DISPLAY" (Linux clipboard tools only) — no graphical session was detected (WAYLAND_DISPLAY/DISPLAY both unset); clipboard access needs a desktop session.

  • "UNSUPPORTED_PLATFORM" (clipboard tools only) — the clipboard tools only support macOS, Linux, and Windows.

Platform coverage notes

Clipboard support is implemented for macOS (pngpaste), Linux (wl-paste for Wayland, xclip for X11, with automatic fallback between them), and Windows (built-in PowerShell). It has been live-tested end-to-end on Linux/Wayland only. The X11, macOS, and Windows code paths are written defensively per each tool's documented behavior but have not yet been exercised on real machines, so treat those three as needing a follow-up manual smoke test before relying on them in production.
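The platform dispatch described here can be sketched as a pure function over the platform name and environment variables. This is a simplified illustration with hypothetical names: it checks WAYLAND_DISPLAY before DISPLAY, and it does not model the fallback that happens when the preferred utility is missing.

```typescript
type ClipboardChoice =
  | { ok: true; tool: "pngpaste" | "wl-paste" | "xclip" | "powershell" }
  | { ok: false; code: "NO_DISPLAY" | "UNSUPPORTED_PLATFORM" };

// Pick the clipboard utility for a platform, or an error code from the
// Troubleshooting list when no utility applies.
function chooseClipboardTool(
  platform: string,
  env: Record<string, string | undefined>,
): ClipboardChoice {
  switch (platform) {
    case "darwin":
      return { ok: true, tool: "pngpaste" };
    case "win32":
      return { ok: true, tool: "powershell" };
    case "linux":
      // Assumption: a Wayland session is preferred when both are set.
      if (env.WAYLAND_DISPLAY) return { ok: true, tool: "wl-paste" };
      if (env.DISPLAY) return { ok: true, tool: "xclip" };
      return { ok: false, code: "NO_DISPLAY" };
    default:
      return { ok: false, code: "UNSUPPORTED_PLATFORM" };
  }
}
```

In a Node process this would be called as `chooseClipboardTool(process.platform, process.env)`.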
