image-recognition-mcp
Allows vision-less LLMs to analyze clipboard images by proxying to an OpenAI-compatible vision model for description, OCR, and error diagnosis.
An MCP server that gives vision-less LLMs the ability to recognize clipboard screenshots and images, by proxying to a configured OpenAI-compatible vision model.
If your coding agent (like ZCode running on a text-only model) can't see images, register this server once and it gains clipboard-first image analysis tools that return a textual description / OCR / answer about screenshots and images.
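To make the proxying step more concrete, here is a rough sketch of the kind of request body an OpenAI-compatible chat completions endpoint accepts for image input. The function name `buildVisionRequest`, the option names, and the fallback values (`1024` tokens, `"auto"` detail) are illustrative assumptions, not taken from this project's source.

```typescript
// Illustrative sketch only: names and defaults here are assumptions,
// not this project's actual implementation.
type Detail = "low" | "high" | "auto";

interface VisionRequestOptions {
  model: string; // e.g. "gpt-4o-mini"
  prompt: string; // question or instruction about the image
  imageUrl: string; // an https:// URL or a data:image/...;base64,... URL
  detail?: Detail;
  maxTokens?: number;
}

// Builds a chat completions payload with one text part and one image part.
function buildVisionRequest(opts: VisionRequestOptions) {
  return {
    model: opts.model,
    max_tokens: opts.maxTokens ?? 1024, // assumed fallback
    messages: [
      {
        role: "user" as const,
        content: [
          { type: "text" as const, text: opts.prompt },
          {
            type: "image_url" as const,
            image_url: { url: opts.imageUrl, detail: opts.detail ?? "auto" },
          },
        ],
      },
    ],
  };
}
```

A payload like this would typically be sent as JSON to the provider's `/chat/completions` path with an `Authorization: Bearer <key>` header, and the text of the first choice returned to the calling LLM.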
LLM (no vision) ──MCP/stdio──► image-recognition-mcp ──OpenAI-compatible API──► vision model ──► text result
Features
🖼️ Four tools:
analyze_clipboard_image for general screenshot/image analysis
extract_clipboard_text for OCR-heavy screenshots
diagnose_clipboard_error for errors, stack traces, terminal output, and failed UI states
recognize_image for clipboard, file, URL, data URL, or base64 image input
📥 Four input forms:
Current clipboard image / latest screenshot (default)
Local file path (/Users/x/a.png, ./pic.jpg, ~/Desktopshot.png)
HTTP/HTTPS URL (passed straight to OpenAI)
Base64 string or data:image/...;base64,... data URL
🔧 Configurable OpenAI-compatible provider, model (gpt-4o-mini by default), detail level, max tokens, timeout
🛡️ Validates local / base64 / clipboard images before sending them upstream
🔒 Optional local file path switch and allowlist for tighter deployments
🧱 stdio transport — works with any MCP-compatible host (ZCode, Claude Desktop, etc.)
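As a rough illustration of how the four input forms listed above could be told apart, the sketch below classifies a raw input string. The function name `classifyInput`, the `InputKind` labels, the 64-character threshold, and the choice to treat a missing or empty input as "clipboard" are all assumptions made for this example; the project's own dispatcher may work differently.

```typescript
// Hypothetical helper: a heuristic classifier for the four input forms.
type InputKind = "clipboard" | "url" | "data-url" | "base64" | "file";

function classifyInput(input?: string): InputKind {
  const value = (input ?? "").trim();
  // Assumption: no input means "use the current clipboard image".
  if (value === "") return "clipboard";
  if (/^https?:\/\//i.test(value)) return "url";
  if (/^data:image\/[a-z0-9.+-]+;base64,/i.test(value)) return "data-url";
  // Check raw base64 before paths: JPEG data encoded as base64 starts
  // with "/9j/", which would otherwise look like an absolute path.
  if (value.length >= 64 && /^[A-Za-z0-9+\/\s]+={0,2}$/.test(value)) {
    return "base64";
  }
  // Everything else is treated as a local path (absolute, relative, or ~/).
  return "file";
}
```

The ordering matters more than the individual patterns: checking the base64 alphabet before falling back to "file" avoids misreading encoded JPEG data as a filesystem path.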
Prerequisites
Node.js ≥ 20
An OpenAI API key with access to gpt-4o / gpt-4o-mini
Clipboard capture:
macOS: pngpaste (brew install pngpaste)
Windows: Windows PowerShell (powershell.exe, built in)
Linux: wl-paste from wl-clipboard on Wayland, or xclip on X11
Install
git clone <this-repo> image-recognition-mcp
cd image-recognition-mcp
npm install
npm run build
Configure
Copy .env.example to .env and fill in your key. The server loads the project-root .env file when present, while keeping any environment variables already provided by the MCP host.
cp .env.example .env
Env var | Default | Description |
OPENAI_API_KEY | — (required) | OpenAI-compatible API key |
OPENAI_MODEL | gpt-4o-mini | Vision model |
OPENAI_BASE_URL | OpenAI default | Override for proxies / compatible gateways |
| | Request timeout |
| | Set to |
| — | Comma-separated local path allowlist, e.g. |
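The sketch below shows one way the three variables that appear by name in this README (`OPENAI_API_KEY`, `OPENAI_MODEL`, `OPENAI_BASE_URL`) could be read and validated. The `loadConfig` function and `VisionConfig` shape are hypothetical, and the remaining settings are left out because their variable names are not shown here.

```typescript
// Hypothetical config loader; not the project's actual config.ts.
interface VisionConfig {
  apiKey: string;
  model: string;
  baseUrl?: string; // undefined means "use the provider SDK's default"
}

function loadConfig(env: Record<string, string | undefined>): VisionConfig {
  const apiKey = env["OPENAI_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("OPENAI_API_KEY is required");
  }
  const baseUrl = env["OPENAI_BASE_URL"]?.trim();
  return {
    apiKey,
    // gpt-4o-mini is the default model named in this README.
    model: env["OPENAI_MODEL"]?.trim() || "gpt-4o-mini",
    // Strip trailing slashes so later path joins do not produce "//".
    ...(baseUrl ? { baseUrl: baseUrl.replace(/\/+$/, "") } : {}),
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`; taking the environment as a parameter keeps the function easy to test.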
Provider examples:
# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini
# Gemini OpenAI-compatible endpoint
OPENAI_API_KEY=...
OPENAI_MODEL=gemini-2.5-flash
OPENAI_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
# Qwen / DashScope OpenAI-compatible endpoint
OPENAI_API_KEY=...
OPENAI_MODEL=qwen-vl-plus
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
# LiteLLM or a self-hosted OpenAI-compatible gateway
OPENAI_API_KEY=...
OPENAI_MODEL=your-vision-model
OPENAI_BASE_URL=http://localhost:4000/v1
Run locally
npm run dev # tsx, no build step
# or
npm run build && npm start
The server speaks MCP over stdio: it expects JSON-RPC frames on stdin and writes them to stdout.
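For readers who want to see what such a frame looks like, here is a minimal sketch of encoding and decoding newline-delimited JSON-RPC messages, which is how the MCP stdio transport separates messages. The helper names `encodeFrame` and `decodeFrames` are made up for this example; the `tools/call` request targets the `analyze_clipboard_image` tool with empty arguments, since all of its parameters are optional.

```typescript
// Illustrative helpers; an MCP host normally handles framing for you.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

// One message per line: JSON.stringify without an indent argument never
// emits raw newlines, so appending "\n" is enough to delimit the frame.
function encodeFrame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// Splits a chunk of stdin/stdout text into parsed messages, skipping blanks.
function decodeFrames(chunk: string): unknown[] {
  return chunk
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// A tool invocation as an MCP client would send it after initialization.
const callRequest: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "analyze_clipboard_image", arguments: {} },
};
```

A real session starts with an `initialize` handshake before any `tools/call` request, so piping this single frame into the server by hand is not expected to work on its own.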
Register with ZCode
Add an entry to your ZCode MCP config at ~/.zcode/v2/config.json, under the mcpServers key. Set args to the absolute path of dist/index.js in your own checkout; the path shown here is only an example.
{
  "mcpServers": {
    "clipboard-vision": {
      "command": "node",
      "args": ["/Volumes/wd-512/WebstormProjects/image-recognition-mcp/dist/index.js"],
      "env": {
        "OPENAI_API_KEY": "sk-xxxxxxxxxxxxxxxx",
        "OPENAI_MODEL": "gpt-4o-mini"
      }
    }
  }
}
Restart ZCode, then copy a screenshot to your clipboard and ask it something like:
"Analyze the screenshot in my clipboard — what text is shown?"
The agent should call analyze_clipboard_image or recognize_image with its default clipboard input, then reason over the returned text.
Register with Claude Desktop
Edit ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "clipboard-vision": {
      "command": "node",
      "args": ["/abs/path/to/image-recognition-mcp/dist/index.js"],
      "env": { "OPENAI_API_KEY": "sk-..." }
    }
  }
}
Tool reference
analyze_clipboard_image
Parameter | Type | Required | Default | Description |
| string | no | | Question or instruction about the image |
| | no | | Vision detail level. |
| integer | no | | Max tokens for the response |
Reads the current clipboard image / latest screenshot.
extract_clipboard_text
Parameter | Type | Required | Default | Description |
| string | no | | Question or instruction about the image |
| | no | | Vision detail level. |
| integer | no | | Max tokens for the response |
Reads the current clipboard image / latest screenshot and optimizes the default prompt for OCR.
diagnose_clipboard_error
Parameter | Type | Required | Default | Description |
| string | no | | Question or instruction about the image |
| | no | | Vision detail level. |
| integer | no | | Max tokens for the response |
Reads the current clipboard image / latest screenshot and optimizes the default prompt for debugging.
recognize_image
Parameter | Type | Required | Default | Description |
| string | no | | Path / URL / data URL / base64 / |
| string | no | | Question or instruction about the image |
| | no | | Vision detail level. |
| integer | no | | Max tokens for the response |
Returns { content: [{ type: "text", text: "..." }] }, or isError: true with an error message on failure.
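On the client side, a result of this shape could be unwrapped roughly as follows. The `ToolResult` type mirrors only the fields mentioned above, and `unwrapToolResult` is a hypothetical helper, not part of this server or of any MCP SDK.

```typescript
// Minimal types covering just the fields described in this section.
interface TextContent {
  type: "text";
  text: string;
}

interface ToolResult {
  content: TextContent[];
  isError?: boolean;
}

// Joins all text parts; throws with the server's message when isError is set.
function unwrapToolResult(result: ToolResult): string {
  const text = result.content
    .filter((part) => part.type === "text")
    .map((part) => part.text)
    .join("\n");
  if (result.isError) {
    throw new Error(text || "tool call failed");
  }
  return text;
}
```

Treating `isError: true` as an exception is one design option; a host could equally pass the error text back to the model so it can retry with different input.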
Local file, data URL, raw base64, and clipboard inputs must be PNG, JPEG, GIF, WebP, or BMP images up to 20 MiB. HTTP/HTTPS URLs are passed to OpenAI as URLs.
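A check along these lines can be sketched with magic-byte sniffing plus a size cap. The signatures below are the standard file headers for the five listed formats; the function names `sniffImageMime` and `validateImage` and the error messages are assumptions for illustration and may not match the project's own validation code.

```typescript
// Illustrative validation sketch, not the project's actual image.ts.
const MAX_IMAGE_BYTES = 20 * 1024 * 1024; // 20 MiB

// True when `bytes` contains `signature` starting at `offset`.
function hasSignature(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  return signature.every((b, i) => bytes[offset + i] === b);
}

// Returns a MIME type based on the leading bytes, or null if unrecognized.
function sniffImageMime(bytes: Uint8Array): string | null {
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return "image/jpeg";
  if (hasSignature(bytes, [0x47, 0x49, 0x46, 0x38])) return "image/gif"; // "GIF8"
  // WebP is a RIFF container with "WEBP" at byte offset 8.
  if (
    hasSignature(bytes, [0x52, 0x49, 0x46, 0x46]) &&
    hasSignature(bytes, [0x57, 0x45, 0x42, 0x50], 8)
  ) {
    return "image/webp";
  }
  if (hasSignature(bytes, [0x42, 0x4d])) return "image/bmp"; // "BM"
  return null;
}

// Throws on empty, oversized, or unrecognized data; otherwise returns the MIME type.
function validateImage(bytes: Uint8Array): string {
  if (bytes.length === 0) throw new Error("image is empty");
  if (bytes.length > MAX_IMAGE_BYTES) throw new Error("image exceeds 20 MiB");
  const mime = sniffImageMime(bytes);
  if (mime === null) throw new Error("unsupported image format");
  return mime;
}
```

Sniffing the bytes rather than trusting a file extension or a data URL's declared type means a mislabeled file is rejected before it is sent upstream.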
Project structure
image-recognition-mcp/
├── package.json
├── tsconfig.json
├── .env.example
└── src/
    ├── index.ts              # MCP server entry, registers tools, stdio transport
    ├── config.ts             # Loads + validates env config
    ├── tools/
    │   └── recognize.ts      # vision tool definitions + handlers
    ├── providers/
    │   └── openai.ts         # GPT-4o vision call
    └── inputs/
        ├── index.ts          # resolveImage() dispatcher
        ├── types.ts
        ├── image.ts          # image MIME / size / magic-byte validation
        ├── file.ts           # local path → base64
        ├── url.ts            # HTTP(S) URL passthrough
        ├── base64.ts         # base64 / data URL
        └── clipboard.ts      # clipboard image capture for macOS / Windows / Linux
License
MIT