Which integrations are available for this server?

Enables the use of Google Gemini's vision models for image description, comparison, and text extraction through the mcp-eyes server. Enables the use of locally hosted vision models via Ollama (e.g., LLaVA, Qwen2-VL) for image description, comparison, and text extraction through the mcp-eyes server. Enables the use of OpenAI's vision models (e.g., GPT-4o, GPT-4o-mini) for image description, comparison, and text extraction through the mcp-eyes server.

How do I use mcp-eyes?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@mcp-eyes Describe what you see in this image" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

mcp-eyes

by loudMore

Overview Schema Related Servers Score Discussions

Python

Local

vision-extension

Drop-in vision capability pack for text-only reasoning LLMs. One repo containing an MCP server and a Claude Code skill, both engineered around a single contract: the vision model only describes — your reasoning model does the thinking.

English · 中文

English

What's in this repo

Directory	What it is	Who installs it
`mcp-vision-extension/`	The MCP server (Python package `vision_extension`). Pairs any text-only reasoning model with any vision model over OpenAI or Anthropic protocol.	Required. Install once per machine.
`skills-vision-extension/`	A Claude Code skill that knows the install playbook AND the day-to-day collaboration patterns between the reasoning model and the vision model.	Optional but strongly recommended for Claude Code users. Copy into `~/.claude/skills/`.

These two pieces are designed to work together. The MCP server gives your text model vision; the skill teaches your text model how to use that vision well.

Install everything in 2 commands

# 1. The MCP server
pip install "git+https://github.com/loudMore/vision-extension.git#subdirectory=mcp-vision-extension"

# 2. The Claude Code skill (optional)
git clone https://github.com/loudMore/vision-extension.git /tmp/vx
cp -r /tmp/vx/skills-vision-extension/vision-extension ~/.claude/skills/

Then point your MCP client at the new server. Detailed steps + provider presets in mcp-vision-extension/README.md.

Or just tell your agent

If you have Claude Code (or any MCP-aware agent), copy the skill once:

git clone https://github.com/loudMore/vision-extension.git
cp -r vision-extension/skills-vision-extension/vision-extension ~/.claude/skills/

Then say:

"Install vision-extension. Use the <doubao | openai | qwen | gemini | ollama | …> provider. Here's my key: <KEY>."

The skill handles the rest. You don't write any JSON.

Why this exists

Long-context reasoning models (DeepSeek V4 Pro, GLM 5.2, Kimi K2, Qwen 3 Max, …) are extraordinary at code and analysis but cannot see images. Naively bolting on a vision API has two problems:

No standard pipe — every IDE wires it differently.
Vision models love to "help" — GPT-4o, Gemini, Doubao all reflexively produce advice, debugging hypotheses, and design opinions when you only wanted a description. The reasoning work gets fragmented.

vision-extension solves both:

One MCP server, works with Claude Code, Cursor, Continue, Cline, Roo, or anything else that speaks MCP.
Describe-only contract — the vision model is system-prompted into a pure visual scanner. No advice. No fixes. No opinions. Just verbatim transcription and structured description.
One Claude Code skill that turns the install + daily-use rules into a single trigger phrase.
Provider-agnostic — Anthropic protocol, OpenAI-compatible protocol. Switch with one env var.

License

MIT.

Related MCP server: low-hallucination-vision

中文

仓库里有什么

目录	是什么	谁要装
`mcp-vision-extension/`	MCP server（Python 包 `vision_extension`），把任意纯文本推理模型和任意视觉模型用 OpenAI/Anthropic 协议接到一起	必装，每台机器装一次
`skills-vision-extension/`	Claude Code skill，把安装流程 + 主模型与视觉模型的日常协作规则打包好	强烈推荐，复制到 `~/.claude/skills/` 即可

两块组件协同设计。MCP server 给文本模型装上视觉；skill 教文本模型怎么用好这套视觉。

两条命令搞定

# 1. 装 MCP server
pip install "git+https://github.com/loudMore/vision-extension.git#subdirectory=mcp-vision-extension"

# 2. 装 Claude Code skill（可选）
git clone https://github.com/loudMore/vision-extension.git /tmp/vx
cp -r /tmp/vx/skills-vision-extension/vision-extension ~/.claude/skills/

然后让你的 MCP 客户端配新 server。详细步骤和 12 个 provider 预设见 mcp-vision-extension/README.md。

或者直接让你的 agent 装

装完 skill 之后，对你的 Claude Code（或任何支持 MCP 的 agent）说：

"装个 vision-extension。视觉模型用 <豆包 | openai | 通义 | 智谱 | ollama | …>，key 是 <KEY>。"

skill 会按 7 步确定流程把剩下的全做完。你不用写任何 JSON。

为什么做这个

DeepSeek V4 Pro / GLM 5.2 / Kimi K2 / Qwen 3 Max 这类长上下文推理模型推理超强，但看不见图。直接接个视觉 API 拼起来有两个老问题：

没有统一通道 —— 每个 IDE 接法都不一样
视觉模型爱"帮忙" —— GPT-4o / Gemini / 豆包都会条件反射地给方案、提假设、写评价，把推理工作抢走一半，你只想要个描述

vision-extension 一并解决：

一个 MCP server，Claude Code / Cursor / Continue / Cline / Roo 通用
describe-only 契约 —— 视觉模型被系统提示锁成纯扫描器，不给建议、不给方案、不给评价，只做逐字转录和结构化描述
一个 Claude Code skill 把安装流程和日常使用规则压成一句话触发
协议解耦 —— Anthropic 协议、OpenAI 协议都支持，一个环境变量切换

License

MIT。

This server cannot be installed

license - permissive license

quality - not tested

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

–Release cycle

–Releases (12mo)

Commit activity

Resources

GitHub Repository

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly
Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
OpenAI
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/loudMore/vision-extension'

If you have feedback or need assistance with the MCP directory API, please join our Discord server