vlm-mcp-server
Enables vision and video analysis tasks such as UI-to-code conversion, OCR, error diagnosis, diagram analysis, data visualization insights, and video content analysis, using OpenAI's Chat Completions or Responses API or locally hosted models via Ollama's OpenAI-compatible API.
VLM MCP Server

A Model Context Protocol (MCP) server providing vision & video analysis tools, configurable with any model provider.
This is a reverse-engineered and extended reimplementation of @z_ai/mcp-server. The original server was hard-wired to the Z.AI / Zhipu Chat Completions API. This fork introduces a provider abstraction layer so the same set of tools can run against any of three API families:
- Chat Completions (OpenAI-compatible): `POST {base}/chat/completions` (OpenAI, Z.AI, Zhipu, OpenRouter, Together, Groq, DeepSeek, Moonshot, local Ollama / LM Studio, …)
- Responses (OpenAI): `POST {base}/responses` (gpt-4o, o-series reasoning models)
- Anthropic Messages: `POST {base}/v1/messages` (Claude, and Anthropic-compatible gateways)
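As a rough illustration of how a base URL and an API family map to a request URL, here is a minimal sketch. The `ApiFamily` type and the `endpointUrl` helper are hypothetical names made up for this example, not the server's actual code; only the three paths come from the list above.

```typescript
// Hypothetical helper for illustration; not taken from the server's source.
type ApiFamily = "chat-completions" | "responses" | "anthropic";

// Path that each API family appends to its base URL (from the list above).
const ENDPOINT_PATHS: Record<ApiFamily, string> = {
  "chat-completions": "/chat/completions",
  responses: "/responses",
  anthropic: "/v1/messages",
};

// Strip trailing slashes from the base so the joined URL never contains "//".
function endpointUrl(family: ApiFamily, base: string): string {
  return base.replace(/\/+$/, "") + ENDPOINT_PATHS[family];
}

console.log(endpointUrl("chat-completions", "https://api.openai.com/v1/"));
// prints https://api.openai.com/v1/chat/completions
console.log(endpointUrl("anthropic", "https://api.anthropic.com"));
// prints https://api.anthropic.com/v1/messages
```

Note that the OpenAI-style base already ends in `/v1`, while the Anthropic base does not, which is why the Anthropic path carries the `/v1` prefix itself.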
Built on top of the original @z_ai/mcp-server design (Apache-2.0). All credit for the tooling, prompts, and architecture goes to the original authors (Chao Gong, Lei Yuan / Z.AI). This project extends it with a pluggable provider layer.
Available Tools
This server provides specialized tools for different image and video analysis tasks:
Image Analysis Tools
- `ui_to_artifact`: Convert UI screenshots to various artifacts
  - Generate frontend code from designs (`code`)
  - Create AI prompts for UI recreation (`prompt`)
  - Extract design specifications (`spec`)
  - Generate natural language UI descriptions (`description`)
- `extract_text_from_screenshot`: OCR and text extraction
  - Extract code from screenshots with proper formatting
  - Extract terminal output and logs
  - Supports programming language hints for better accuracy
- `diagnose_error_screenshot`: Error diagnosis and troubleshooting
  - Analyze error messages and stack traces
  - Identify root causes and provide actionable solutions
- `understand_technical_diagram`: Technical diagram analysis
  - Analyze architecture, flowchart, UML, ER, and sequence diagrams
  - Identify design patterns and explain structure
- `analyze_data_visualization`: Data visualization insights
  - Extract insights, trends, and anomalies from charts and graphs
- `ui_diff_check`: UI comparison for visual regression
  - Compare expected vs actual UI implementations
  - Prioritize issues by severity
- `analyze_image`: General-purpose image analysis (fallback)
Video Analysis Tools
- `analyze_video`: Video content analysis (local files or URLs, ≤8MB, MP4/MOV/M4V)
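The video limits above could be checked before any upload with something like the following sketch. `checkVideo` and the constants are hypothetical names, and treating 8MB as 8 × 1024 × 1024 bytes is an assumption; the server may count the limit differently.

```typescript
// Hypothetical pre-flight check; names and the byte interpretation of "8MB" are assumptions.
const MAX_VIDEO_BYTES = 8 * 1024 * 1024;
const VIDEO_EXTENSIONS = ["mp4", "mov", "m4v"];

// Returns null when the file looks acceptable, otherwise a short reason string.
function checkVideo(fileName: string, sizeBytes: number): string | null {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!VIDEO_EXTENSIONS.includes(ext)) {
    return `unsupported format: ${ext || "(none)"}`;
  }
  if (sizeBytes > MAX_VIDEO_BYTES) {
    return `file too large: ${sizeBytes} bytes`;
  }
  return null;
}

console.log(checkVideo("demo.mp4", 2_000_000)); // prints null
console.log(checkVideo("demo.avi", 2_000_000)); // prints unsupported format: avi
```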
Configuration
Choosing a provider
The simplest way to configure a provider is to fill in one of the three OPENAI_* env-var groups — the server auto-detects which group is set:
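| Group | API family | Endpoint appended to base URL |
| --- | --- | --- |
| `OPENAI_CHAT_COMPLETIONS_*` | OpenAI Chat Completions | `/chat/completions` |
| `OPENAI_RESPONSES_*` | OpenAI Responses | `/responses` |
| `OPENAI_ANTHROPIC_*` | Anthropic Messages | `/v1/messages` |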
If multiple groups are configured, set VLM_PROVIDER explicitly to pick one:
| Value | API family |
| --- | --- |
| `chat-completions` | OpenAI Chat Completions |
| `responses` | OpenAI Responses |
| `anthropic` | Anthropic Messages |
| `auto` | First configured |
In auto mode (when no OPENAI_* group is set) the provider is inferred as follows:
- Built-in Z.AI / Zhipu platform mode (`Z_AI_MODE=ZAI|ZHIPU`) → `chat-completions`
- Base URL contains `anthropic`, or key starts with `sk-ant` → `anthropic`
- Otherwise → `chat-completions` (the most broadly compatible default)
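The three inference rules above can be sketched as a small pure function. `inferProvider` and its parameter shapes are hypothetical; the real implementation lives in the provider selection code and may differ in details such as case handling.

```typescript
// Hypothetical sketch of the auto-mode rules; not the server's actual code.
type Provider = "chat-completions" | "responses" | "anthropic";
type Env = Record<string, string | undefined>;

function inferProvider(env: Env, baseUrl: string, apiKey: string): Provider {
  // Rule 1: built-in Z.AI / Zhipu platform mode always uses Chat Completions.
  const mode = env["Z_AI_MODE"];
  if (mode === "ZAI" || mode === "ZHIPU") return "chat-completions";

  // Rule 2: an Anthropic-looking base URL or key selects the Anthropic provider.
  // The comparison is case-sensitive here, which is an assumption.
  if (baseUrl.includes("anthropic") || apiKey.startsWith("sk-ant")) return "anthropic";

  // Rule 3: everything else falls back to the most broadly compatible family.
  return "chat-completions";
}

console.log(inferProvider({}, "https://api.anthropic.com", "sk-ant-...")); // prints anthropic
console.log(inferProvider({}, "https://api.openai.com/v1/", "sk-..."));    // prints chat-completions
```

Note that `responses` is never chosen by inference in this sketch, matching the rules above: it has to be selected through its own variable group or an explicit `VLM_PROVIDER`.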
Environment variables
The server loads variables from a .env file in the working directory at
startup (real environment variables take precedence). Three configuration
layers are supported; precedence is per-provider groups > generic > legacy.
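To make the ".env file, but real environment variables win" rule concrete, here is a minimal sketch. `parseDotEnv` and `mergeEnv` are hypothetical helpers, and the parsing is deliberately simplified (plain `KEY=VALUE` lines, `#` comments, optional surrounding quotes); the server's actual loader may support more syntax.

```typescript
// Hypothetical, simplified .env handling; the server's real loader may differ.
type EnvMap = Record<string, string | undefined>;

// Parse KEY=VALUE lines, skipping blank lines and # comments.
function parseDotEnv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", ignore the line
    const key = line.slice(0, eq).trim();
    // Remove one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    out[key] = value;
  }
  return out;
}

// File values are the base layer; any defined real environment variable overrides them.
function mergeEnv(realEnv: EnvMap, fileVars: Record<string, string>): EnvMap {
  const merged: EnvMap = { ...fileVars };
  for (const [key, value] of Object.entries(realEnv)) {
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}

const fileVars = parseDotEnv('# local settings\nVLM_TIMEOUT=60000\nOPENAI_CHAT_COMPLETIONS_MODEL="gpt-4o"\n');
const effective = mergeEnv({ VLM_TIMEOUT: "120000" }, fileVars);
console.log(effective["VLM_TIMEOUT"]);                   // prints 120000
console.log(effective["OPENAI_CHAT_COMPLETIONS_MODEL"]); // prints gpt-4o
```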
Per-provider groups (configure each family independently — auto picks the
first group with both a key and a base URL set):
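| Variable | Description |
| --- | --- |
| `OPENAI_CHAT_COMPLETIONS_API_KEY` / `_BASE_URL` / `_MODEL` | Chat Completions provider |
| `OPENAI_RESPONSES_API_KEY` / `_BASE_URL` / `_MODEL` | Responses provider |
| `OPENAI_ANTHROPIC_API_KEY` / `_BASE_URL` / `_MODEL` | Anthropic Messages provider |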
Generic:
| Variable | Description | Default |
| --- | --- | --- |
| | API key for your provider | (required) |
| | Provider API root (with or without trailing slash) | Zhipu default |
| | Model name | |
| `VLM_PROVIDER` | Provider family (see above) | |
| | Sampling temperature | |
| | Top-p | |
| | Max output tokens | |
| `VLM_TIMEOUT` | Request timeout in ms | `300000` |
| | Retry attempts | |
| | Custom log file path | |
Legacy (Z.AI / Zhipu, backward-compatible with @z_ai/mcp-server):
| Variable | Description |
| --- | --- |
| | API key (used if … |
| | API root |
| | Model name |
| | Sampling params |
| | Timeout / retries |
| | Fallback key if no … |
Per-provider group variables take precedence over generic, which take
precedence over legacy. The active provider is resolved from VLM_PROVIDER,
or — in auto mode — from whichever OPENAI_<FAMILY>_* group is configured.
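The precedence rule (per-provider group, then generic, then legacy) amounts to "take the first variable that is set". A minimal sketch follows; `firstSet` is a hypothetical helper, and `VLM_MODEL` and `Z_AI_MODEL` are assumed placeholder names for the generic and legacy model variables, used only to show the ordering.

```typescript
// Hypothetical sketch of layered lookup; not the server's actual code.
type Vars = Record<string, string | undefined>;

// Return the value of the first name that is set to a non-empty string.
function firstSet(env: Vars, names: string[]): string | undefined {
  for (const name of names) {
    const value = env[name];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

// Ordered per-provider group > generic > legacy.
// VLM_MODEL and Z_AI_MODEL are assumed names, not confirmed by the documentation above.
const MODEL_NAMES = ["OPENAI_CHAT_COMPLETIONS_MODEL", "VLM_MODEL", "Z_AI_MODEL"];

console.log(firstSet({ VLM_MODEL: "generic-model", Z_AI_MODEL: "legacy-model" }, MODEL_NAMES));
// prints generic-model
```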
Usage
The server speaks MCP over stdio. Configuration is via environment variables — pick one of the three provider families below and fill in the corresponding OPENAI_* group. The server auto-detects which group is configured; you can also set VLM_PROVIDER explicitly to chat-completions / responses / anthropic.
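| Provider family | Environment variables |
| --- | --- |
| Chat Completions (OpenAI / Z.AI / Zhipu / OpenRouter / Together / Groq / DeepSeek / Moonshot / local) | `OPENAI_CHAT_COMPLETIONS_API_KEY`, `OPENAI_CHAT_COMPLETIONS_BASE_URL`, `OPENAI_CHAT_COMPLETIONS_MODEL` |
| Responses (OpenAI gpt-4o, o-series) | `OPENAI_RESPONSES_API_KEY`, `OPENAI_RESPONSES_BASE_URL`, `OPENAI_RESPONSES_MODEL` |
| Anthropic Messages (Claude) | `OPENAI_ANTHROPIC_API_KEY`, `OPENAI_ANTHROPIC_BASE_URL`, `OPENAI_ANTHROPIC_MODEL` |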
The same values can also be supplied via a `.env` file in the working directory, or through the generic `VLM_*` / legacy `Z_AI_*` variables. See Configuration.
Claude Code
One-line install (Chat Completions example — replace with your API Key / Base URL / model):
claude mcp add -s user vlm-mcp-server \
--env OPENAI_CHAT_COMPLETIONS_API_KEY=sk-... \
OPENAI_CHAT_COMPLETIONS_BASE_URL=https://api.openai.com/v1/ \
OPENAI_CHAT_COMPLETIONS_MODEL=gpt-4o \
-- npx -y vlm-mcp-server
If you forgot to replace the API Key, remove the old config before re-running:
claude mcp list
claude mcp remove vlm-mcp-server
On Windows PowerShell, if you hit issues with the `-y` flag, run the same command in Command Prompt (CMD). The `Windows requires 'cmd /c' wrapper` warning can be ignored.
Manual config — edit the mcpServers section of ~/.claude.json (Anthropic example):
{
  "mcpServers": {
    "vlm-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vlm-mcp-server"],
      "env": {
        "OPENAI_ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENAI_ANTHROPIC_BASE_URL": "https://api.anthropic.com",
        "OPENAI_ANTHROPIC_MODEL": "claude-sonnet-4-5"
      }
    }
  }
}
Responses example:
{
  "mcpServers": {
    "vlm-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vlm-mcp-server"],
      "env": {
        "OPENAI_RESPONSES_API_KEY": "sk-...",
        "OPENAI_RESPONSES_BASE_URL": "https://api.openai.com/v1/",
        "OPENAI_RESPONSES_MODEL": "gpt-4o"
      }
    }
  }
}
Cline (VS Code)
Add the MCP server config in the Cline extension settings (Chat Completions example):
{
  "mcpServers": {
    "vlm-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vlm-mcp-server"],
      "env": {
        "OPENAI_CHAT_COMPLETIONS_API_KEY": "sk-...",
        "OPENAI_CHAT_COMPLETIONS_BASE_URL": "https://api.openai.com/v1/",
        "OPENAI_CHAT_COMPLETIONS_MODEL": "gpt-4o"
      }
    }
  }
}
OpenCode
See the OpenCode MCP docs (Anthropic example):
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "vlm-mcp-server": {
      "type": "local",
      "command": ["npx", "-y", "vlm-mcp-server"],
      "environment": {
        "OPENAI_ANTHROPIC_API_KEY": "sk-ant-...",
        "OPENAI_ANTHROPIC_BASE_URL": "https://api.anthropic.com",
        "OPENAI_ANTHROPIC_MODEL": "claude-sonnet-4-5"
      }
    }
  }
}
Crush
{
  "$schema": "https://charm.land/crush.json",
  "mcp": {
    "vlm-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vlm-mcp-server"],
      "env": {
        "OPENAI_RESPONSES_API_KEY": "sk-...",
        "OPENAI_RESPONSES_BASE_URL": "https://api.openai.com/v1/",
        "OPENAI_RESPONSES_MODEL": "gpt-4o"
      }
    }
  }
}
Roo Code / Kilo Code and other MCP clients
For Roo Code, Kilo Code, and other MCP-compatible clients, use the following generic config (Chat Completions example):
{
  "mcpServers": {
    "vlm-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "vlm-mcp-server"],
      "env": {
        "OPENAI_CHAT_COMPLETIONS_API_KEY": "sk-...",
        "OPENAI_CHAT_COMPLETIONS_BASE_URL": "https://api.openai.com/v1/",
        "OPENAI_CHAT_COMPLETIONS_MODEL": "gpt-4o"
      }
    }
  }
}
To switch to another API family, replace the `env` block with the corresponding `OPENAI_RESPONSES_*` or `OPENAI_ANTHROPIC_*` triple. You can also use the generic `VLM_*` variables together with an explicit `VLM_PROVIDER`.
Run locally from source
npm install
npm run build
# Start directly via environment variables
OPENAI_CHAT_COMPLETIONS_API_KEY=sk-... \
OPENAI_CHAT_COMPLETIONS_BASE_URL=https://api.openai.com/v1/ \
OPENAI_CHAT_COMPLETIONS_MODEL=gpt-4o \
npm start
# Or write the variables into a .env file and just start (auto-loaded)
npm start
Usage Examples
Once the server is installed in your client, you can use it through conversation. For example, in Claude Code, type `hi describe this xx.png`; the MCP Server will process the image and return a description (the image must exist in the current directory).
Outside Claude Code, pasting an image directly into the client will NOT invoke this MCP Server — the client encodes the image and calls the model API itself. Best practice: place images in a local directory and refer to them by name or path in conversation, e.g.
What does demo.png describe?
Troubleshooting
Run the server directly from the command line to verify it starts, isolating environment / permission issues:
# Linux / macOS
OPENAI_CHAT_COMPLETIONS_API_KEY=sk-... \
OPENAI_CHAT_COMPLETIONS_BASE_URL=https://api.openai.com/v1/ \
OPENAI_CHAT_COMPLETIONS_MODEL=gpt-4o \
npx -y vlm-mcp-server
# Windows CMD
set "OPENAI_CHAT_COMPLETIONS_API_KEY=sk-..." && set "OPENAI_CHAT_COMPLETIONS_BASE_URL=https://api.openai.com/v1/" && set "OPENAI_CHAT_COMPLETIONS_MODEL=gpt-4o" && npx -y vlm-mcp-server
# Windows PowerShell
$env:OPENAI_CHAT_COMPLETIONS_API_KEY="sk-..."; $env:OPENAI_CHAT_COMPLETIONS_BASE_URL="https://api.openai.com/v1/"; $env:OPENAI_CHAT_COMPLETIONS_MODEL="gpt-4o"; npx -y vlm-mcp-server
If it starts successfully, the environment is correct and the issue is likely in the client's MCP config; double-check it.
If it fails, investigate the error message (pasting it to an LLM for analysis is recommended).
Other common issues:
Connection failure
- Ensure Node.js 18 or newer is installed.
- Run `node -v` and `npx -v` to confirm the runtime is available.
- Verify the environment variables (`OPENAI_*` triple or `VLM_*`) are set correctly.
Invalid API Key
- Confirm the API Key was copied correctly.
- Check that the API Key is activated.
- Ensure the selected provider family matches the API Key (Chat Completions / Responses / Anthropic).
- Check that the API Key has sufficient balance.
Connection timeout
- Check your network connection.
- Check firewall settings.
- Try switching to a different provider family or base URL.
- Increase the timeout (`VLM_TIMEOUT`, default 300000 ms).
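For intuition about how a timeout and a retry count interact, here is a minimal sketch. `timeoutMs`, `withTimeout`, and `withRetry` are hypothetical helpers written for this example; only the `VLM_TIMEOUT` name and its 300000 ms default come from the text above, and the server's own retry wrapper may behave differently (for example, by backing off between attempts).

```typescript
// Hypothetical helpers; not the server's actual retry wrapper.

// Read a millisecond timeout, falling back to the documented default of 300000.
function timeoutMs(raw: string | undefined, fallback = 300000): number {
  const parsed = Number(raw);
  return raw !== undefined && Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Reject if the work does not settle within ms milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Run the attempt once, then up to `retries` more times if it fails or times out.
async function withRetry<T>(attempt: () => Promise<T>, retries: number, ms: number): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    try {
      return await withTimeout(attempt(), ms);
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

console.log(timeoutMs(undefined)); // prints 300000
console.log(timeoutMs("60000"));   // prints 60000
```

With this shape, the worst-case wait is roughly the timeout multiplied by the number of attempts, which is worth keeping in mind before raising both values at once.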
Architecture
src/
├── index.ts                    # Entry point: starts the MCP server, registers all tools
├── types/                      # Error types (McpError, ApiError, ValidationError, …)
├── core/
│   ├── environment.ts          # Env config (generic VLM_* + legacy Z_AI_*), URL resolution
│   ├── chat-service.ts         # Delegates to the active VisionProvider
│   ├── file-service.ts         # File validation + base64 encoding (image/video)
│   ├── base-image-service.ts   # Shared image-processing logic for all image tools
│   ├── api-common.ts           # Message builders, response helpers, retry wrapper
│   ├── error-handler.ts        # Error hierarchy + handling/recovery strategies
│   └── logger.ts               # stderr + file logger (keeps stdout JSON-clean)
├── providers/                  # ← NEW: pluggable model-provider abstraction
│   ├── types.ts                # VisionProvider interface, ChatMessage, postJson helper
│   ├── chat-completions.ts     # OpenAI-compatible Chat Completions
│   ├── responses.ts            # OpenAI Responses API
│   ├── anthropic.ts            # Anthropic Messages API
│   └── index.ts                # Provider selection (VLM_PROVIDER / auto-infer)
├── prompts/                    # System prompts for each specialized tool
└── tools/                      # 8 tool registrations (7 image + 1 video)
The provider layer (src/providers/) is the key extension. Each provider implements a VisionProvider interface that takes normalized ChatMessage[] (the OpenAI Chat Completions content-part format as internal lingua franca) and translates it to the provider's wire format. chat-service.ts simply delegates to the resolved provider, so none of the tool code needed to change.
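To illustrate what "translating to the provider's wire format" can look like, the sketch below converts Chat Completions style content parts into Anthropic Messages content blocks. The type and function names (`ContentPart`, `ChatMessage`, `toAnthropicBlock`, `toAnthropicMessages`) are assumptions for this example and may not match the definitions in `src/providers/types.ts`; the sketch also handles only base64 data URLs, not remote image URLs.

```typescript
// Hypothetical shapes; the server's real ChatMessage type may differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: ContentPart[];
}

// Anthropic Messages content blocks (text and base64 image).
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: { type: "base64"; media_type: string; data: string } };

// Convert one normalized part; image parts must be base64 data URLs in this sketch.
function toAnthropicBlock(part: ContentPart): AnthropicBlock {
  if (part.type === "text") return { type: "text", text: part.text };
  const match = /^data:([^;,]+);base64,(.*)$/.exec(part.image_url.url);
  if (!match) throw new Error("only base64 data URLs are handled in this sketch");
  const [, mediaType = "", data = ""] = match;
  return { type: "image", source: { type: "base64", media_type: mediaType, data } };
}

// Anthropic takes the system prompt as a top-level field, so system messages are split out.
function toAnthropicMessages(messages: ChatMessage[]) {
  const systemTexts: string[] = [];
  const turns: { role: "user" | "assistant"; content: AnthropicBlock[] }[] = [];
  for (const message of messages) {
    if (message.role === "system") {
      for (const part of message.content) {
        if (part.type === "text") systemTexts.push(part.text);
      }
    } else {
      turns.push({ role: message.role, content: message.content.map(toAnthropicBlock) });
    }
  }
  return { system: systemTexts.join("\n"), messages: turns };
}
```

Because the tools only ever build the normalized form, a translation function like this is the only piece that needs to know about a given provider's request shape.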
License
Apache-2.0