Which integrations are available for this server?

Enables use of local LLMs via Ollama as a vision backend for image analysis, description, and comparison. Sends images to OpenAI-compatible chat completion endpoints for analysis, description, and comparison tasks.

How do I use image_mcp?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@image_mcp What's in this image? https://example.com/photo.jpg" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

image_mcp

by karlcc

Overview Schema Related Servers Score Discussions

TypeScript

Remote

Image Summarization MCP Server

A Model Context Protocol (MCP) server that accepts image files and sends them to an OpenAI-compatible chat completion endpoint for analysis, description, and comparison tasks.

Use Case

Many LLMs used for agentic coding are text-only and lack support for image inputs. This tool allows you to use a secondary model dedicated to describing and analyzing images, without having to use a multi-modal LLM for your primary model. It supports both cloud and local LLMs via any server that supports the OpenAI chat completion endpoint (including llama.cpp / llama-swap, Ollama, open-webui, OpenRouter, etc).

For local models, gemma3:4b-it-qat works quite well with a relatively small footprint and fast performance (even on CPU-only).

Related MCP server: Vision MCP Server

Features

Accepts images via unified image_path parameter — local paths, URLs, and data URLs
Supports task parameter to perform specific analysis beyond general description
Sends images to OpenAI-compatible chat completion endpoints
Returns detailed image descriptions
Configurable endpoint URL, API key, and model
Optional persistent config file at ~/.config/image_mcp/config.json
Command-line interface for configuration
Comprehensive error handling

Quick install from NPM

Add this to your global mcp_settings.json or project mcp.json:

{
  "mcpServers": {
    "image_mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@karlcc/image_mcp"
      ],
      "env": {
        "OPENAI_API_KEY": "YOUR_API_KEY",
        "OPENAI_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_MODEL": "gemini-3.1-flash-lite-preview"
      }
    }
  }
}

If you prefer claude mcp add-json, use:

claude mcp add-json image_mcp --scope user '{
  "type": "stdio",
  "command": "npx",
  "args": ["-y", "@karlcc/image_mcp"],
  "env": {
    "OPENAI_API_KEY": "YOUR_API_KEY",
    "OPENAI_BASE_URL": "https://api.openai.com/v1",
    "OPENAI_MODEL": "gemini-3.1-flash-lite-preview"
  }
}'

At a minimum, configure base URL, API key, and model for your chosen backend.

For use with slow local models, you may need to also increase the timeout and max retries settings.

Configuration

The MCP server can be configured using a config file, environment variables, or command-line arguments.

Environment Variables

OPENAI_API_KEY: Your API key for the OpenAI-compatible service
OPENAI_BASE_URL: The base URL of the OpenAI-compatible service (default: http://localhost:9292/v1)
OPENAI_MODEL: The model to use for image analysis
OPENAI_TIMEOUT: Request timeout in milliseconds (default: 60000). When running local models you may need to increase this.
OPENAI_MAX_RETRIES: Maximum number of retry attempts (default: 3)
OPENAI_STREAMING: Enable/disable streaming (true/false)
MCP_USE_HTTP: Enable HTTP/SSE transport (true/false)
MCP_PORT: HTTP port for MCP server (default: 8080)
IMAGE_MCP_CONFIG_PATH: Override config file path (default: ~/.config/image_mcp/config.json)

Command Line Arguments

npx -y @karlcc/image_mcp \
  --api-key your-api-key \
  --base-url https://api.openai.com/v1 \
  --model gpt-4-vision-preview \
  --http \
  --mcp-port 8080 \
  --timeout 60000 \
  --max-retries 5

Configuration Priority

Command-line arguments
Environment variables
Config file (~/.config/image_mcp/config.json)
Default values

Persistent Config

Save your resolved configuration once and reuse it across sessions:

node build/index.js \
  --api-key your-api-key \
  --base-url https://api.openai.com/v1 \
  --model gpt-4.1-mini \
  --http \
  --mcp-port 8080 \
  --save-config

This writes ~/.config/image_mcp/config.json (or a custom file via --config /path/to/config.json).

Verifying your model has vision

Before committing to a model, verify it can actually see images:

# Automatic: --save-config verifies vision by default before writing
node build/index.js --model your-model --save-config

# Quick one-shot check:
IMAGE_MCP_SMOKE=1 npm run test:smoke

# Opt-in startup probe (warns in stderr if model can't see):
IMAGE_MCP_PROBE_ON_START=true node build/index.js

If verification fails, the config file is not written and the exit code is non-zero. Use --no-verify to skip the check.

Usage

Host model vs vision backend

When the host LLM (e.g. GLM-5.1, Claude Haiku) is text-only, it cannot inspect pixels. Wire image_mcp to a vision-capable backend and the host will route image tasks there automatically.

Z.AI / GLM example

npx -y @karlcc/image_mcp \
  --base-url https://open.bigmodel.cn/api/paas/v4 \
  --api-key $ZAI_API_KEY \
  --model glm-4.6v-flash

The app stays backend-agnostic — any OpenAI-compatible endpoint works. glm-4.6v-flash is shown because it is a capable, low-latency vision model available on Z.AI.

Client routing snippet

Add to your MCP client config (e.g. Claude Desktop, Cursor, or .claude/settings.json):

{
  "mcpServers": {
    "image_mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@karlcc/image_mcp@latest"
      ],
      "env": {
        "OPENAI_API_KEY": "YOUR_ZAI_KEY",
        "OPENAI_BASE_URL": "https://open.bigmodel.cn/api/paas/v4",
        "OPENAI_MODEL": "glm-4.6v-flash"
      }
    }
  }
}

MCP Tools

`read_image_via_vision_backend`

Reads and analyzes one image via the vision backend. Accepts local absolute paths, http(s) URLs, and data URLs.

Parameters

image_path (string): Image to analyze. Supports:
- Absolute local paths (e.g. /Users/me/screenshot.png)
- HTTP/HTTPS URLs (e.g. https://example.com/image.jpg)
- Data URLs with base64 encoded images (e.g. data:image/png;base64,...)
task (string, optional): What to do with the image (e.g. "Read all text", "Describe the UI layout", "Extract data from chart"). Defaults to a general description.

Example Usage

Using file path:

{
  "name": "read_image_via_vision_backend",
  "arguments": {
    "image_path": "/Users/me/screenshot.png",
    "task": "Read all text in this screenshot"
  }
}

Using HTTP URL:

{
  "name": "read_image_via_vision_backend",
  "arguments": {
    "image_path": "https://example.com/image.jpg"
  }
}

`compare_images_via_vision_backend`

Compares 2 or more images via the vision backend. Accepts local absolute paths, http(s) URLs, and data URLs.

Parameters

image_paths (array of strings, min 2): Images to compare. Each entry supports the same formats as image_path above.
task (string, optional): What to compare (e.g. "Describe UI differences", "Which chart shows higher values?"). Defaults to a general comparison.

Example Usage

{
  "name": "compare_images_via_vision_backend",
  "arguments": {
    "image_paths": [
      "/Users/me/before.png",
      "/Users/me/after.png"
    ],
    "task": "Describe the UI differences between these screenshots"
  }
}

`get_config_info`

Returns the active server configuration for diagnostics with the API key redacted.

Dev Setup

Clone the repository:

git clone https://github.com/karlcc/image_mcp.git
cd image_mcp

Install dependencies:

npm install

Build the project:

npm run build

Starting the Server

node build/index.js

The server will start and listen on stdio for MCP protocol communications.

To run with HTTP/SSE transport:

node build/index.js --http --mcp-port 8080

MCP Tool Installation (local dev build)

Add this to your global mcp_settings.json or project mcp.json:

{
  "mcpServers": {
    "image_mcp": {
      "command": "node",
      "args": [
        "/path/to/image_mcp/build/index.js"
      ],
      "env": {
        "OPENAI_API_KEY": "YOUR_API_KEY",
        "OPENAI_BASE_URL": "http://localhost:9292/v1",
        "OPENAI_MODEL": "gemma3:4b-it-qat"
      }
    }
  }
}

Testing

Running Tests

Run the test suite:

npm test

The test suite includes:

Unit tests for image processing functionality
Integration tests that require a mock server
Tests for both read_image_via_vision_backend and compare_images_via_vision_backend tools

Model Benchmark (Accuracy + Latency)

Run the built-in benchmark to compare candidate models with weighted accuracy and response latency:

npm run benchmark:models

By default this uses:

Task file: bench/tasks.default.json
Models: ~/.config/image_mcp/model_candidates.json (candidates array)
Ranking: weighted accuracy (desc), success rate (desc), median latency (asc)

Useful overrides:

node scripts/benchmark-models.mjs \
  --models gemma-4-31b,kimi-k2.5-fw,qwen3.5-397b-fw \
  --repeats 2 \
  --tasks bench/tasks.default.json

Outputs:

Raw call-level results at /tmp/image_mcp_accuracy_benchmark_*.jsonl
Summary at /tmp/image_mcp_accuracy_summary_*.json
Auto-updates active model in ~/.config/image_mcp/config.json (disable with --no-update-config)

Mock Server Testing

The project includes a mock OpenAI-compatible server for testing purposes.

Start the mock server in a separate terminal:

node tests/mock-server.js

The mock server will start on http://localhost:9293 and provides endpoints for:

GET /v1/models - Lists available models
POST /v1/chat/completions - Mock chat completions with image support
POST /v1/test/image-process - Test endpoint for image processing validation

Set environment variables for the mock server:

export OPENAI_BASE_URL=http://localhost:9293/v1
export OPENAI_API_KEY=test-key
export OPENAI_MODEL=test-model-vision

Run the integration tests:

npm test tests/integration.test.ts

Real OpenAI-Compatible Server Testing

To test with a real OpenAI-compatible endpoint:

Set up your environment variables:

export OPENAI_API_KEY=your-actual-api-key
export OPENAI_BASE_URL=https://api.openai.com/v1
export OPENAI_MODEL=gpt-4-vision-preview

Or for other OpenAI-compatible services:

export OPENAI_API_KEY=your-service-api-key
export OPENAI_BASE_URL=https://your-service-endpoint/v1
export OPENAI_MODEL=your-vision-model

Start the MCP server:

node build/index.js --http --mcp-port 8080

Send test requests using an MCP client or test the tools directly.

Manual Testing

You can manually test the MCP server using tools like curl or MCP clients:

# Test with a local image file
curl -X POST http://localhost:8080/sse \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "read_image_via_vision_backend",
      "arguments": {
        "image_path": "/path/to/your/test/image.jpg"
      }
    }
  }'

API Reference

OpenAI-Compatible API Integration

The server sends requests to the OpenAI-compatible chat completion endpoint with the following structure:

{
  "model": "your-model",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Describe this image in detail, including all text."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/png;base64,..."
          }
        }
      ]
    }
  ],
  "stream": false
}

Supported Image Formats

JPEG (.jpg, .jpeg)
PNG (.png)
GIF (.gif)
WebP (.webp)
SVG (.svg)
BMP (.bmp)
TIFF (.tiff)

Error Handling

The server includes comprehensive error handling for:

Invalid image files
Unsupported image formats
Missing API keys
Network connectivity issues
API response errors

Development

Project Structure

src/
├── config.ts          # Configuration management
├── image-processor.ts # Image processing utilities
├── index.ts          # Main MCP server
└── openai-client.ts  # OpenAI-compatible API client

Building

npm run build

Testing

npm test

Vision smoke test (requires API credentials):

IMAGE_MCP_SMOKE=1 npm run test:smoke

Full preflight before release:

npm run preflight

Release: tag and publish to npm

Recommended flow: GitHub Actions trusted publishing (OIDC).

One-time setup (npm package owner):

# Requires npm v11.10+ and package 2FA enabled on npm.
# If local npm is older, run via npx as shown here.
npx -y npm@latest trust github @karlcc/image_mcp \
  --repo karlcc/image_mcp \
  --file publish.yml \
  --yes

Then ship each release with:

# 1) Verify quality gates
npm run build
npm test

# 2) Commit pending changes
git add -A
git commit -m "chore(release): prepare next version"

# 3) Bump version + create git tag (patch/minor/major)
npm version patch

# 4) Push commit + tag (GitHub Actions publishes to npm)
git push origin main --follow-tags

Fallback manual publish (if trusted publishing is not configured):

npm publish --access public --otp <6-digit-otp>

Dev cycle: four layers of vision detection

The repo is designed so a non-vision model can't slip through silently:

Layer	When	How
Config save	`--save-config`	Probes model with a tiny fixture before writing config
Smoke test	`npm run test:smoke`	Jest test against the configured model
Startup probe	`IMAGE_MCP_PROBE_ON_START=true`	Warns on stderr if model fails
Benchmark	`npm run benchmark:models`	`--fail-if-any-nonvision` exits non-zero for 0% scorers

License

This project is licensed under the MIT License.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

Support

For issues and questions, please open an issue on the GitHub repository.

Tips

Tips / donations always appreciated to help fund future development.

This server cannot be installed

license - permissive license

quality - not tested

maintenance

How are these scores calculated?

Maintenance

–Maintainers

–Response time

–Release cycle

–Releases (12mo)

Commit activity

Resources

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Related MCP Servers

Vision MCP
Image & Video Processing Multimedia Processing
i-richardwang
A
license
A
quality
D
maintenance
Enables image analysis and understanding using Vision Language Models through OpenAI-compatible APIs. Supports analyzing images from URLs or local files with custom prompts.
Last updated 2025-12-26
1
2
MIT
Vision MCP Server
Image & Video Processing Autonomous Agents
TheNomadInOrbit
A
license
B
quality
D
maintenance
Enables vision capabilities for any AI model by routing image analysis requests through OpenRouter's vision models. It provides tools to analyze images from URLs, local file paths, or base64 data.
Last updated 2025-10-09
2
91
17
MIT
Image Parse MCP
AI & Machine Learning Image & Video Processing
1617110693
A
license
A
quality
C
maintenance
Enables image analysis using any OpenAI-compatible vision API, supporting URLs, local files, or base64 input with custom prompts.
Last updated 2026-06-18
1
MIT
Vision MCP Server
Image & Video Processing AI & Machine Learning
Loveacup
A
license
A
quality
D
maintenance
Enables AI agents to analyze images, extract text, compare images, and analyze video through any OpenAI-compatible vision model.
Last updated 2026-02-09
4
66
13
MIT

View all related MCP servers

Related MCP Connectors

Frenchie
OCR, transcription, file extraction, and image generation for AI agents via MCP.
mcp
Generate images, GIFs, and PDFs from HTML, URLs, or templates — from your AI agent.
ScrapeGraphAI-scrapegraph-mcp
Enable language models to perform advanced AI-powered web scraping with enterprise-grade reliabili…

View all MCP Connectors

Latest Blog Posts

Who's Calling? MCP Hosts Are an Identity Blind Spot (And the Spec Knows It)
By Om-Shree-0709 on July 25, 2026.
mcp
Agent Identity
OAuth 2.1
Your AI Chatbot Just Exposed Your CEO's Salary to an Intern
By Om-Shree-0709 on July 2, 2026.
Agent Identity
MCP Security
OAuth Delegation
Why MCP Servers Need Execution Sandboxing (And Why Your Current Stack Isn't Enough)
By Om-Shree-0709 on June 30, 2026.
Agentic Ai
Prompt Injection
WebAssembly

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/karlcc/image_mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

Image Summarization MCP Server

Use Case

Features

Quick install from NPM

Configuration

Environment Variables

Command Line Arguments

Configuration Priority

Persistent Config

Verifying your model has vision

Usage

Host model vs vision backend

Z.AI / GLM example

Client routing snippet

MCP Tools

read_image_via_vision_backend

Parameters

Example Usage

compare_images_via_vision_backend

Parameters

Example Usage

get_config_info

Dev Setup

MCP Tool Installation (local dev build)

Testing

Running Tests

Model Benchmark (Accuracy + Latency)

Mock Server Testing

Real OpenAI-Compatible Server Testing

Manual Testing

API Reference

OpenAI-Compatible API Integration

Supported Image Formats

Error Handling

Development

Project Structure

Building

Testing

Release: tag and publish to npm

Dev cycle: four layers of vision detection

License

Contributing

Support

Tips

Maintenance

Resources

Looking for Admin?

Related MCP Servers

Vision MCP

Vision MCP Server

Image Parse MCP

Vision MCP Server

Related MCP Connectors

Latest Blog Posts

MCP directory API

`read_image_via_vision_backend`

`compare_images_via_vision_backend`

`get_config_info`