cv-mcp

Minimal MCP server focused on computer vision: image recognition and metadata generation via OpenRouter (Gemini 2.5 family).

Goals

  • Keep it tiny and composable

  • Single tool: caption an image via URL or local file

  • No DB or app logic

Structure

  • src/cv_mcp/captioning/openrouter_client.py – image analysis client

  • src/cv_mcp/metadata/ – prompts, JSON schema, and pipeline runner

  • src/cv_mcp/mcp_server.py – MCP server exposing tools

  • cli/caption_image.py – optional CLI to test captioning locally

Env vars

  • OPENROUTER_API_KEY

Dotenv

  • Put OPENROUTER_API_KEY in a local .env file (see .env.example).

  • CLI scripts and the MCP server auto-load .env if present.
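
  • A minimal .env looks like this (the key value is a placeholder):

    OPENROUTER_API_KEY=sk-or-...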

Install

  • pip install -e . (or pip install .)

⚠️ Development Note: If you have the package installed via pip install, uninstall it before working with the local development version to avoid import conflicts. Use pip uninstall cv-mcp first, then run commands directly from the repo directory.

Run MCP server (stdio)

  • Console script: cv-mcp-server (provides an MCP stdio server)

  • Configure your MCP client to launch cv-mcp-server.

MCP integration (Claude Desktop)

  • Add to Claude Desktop config (see their docs for the config location):

    {
      "mcpServers": {
        "cv-mcp": {
          "command": "cv-mcp-server",
          "env": { "OPENROUTER_API_KEY": "sk-or-..." }
        }
      }
    }

  • After saving, restart Claude Desktop and enable the tool.

Tools

  • caption_image: one-off caption (kept for compatibility)

  • alt_text: short alt text (<= 20 words)

  • dense_caption: detailed 2–6 sentence caption

  • image_metadata: structured JSON metadata with alt + caption. Params:

    • mode: double (default) makes two calls: a vision call for alt+caption, then a text-only call for metadata. triple uses a vision model for both steps.

    • caption_override: supply your own dense caption; skips the vision caption step.
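
    • For example, params for a call that supplies its own caption and skips the vision step (values illustrative):

      {
        "image_url": "https://example.com/image.jpg",
        "mode": "double",
        "caption_override": "A red road bicycle leans against a brick wall in afternoon light."
      }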

MCP tool reference

  • Server: cv-mcp (stdio)

  • caption_image(image_url|file_path, prompt?, backend?, local_model_id?) -> string

  • alt_text(image_url|file_path, max_words?) -> string

  • dense_caption(image_url|file_path) -> string

  • image_metadata(image_url|file_path, caption_override?, config_path?) -> { alt_text, caption, metadata }
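
  • A sketch of an image_metadata result (values illustrative; the metadata object's fields follow src/cv_mcp/metadata/schema.json):

    {
      "alt_text": "Red road bicycle leaning against a brick wall",
      "caption": "A red road bicycle leans against a weathered brick wall in afternoon light.",
      "metadata": {}
    }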

Examples

  • MCP call (OpenRouter): {"image_url": "https://example.com/image.jpg"}

  • MCP call (local): {"backend": "local", "file_path": "./image.jpg"}
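
  • A fuller caption_image call using the optional params from the tool reference above (values illustrative): {"file_path": "./image.jpg", "prompt": "Describe the scene in one sentence.", "backend": "local", "local_model_id": "Qwen/Qwen2-VL-2B-Instruct"}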

Quick test (CLI)

  • URL: python cli/caption_image.py --image-url https://example.com/img.jpg

  • File: python cli/caption_image.py --file-path ./image.png

Metadata pipeline (CLI)

  • Double (default):

    • python cli/image_metadata.py --image-url https://example.com/img.jpg --mode double

    • Local alt+caption (still requires OpenRouter for metadata):

      • python cli/image_metadata.py --image-url https://example.com/img.jpg --mode double --ac-backend local

  • Triple (vision metadata):

    • python cli/image_metadata.py --image-url https://example.com/img.jpg --mode triple

    • Fully local (no OpenRouter required):

      • python cli/image_metadata.py --image-url https://example.com/img.jpg --mode triple --ac-backend local --meta-vision-backend local

  • With existing caption (skips the caption step):

    • python cli/image_metadata.py --image-url https://example.com/img.jpg --caption-override "<dense caption>" --mode double

  • Custom model config (JSON with caption_model, metadata_text_model, metadata_vision_model):

    • python cli/image_metadata.py --image-url https://example.com/img.jpg --config-path ./my_models.json --mode double
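
    • An example my_models.json (model IDs are illustrative OpenRouter slugs):

      {
        "caption_model": "google/gemini-2.5-flash",
        "metadata_text_model": "google/gemini-2.5-pro",
        "metadata_vision_model": "google/gemini-2.5-pro"
      }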

Schema & vocab

  • JSON schema (lean): src/cv_mcp/metadata/schema.json

  • Controlled vocab (non-binding reference): src/cv_mcp/metadata/vocab.json

Global config

  • Root file: cv_mcp.config.json (auto-detected from project root / CWD)

  • Env override: set CV_MCP_CONFIG=/path/to/config.json

  • Keys (renamed for clarity):

    • caption_model: vision model for alt+caption (OpenRouter)

    • metadata_text_model: text model for metadata (double mode)

    • metadata_vision_model: vision model for metadata (triple mode)

    • caption_backend: openrouter (default) or local for alt/dense/AC steps

    • metadata_vision_backend: openrouter (default) or local for triple mode

    • local_vlm_id: default local VLM (e.g. Qwen/Qwen2.5-VL-7B-Instruct)

    • Backwards-compat: legacy keys (ac_model, meta_text_model, meta_vision_model, ac_backend, meta_vision_backend, local_model_id) are still accepted.

  • Packaged defaults still live at src/cv_mcp/metadata/config.json and are used if no root config is found.

  • You can still provide a custom config file per-call via --config-path or the config_path tool param.
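
  • An example cv_mcp.config.json mixing a local caption backend with OpenRouter text metadata (values illustrative; keys as listed above):

    {
      "caption_backend": "local",
      "local_vlm_id": "Qwen/Qwen2.5-VL-7B-Instruct",
      "metadata_text_model": "google/gemini-2.5-pro"
    }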

Local backends (optional)

  • Install optional deps: pip install .[local]

  • Global default: set "caption_backend": "local" (and optionally "metadata_vision_backend": "local") in cv_mcp.config.json

  • Use with MCP: pass backend: "local" in the tool params (overrides global)

  • Use with CLI: add --backend local and optionally --local-model-id Qwen/Qwen2-VL-2B-Instruct (overrides global)

  • Requires a locally available model (default: Qwen/Qwen2-VL-2B-Instruct via HF cache)

  • Or run without transformers using Ollama (no Python ML deps):

    • Install and run Ollama; pull a vision model (e.g., ollama pull qwen2.5-vl)

    • Use backend ollama and set models in the config (e.g., caption_model: "qwen2.5-vl")

    • CLI example (triple, fully local):

      • python cli/image_metadata.py --image-url https://... --mode triple --caption-backend ollama --metadata-vision-backend ollama --config-path ./configs/triple_ollama_qwen.json

    • Configure host with --ollama-host http://localhost:11434 if not default
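
    • A config like triple_ollama_qwen.json might look like this sketch (model tag illustrative):

      {
        "caption_backend": "ollama",
        "metadata_vision_backend": "ollama",
        "caption_model": "qwen2.5-vl",
        "metadata_vision_model": "qwen2.5-vl"
      }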

Per-call overrides (CLI)

  • The metadata CLI supports per-call backend overrides without editing the global config:

    • --caption-backend local|openrouter|ollama (legacy: --ac-backend)

    • --metadata-vision-backend local|openrouter|ollama (legacy: --meta-vision-backend)

    • --local-vlm-id Qwen/Qwen2.5-VL-7B-Instruct (legacy: --local-model-id)

    • --ollama-host http://localhost:11434
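
  • Combined example (fully local triple run, overriding the global config for one call):

    python cli/image_metadata.py --image-url https://example.com/img.jpg --mode triple --caption-backend local --metadata-vision-backend local --local-vlm-id Qwen/Qwen2.5-VL-7B-Instruct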

Justfile tasks

  • A Justfile provides quick test scenarios. Use URL-only inputs, e.g. just double_flash https://example.com/img.jpg.

  • Scenarios included:

    • double_flash: Gemini 2.5 Flash for both steps

    • double_pro: Gemini 2.5 Pro for both steps

    • double_mixed_pro_text: Flash for vision alt+caption, Pro for text metadata (recommended mix for JSON reliability)

    • triple_flash / triple_pro: Flash/Pro for both vision steps

    • double_qwen_local <url> <qwen_id>: Local Qwen 2.5 VL for vision step, Pro for text metadata

    • triple_qwen_local <url> <qwen_id>: Fully local Qwen 2.5 VL for both vision steps

    • Convenience (no extra args):

      • double_qwen2b_local <url> / triple_qwen2b_local <url>

      • double_qwen7b_local <url> / triple_qwen7b_local <url>
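
  • A recipe in this style might look like the following sketch (the config path is hypothetical; check the Justfile for the actual definitions):

    double_mixed_pro_text url:
        python cli/image_metadata.py --image-url {{url}} --mode double --config-path ./configs/double_mixed_pro_text.json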

Recommendation for mixed double

  • Put Gemini 2.5 Pro on the text metadata step and Flash on the vision alt+caption step. The metadata step benefits from better structured-JSON compliance and reasoning, while Flash keeps latency/cost down for the vision caption.

  • OpenRouter key requirements:

    • Double mode always requires OPENROUTER_API_KEY (text LLM for metadata).

    • Triple mode requires OPENROUTER_API_KEY unless both --caption-backend local and --metadata-vision-backend local (legacy: --ac-backend / --meta-vision-backend) are set.

Examples (local backend)

  • MCP tool (local): {"backend": "local", "file_path": "./image.jpg"}

  • CLI (local): python cli/caption_image.py --file-path ./image.jpg --backend local

Troubleshooting

  • 401/403 from OpenRouter: ensure OPENROUTER_API_KEY is set and valid.

  • Model selection: prefer cv_mcp.config.json at project root; or pass --config-path.

  • Large images: remote images are downloaded and sent as base64; ensure the URL is accessible.

  • Local backend: install the optional deps (pip install .[local]) and ensure the model is present/cached.

Changelog

  • See docs/CHANGELOG.md for notable changes and release notes.
