Skip to main content
Glama
ZRZRING

low-hallucination-vision

by ZRZRING

Low-Hallucination Vision Toolkit

A drop-in replacement for high-hallucination vision MCPs (like the default analyze_image), built on top of your own OpenAI-compatible multimodal model (mimo v2.5). Two pieces that work together:

vision-mcp/      ← MCP server (Python). The engine. Plug into any agent.
vision-skill/    ← Skill (SKILL.md). The cross-verification workflow. Plug into ZCode.

Why this is lower-hallucination than a generic VLM call

It's not magic — it's five boring disciplines, all in the MCP layer:

  1. Mode-routed prompts — UI / general / OCR / detect each get a tightly scoped system prompt instead of one "describe everything" prompt.

  2. Forced structured JSON — every claim is an object with confidence.

  3. Low temperature (0.2 default) — less creative completion.

  4. "Allowed to be ignorant" — prompts explicitly forbid common-sense completion of details not actually visible.

  5. Confidence gating — the MCP reflags any claim below threshold as "_flag": "存疑", so the agent can't accidentally report it as fact.

The Skill adds a sixth layer on top: cross-verification (run two independent modes and only trust claims both agree on).


Related MCP server: image-tiler-mcp-server

Setup (uv-managed environment)

This project uses uv for environment management. uv creates an isolated .venv per project and pins the Python version, so nothing pollutes your global Python. The .venv is what VSCode auto-detects.

1. Install dependencies & create the venv

cd C:\Users\zrzring\ZCodeProject\vision-mcp
uv sync

That single command:

  • reads .python-version (3.12) and auto-downloads that Python if missing,

  • creates vision-mcp\.venv,

  • installs everything in pyproject.toml (currently mcp[cli]).

To add a package later: uv add <pkg>. To rebuild after pulling the repo: just uv sync again. Never use raw pip here — it would install into the wrong place.

Verify it works:

uv run python -c "import main; print('OK', main.mcp.name)"
# → OK low-hallucination-vision

2. Configure API credentials

copy .env.example .env          # then edit .env
VISION_API_BASE=https://api.mimo.example.com/v1   # your OpenAI-compatible endpoint
VISION_API_KEY=sk-...
VISION_MODEL=mimo-vl-2.5
VISION_TEMPERATURE=0.2

3. Make VSCode detect the venv

VSCode's Python extension auto-detects .venv in the workspace. To be safe:

  1. Open the folder C:\Users\zrzring\ZCodeProject (not the single file) in VSCode.

  2. Install the Python extension (ms-python.python) if not already.

  3. Ctrl+Shift+PPython: Select Interpreter → pick the one shown as Python 3.12.13 ('.venv') under vision-mcp\.venv\Scripts\python.exe.

If it doesn't show up, force it with a workspace setting — create .vscode/settings.json in the project root:

{
  "python.defaultInterpreterForWorkspace": "vision-mcp\\.venv\\Scripts\\python.exe",
  "python.terminal.activateEnvironment": true
}

Now any terminal you open in VSCode auto-activates .venv, and you get autocomplete / type-checking for mcp and your code.

4. Register the MCP server with your agents

The server speaks stdio MCP. Use uv run to launch it — this guarantees the project's .venv is used regardless of the agent's working directory:

Claude Code~/.claude.json (or project .mcp.json):

{
  "mcpServers": {
    "low-hallucination-vision": {
      "command": "uv",
      "args": ["run", "--directory",
                "C:\\Users\\zrzring\\ZCodeProject\\vision-mcp",
                "python", "main.py"],
      "env": {
        "VISION_API_BASE": "https://api.mimo.example.com/v1",
        "VISION_API_KEY": "sk-...",
        "VISION_MODEL": "mimo-vl-2.5"
      }
    }
  }
}

OpenCodeopencode.json:

{
  "mcp": {
    "low-hallucination-vision": {
      "type": "local",
      "command": ["uv", "run", "--directory",
                   "C:\\Users\\zrzring\\ZCodeProject\\vision-mcp",
                   "python", "main.py"],
      "environment": {
        "VISION_API_BASE": "https://api.mimo.example.com/v1",
        "VISION_API_KEY": "sk-...",
        "VISION_MODEL": "mimo-vl-2.5"
      }
    }
  }
}

ZCode — same mcpServers shape as Claude Code.

Why uv run --directory instead of a bare python? Because the agent may launch the server from any working directory; uv run --directory always activates the right .venv. Environment variables can live in the config (as above) OR in vision-mcp/.env — either works.


Alternative: build a standalone vision-mcp.exe

If you'd rather not depend on uv/Python at runtime, package the server into a single executable with PyInstaller. The exe is self-contained (~24 MB), needs no Python installed, and works on any machine when shipped with its .env. It runs in two modes: a stdio MCP server (default) and a command-line image tool.

Build it

pyinstaller is already in pyproject.toml, so after uv sync:

cd C:\Users\zrzring\ZCodeProject\vision-mcp
uv run pyinstaller --onefile --name vision-mcp --collect-all mcp --clean --noconfirm main.py

Output lands in dist\vision-mcp.exe. The vision-mcp.spec file is auto-generated; you can re-run pyinstaller vision-mcp.spec --noconfirm after that for identical builds.

Put it on PATH and configure

  1. Copy the exe and your .env to a directory already on PATH (e.g. C:\Users\<you>\.local\bin):

    copy dist\vision-mcp.exe  C:\Users\<you>\.local\bin\
    copy .env                 C:\Users\<you>\.local\bin\
  2. The exe reads .env from its own directory first, then the source dir, then the working dir. So keep .env next to the exe — change key/endpoint there, no rebuild needed.

  3. Verify from anywhere:

    vision-mcp --help
    vision-mcp analyze C:\path\to\pic.png --mode general --prompt "describe it"

Register the exe with agents

Because the exe defaults to MCP-server mode, agent config is minimal — no uv run, no args, no env block (creds come from the exe's .env):

Claude Code~/.claude.json (or project .mcp.json):

{
  "mcpServers": {
    "low-hallucination-vision": {
      "command": "vision-mcp"
    }
  }
}

OpenCodeopencode.json:

{
  "mcp": {
    "low-hallucination-vision": {
      "type": "local",
      "command": ["vision-mcp"]
    }
  }
}

If vision-mcp isn't on PATH for the agent, use the full path instead: "command": "C:\\Users\\<you>\\.local\\bin\\vision-mcp.exe".

CLI mode (use it directly, no agent)

The same exe doubles as a terminal image tool:

vision-mcp                                    # = MCP server (default)
vision-mcp mcp                                #   "    (explicit)
vision-mcp analyze <image> [--mode general|ui_screenshot|ocr|detect] [--prompt "..."]
vision-mcp ocr      <image> [--prompt "..."]
vision-mcp detect   <image> [--prompt "..."]

<image> is a local path or an http(s) URL. Output is the same JSON the MCP tools return (with bbox normalization + confidence flagging applied).

Source vs exe — which to use? Source (uv run) is best while developing (edit main.py, reload instantly). The exe is best for daily use and sharing to other machines — no Python toolchain needed.

3. (Optional) Register the Skill with ZCode

Copy or symlink vision-skill/ into your skills directory so the cross-verification workflow is auto-loaded:

<skills-dir>/low-hallucination-vision/SKILL.md

The Skill is agent-agnostic in content but only ZCode auto-discovers Skills. For Claude Code / OpenCode, the MCP tools alone still work — just keep the Skill's workflow in mind (or paste the relevant section into your own prompt).


Tools provided

Tool

What it does

When to use

analyze_image(image_source, mode, prompt, temperature)

Structured analysis; mode = general / ui_screenshot / ocr / detect

Default entry point

ocr_extract(image_source, prompt, temperature)

Text-only extraction

When you only need words

detect_elements(image_source, prompt, temperature)

Object detection with mandatory bbox

When you need locations

All three return JSON. Claims below VISION_CONFIDENCE_THRESHOLD (default 0.6) are tagged "_flag": "存疑".

image_source accepts either a local file path or an http(s) URL.


File map

ZCodeProject/
├── vision-mcp/                ← uv project (this README lives here)
│   ├── main.py                ← the MCP server + CLI (engine + anti-hallucination)
│   ├── pyproject.toml         ← deps: mcp[cli], pyinstaller
│   ├── uv.lock                ← pinned versions (auto-generated)
│   ├── .python-version        ← 3.12 (uv auto-downloads it)
│   ├── .env.example           ← copy to .env and fill in
│   ├── vision-mcp.spec        ← auto-generated by PyInstaller (for rebuilds)
│   ├── .venv/                 ← created by `uv sync` (gitignored)
│   ├── build/                 ← PyInstaller intermediates (gitignored)
│   └── dist/
│       └── vision-mcp.exe     ← the standalone exe (built, gitignored)
└── .agents/                   ← skill(s) discovered by ZCode
    └── skills/vision-skill/
        └── SKILL.md           ← cross-verification workflow for the agent

Tuning

  • Still too much hallucination? Lower VISION_TEMPERATURE to 0.1 and raise VISION_CONFIDENCE_THRESHOLD to 0.7.

  • Missing real things (over-conservative)? Lower the threshold to 0.5 and raise temperature slightly to 0.3.

  • Model keeps breaking JSON? Some VLMs ignore schema instructions; in that case the tool returns "_parse_error": true with the raw text so you can post-process. Consider switching to a model with stronger JSON support.

F
license - not found
-
quality - not tested
C
maintenance

Maintenance

Maintainers
Response time
Release cycle
Releases (12mo)
Commit activity

Resources

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ZRZRING/vision-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server