
🚀 ViperMCP: A Model Context Protocol for Viper Server

Mixture-of-Experts VQA, streaming-ready, and MCP-native.

Made with FastMCP · ViperGPT Inspired · GPU Ready · License

ViperMCP is a mixture-of-experts (MoE) visual question‑answering (VQA) server that exposes streamable MCP tools for:

  • 🔎 Visual grounding

  • 🧩 Compositional image QA

  • 🌐 External knowledge‑dependent image QA

It’s built on the shoulders of 🐍 ViperGPT and delivered as a FastMCP HTTP server, so it works with all FastMCP client tooling.


✨ Highlights

  • MCP-native JSON‑RPC 2.0 endpoint (/mcp/) with streaming

  • 🧠 MoE routing across classic and modern VLMs/LLMs

  • 🧰 Two tools out of the box: viper_query (text) & viper_task (crops/masks)

  • 🐳 One‑command Docker or pure‑Python install

  • 🔐 Secure key handling via env var or secret mount


⚙️ Setup

🔑 OpenAI API Key

An OpenAI API key is required. Provide it via one of the following (a small resolution sketch follows the list):

  • OPENAI_API_KEY (environment variable)

  • OPENAI_API_KEY_PATH (path to a file containing the key)

  • ?apiKey=... HTTP query parameter (for quick local testing)
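For illustration only, here is a minimal sketch of how a wrapper script might resolve the key from these three sources; the function name and the precedence shown are assumptions, not ViperMCP's actual lookup order:

import os

def resolve_openai_key(query_api_key=None):
    # Illustrative only: check the env var, then the key file, then a query parameter.
    # ViperMCP's real precedence may differ.
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key.strip()
    key_path = os.environ.get("OPENAI_API_KEY_PATH")
    if key_path and os.path.exists(key_path):
        with open(key_path, "r", encoding="utf-8") as f:
            return f.read().strip()
    return query_api_key  # e.g. the value received via ?apiKey=... for local testing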

🌐 Ngrok (Optional)

Use ngrok to expose your local server:

pip install ngrok
ngrok http 8000

Use the ngrok URL anywhere you see http://0.0.0.0:8000 below.
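Because the pip package above is ngrok's Python SDK, you can also open the tunnel from Python instead of the CLI; a minimal sketch, assuming NGROK_AUTHTOKEN is set in your environment:

import ngrok

# Forward local port 8000; the authtoken is read from the NGROK_AUTHTOKEN env var.
listener = ngrok.forward(8000, authtoken_from_env=True)
print("Public MCP endpoint:", listener.url() + "/mcp/")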


🛠️ Installation

🐳 Option A: Dockerized FastMCP Server (GPU‑ready)

  1. Save your key to api.key, then run:

docker run -i --rm \
  --mount type=bind,source=/path/to/api.key,target=/run/secrets/openai_api.key,readonly \
  -e OPENAI_API_KEY_PATH=/run/secrets/openai_api.key \
  -p 8000:8000 \
  rsherby/vipermcp:latest

This starts a CUDA‑enabled container serving MCP at:

http://0.0.0.0:8000/mcp/

💡 Prefer building from source? Use the included docker-compose.yaml. By default it reads api.key from the project root. If your platform injects env vars, you can also set OPENAI_API_KEY directly.


🐍 Option B: Pure FastMCP Server (dev‑friendly)

git clone --recurse-submodules https://github.com/ryansherby/ViperMCP.git
cd ViperMCP
bash download-models.sh

# Store your key for local dev
echo YOUR_OPENAI_API_KEY > api.key

# (recommended) activate a virtualenv / conda env
pip install -r requirements.txt
pip install -e .

# run the server
python run_server.py

Your server should be live at:

http://0.0.0.0:8000/mcp/

To use OpenAI‑backed models via query param:

http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXXXXXXXXXXXXXXXXXX

🧪 Usage

🤝 FastMCP Client Example

Pass images as base64 (shown) or as URLs:

import asyncio
import base64
import io

from fastmcp import Client
from PIL import Image

async def main():
    client = Client("http://0.0.0.0:8000/mcp/")

    # Encode the image as a base64 data URI
    image_path = './your_image.png'
    image = Image.open(image_path)
    img_byte_arr = io.BytesIO()
    image.save(img_byte_arr, format='PNG')
    img_byte_arr.seek(0)
    image_bytes = img_byte_arr.read()
    img_b64_string = base64.b64encode(image_bytes).decode('utf-8')

    async with client:
        await client.ping()
        tools = await client.list_tools()  # optional

        query = await client.call_tool(
            "viper_query",
            {
                "query": "how many muffins can each kid have for it to be fair?",
                "image": f"data:image/png;base64,{img_b64_string}",
            },
        )

        task = await client.call_tool(
            "viper_task",
            {
                "task": "return a mask of all the people in the image",
                "image": f"data:image/png;base64,{img_b64_string}",
            },
        )

asyncio.run(main())

🧵 OpenAI API (MCP Integration)

The OpenAI MCP integration currently accepts image URLs (not raw base64). Send the URL as type: "input_text".

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# server_url and img_url are defined elsewhere (your MCP server URL and a public image URL)
response = client.responses.create(
    model="gpt-4o",
    tools=[
        {
            "type": "mcp",
            "server_label": "ViperMCP",
            "server_url": f"{server_url}/mcp/",
            "require_approval": "never",
        },
    ],
    input=[
        {
            "role": "system",
            "content": "Forward any queries or tasks relating to an image directly to the ViperMCP server.",
        },
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "based on this image, how many muffins can each kid have for it to be fair?"},
                {"type": "input_text", "text": img_url},
            ],
        },
    ],
)

🌐 Endpoints

🔓 HTTP GET Endpoints

GET /health        => 'OK' (200)
GET /device        => {"device": "cuda"|"mps"|"cpu"}
GET /mcp?apiKey=   => 'Query parameters set successfully.'
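For a quick sanity check from Python (a minimal sketch using the requests library; not part of ViperMCP itself):

import requests

BASE = "http://0.0.0.0:8000"

print(requests.get(f"{BASE}/health").text)    # expected: 'OK'
print(requests.get(f"{BASE}/device").json())  # e.g. {"device": "cuda"}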

🧠 MCP Client Endpoints (JSON‑RPC 2.0)

POST /mcp/

🔨 MCP Client Functions

viper_query(query, image) -> str          # Returns a text answer to your query.
viper_task(task, image)   -> list[Image]  # Returns a list of images (e.g., masks) satisfying the task.
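Tool results come back as MCP content blocks, so the images returned by viper_task arrive base64-encoded. A minimal sketch of saving them to disk, assuming the standard MCP ImageContent fields (data and mimeType); adjust if your client version exposes a different result shape:

import base64

# 'task' is the CallToolResult returned by the viper_task call in the example above.
for i, block in enumerate(task.content):
    if getattr(block, "type", None) == "image":
        ext = block.mimeType.split("/")[-1]  # e.g. 'png'
        with open(f"mask_{i}.{ext}", "wb") as f:
            f.write(base64.b64decode(block.data))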

🧩 Models (Default MoE Pool)

  • 🐊 Grounding DINO

  • ✂️ Segment Anything (SAM)

  • 🤖 GPT‑4o‑mini (LLM)

  • 👀 GPT‑4o‑mini (VLM)

  • 🧠 GPT‑4.1

  • 🔭 X‑VLM

  • 🌊 MiDaS (depth)

  • 🐝 BERT

🧭 The MoE router picks from these based on the tool & prompt.


⚠️ Security & Production Notes

This package may generate and execute code on the host. We include basic injection guards, but you must harden for production. A recommended architecture separates concerns:

MCP Server (Query + Image)
  => Client Server (Generate Code Request)
  => Backend Server (Generates Code)
  => Client Server (Executes Wrapper Functions)
  => Backend Server (Executes Underlying Functions)
  => Client Server (Return Result)
  => MCP Server (Respond)

  • 🧱 Isolate codegen & execution (see the isolation sketch after this list).

  • 🔒 Lock down secrets & file access.

  • 🧪 Add unit/integration tests around wrappers.
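As one concrete illustration of the first point (a generic hardening sketch, not ViperMCP's built-in mechanism): run generated snippets in a separate process with a timeout and a scrubbed environment so they never inherit your API keys.

import subprocess
import sys

def run_generated_code(code: str, timeout: int = 30) -> str:
    # Generic isolation sketch, not ViperMCP's built-in mechanism:
    # execute untrusted generated code in a child process with no inherited secrets.
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        env={},  # do not pass OPENAI_API_KEY or other secrets to the child
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout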


📚 Citations

Huge thanks to the ViperGPT team:

@article{surismenon2023vipergpt,
  title={ViperGPT: Visual Inference via Python Execution for Reasoning},
  author={Dídac Surís and Sachit Menon and Carl Vondrick},
  journal={arXiv preprint arXiv:2303.08128},
  year={2023}
}

🤝 Contributions

PRs welcome! Please:

  1. ✅ Ensure all tests in /tests pass

  2. 🧪 Add coverage for new features

  3. 📦 Keep docs & examples up to date


🧭 Quick Commands Cheat‑Sheet

# Run with Docker (mount key file)
docker run -i --rm \
  --mount type=bind,source=$(pwd)/api.key,target=/run/secrets/openai_api.key,readonly \
  -e OPENAI_API_KEY_PATH=/run/secrets/openai_api.key \
  -p 8000:8000 rsherby/vipermcp:latest

# From source (after setup)
python run_server.py

# Hit health
curl http://0.0.0.0:8000/health

# List device
curl http://0.0.0.0:8000/device

# Use query param key (local only)
curl "http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXX..."

💬 Questions?

Open an issue or start a discussion. We ❤️ feedback and ambitious ideas!



