🚀 ViperMCP: A Model Context Protocol Server for Viper
Mixture-of-Experts VQA, streaming-ready, and MCP-native.
ViperMCP is a mixture-of-experts (MoE) visual question‑answering (VQA) server that exposes streamable MCP tools for:
🔎 Visual grounding
🧩 Compositional image QA
🌐 External knowledge‑dependent image QA
It’s built on the shoulders of 🐍 ViperGPT and delivered as a FastMCP HTTP server, so it works with all FastMCP client tooling.
✨ Highlights
⚡ MCP-native JSON‑RPC 2.0 endpoint (`/mcp/`) with streaming
🧠 MoE routing across classic and modern VLMs/LLMs
🧰 Two tools out of the box: `viper_query` (text) & `viper_task` (crops/masks)
🐳 One‑command Docker or pure‑Python install
🔐 Secure key handling via env var or secret mount
⚙️ Setup
🔑 OpenAI API Key
An OpenAI API key is required. Provide it via one of the following:
`OPENAI_API_KEY` (environment variable)
`OPENAI_API_KEY_PATH` (path to a file containing the key)
`?apiKey=...` HTTP query parameter (for quick local testing)
🌐 Ngrok (Optional)
Use ngrok to expose your local server:
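For example, to tunnel the default port used in this README:

```bash
# Expose the local MCP server (port 8000) over a public HTTPS URL
ngrok http 8000
```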
Use the ngrok URL anywhere you see http://0.0.0.0:8000 below.
🛠️ Installation
🐳 Option A: Dockerized FastMCP Server (GPU‑ready)
Save your key to `api.key`, then run:
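A minimal sketch, assuming the image is tagged `vipermcp:latest` and the server listens on port 8000 (both are assumptions; adjust to your build):

```bash
# Hypothetical image name; mount api.key read-only and point the server at it
docker run --gpus all -p 8000:8000 \
  -v "$(pwd)/api.key:/run/secrets/api.key:ro" \
  -e OPENAI_API_KEY_PATH=/run/secrets/api.key \
  vipermcp:latest
```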
This starts a CUDA‑enabled container serving MCP at `http://0.0.0.0:8000/mcp/`.
💡 Prefer building from source? Use the included `docker-compose.yaml`. By default it reads `api.key` from the project root. If your platform injects env vars, you can also set `OPENAI_API_KEY` directly.
🐍 Option B: Pure FastMCP Server (dev‑friendly)
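A minimal sketch, assuming a standard `requirements.txt` and a `server.py` entry point (both are assumptions; use the project's actual entry point):

```bash
pip install -r requirements.txt          # assumed dependency file
OPENAI_API_KEY=sk-... python server.py   # assumed entry point; serves MCP on port 8000
```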
Your server should be live at `http://0.0.0.0:8000/mcp/`.
To use OpenAI‑backed models via query param:
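Append the key as a query parameter (placeholder key shown; avoid this outside quick local testing, since keys in URLs can leak into logs):

```text
http://0.0.0.0:8000/mcp/?apiKey=sk-...
```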
🧪 Usage
🤝 FastMCP Client Example
Pass images as base64 (shown) or as URLs:
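A minimal sketch with the `fastmcp` client library; the `image` argument name is an assumption, so check the tool schema returned by `list_tools()` for the exact parameters:

```python
import asyncio
import base64

from fastmcp import Client

async def main():
    async with Client("http://0.0.0.0:8000/mcp/") as client:
        # Inspect the advertised tools and their input schemas
        for tool in await client.list_tools():
            print(tool.name)

        # Base64-encode a local image ("image" parameter name is assumed)
        with open("photo.jpg", "rb") as f:
            img_b64 = base64.b64encode(f.read()).decode()

        result = await client.call_tool(
            "viper_query",
            {"text": "How many people are in this image?", "image": img_b64},
        )
        print(result)

asyncio.run(main())
```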
🧵 OpenAI API (MCP Integration)
The OpenAI MCP integration currently accepts image URLs (not raw base64). Send the URL as type: "input_text".
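A minimal sketch using the Responses API's remote MCP tool; the server URL and image URL are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "vipermcp",
        "server_url": "https://<your-tunnel>.ngrok.app/mcp/",  # placeholder
        "require_approval": "never",
    }],
    input=[{
        "role": "user",
        "content": [{
            "type": "input_text",
            "text": "How many people are in this image? https://example.com/photo.jpg",
        }],
    }],
)
print(resp.output_text)
```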
🌐 Endpoints
🔓 HTTP GET Endpoints
🧠 MCP Client Endpoints (JSON‑RPC 2.0)
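An illustrative handshake against the `/mcp/` endpoint; streamable‑HTTP servers typically expect an `initialize` request first and return an `Mcp-Session-Id` header for subsequent calls:

```bash
curl -i http://0.0.0.0:8000/mcp/ \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
      "protocolVersion": "2025-03-26",
      "capabilities": {},
      "clientInfo": {"name": "curl", "version": "0.0.1"}
    }
  }'
```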
🔨 MCP Client Functions
The server registers two tools: `viper_query(text)` for compositional and external knowledge‑dependent image QA, and `viper_task(crops/masks)` for visual grounding.
🧩 Models (Default MoE Pool)
🐊 Grounding DINO
✂️ Segment Anything (SAM)
🤖 GPT‑4o‑mini (LLM)
👀 GPT‑4o‑mini (VLM)
🧠 GPT‑4.1
🔭 X‑VLM
🌊 MiDaS (depth)
🐝 BERT
🧭 The MoE router picks from these based on the tool & prompt.
⚠️ Security & Production Notes
This package may generate and execute code on the host. We include basic injection guards, but you must harden for production. A recommended architecture separates concerns:
🧱 Isolate codegen & execution (see the sketch after this list).
🔒 Lock down secrets & file access.
🧪 Add unit/integration tests around wrappers.
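Illustrative only, not the package's built‑in guard: one way to isolate generated code is a separate interpreter process with a timeout and no inherited secrets.

```python
import subprocess
import sys

def run_generated_code(code: str, timeout: float = 10.0) -> str:
    """Run untrusted generated code in a throwaway interpreter process."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, no user site dirs
        capture_output=True,
        text=True,
        timeout=timeout,  # kill runaway code
        env={},           # no inherited secrets such as OPENAI_API_KEY
    )
    return proc.stdout

print(run_generated_code("print(2 + 2)"))
```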
📚 Citations
Huge thanks to the ViperGPT team:
Surís, D., Menon, S., & Vondrick, C. (2023). ViperGPT: Visual Inference via Python Execution for Reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
🤝 Contributions
PRs welcome! Please:
✅ Ensure all tests in `/tests` pass
🧪 Add coverage for new features
📦 Keep docs & examples up to date
🧭 Quick Commands Cheat‑Sheet
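A consolidated sketch of the commands above (the image name remains an assumption):

```bash
# Run the Dockerized server (hypothetical image name)
docker run --gpus all -p 8000:8000 \
  -v "$(pwd)/api.key:/run/secrets/api.key:ro" \
  -e OPENAI_API_KEY_PATH=/run/secrets/api.key \
  vipermcp:latest

# Expose it publicly
ngrok http 8000

# Smoke-test the MCP endpoint
curl -i http://0.0.0.0:8000/mcp/ \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl","version":"0.0.1"}}}'
```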
💬 Questions?
Open an issue or start a discussion. We ❤️ feedback and ambitious ideas!