Integrates with OpenAI's API to provide visual question-answering capabilities using GPT-4o-mini and GPT-4.1 models for compositional image understanding and knowledge-dependent visual reasoning
ViperMCP: A Model Context Protocol for Viper Server
ViperMCP is a mixture-of-experts (MoE) visual question-answering (VQA) server that defines several functions to solve 3 particular task areas: 1) visual grounding, 2) compositional image question answering, and 3) external knowledge-dependent image question answering. It is based heavily on the ViperGPT framework.
The MCP server is structured as a FastMCP streamable-http server and is therefore compatible with all of the client tooling provided by FastMCP.
Setup
OpenAI API Key
An API key for the OpenAI platform is required. It can either be set in the execution environment as OPENAI_API_KEY
, referenced by path in the OPENAI_API_KEY_PATH
environment variable, or passed as an http query parameter.
Ngrok Account (Optional)
Ngrok can be used to quickly deploy a locally-running server to a public facing URL. Create an account and run pip install ngrok
to use.
Assuming that you have followed one of the following installation procedures in the next section, running ngrok http 8000
will forward the public-facing URL to your ViperMCP server.
The address provided by ngrok (or any public facing address) can be used as a substitute for the local address (http://0.0.0.0:8000
) we will reference below.
Installation
Smithery Deployment
ViperMCP can be deployed through Smithery.
Dockerized FastMCP Server
Add your OpenAI API key to a file called api.key
. In the command below, point the mount
source to the location of the api.key
.
This will begin a CUDA-enabled docker container that can be accessed at http://0.0.0.0:8000/mcp/
.
Alternatively, you can use the docker-compose.yaml
file to build the image from source and run it. By default, it assumes that the OpenAI API key can be found in the same directory.
If your container provisioner (e.g., cloud provider) allows you to create environment variables and pass them to the container environment, you can also just set the OPENAI_API_KEY
variable pre-runtime.
Pure FastMCP Server
Clone the repository to your local device by running the following comand:
After cloning, we need to download the pretrained models and set our OpenAI API key. Run the following commands:
We then suggest creating a virtual environment (e.g., conda or venv) and activating it. This is not a requirement but is generally the best practice for managing Python packages. Then, install the requirements by running the follow commands.
This will install both the 3rd-party requirements as well the local viper
package that is used to standardize import locations.
We can now run our local FastMCP server using the follow command.
We should be able to access our server now at http://0.0.0.0:8000/mcp/
.
To utilize the OpenAI related models, we must pass the OpenAI API key to the following URL like:
http://0.0.0.0:8000/mcp?apiKey=sk-proj-XXXXXXXXXXXXXXXXXXXX
Usage
FastMCP Client
An example with passing base64-encoded byte-level image data. Image URLs can also be passed.
OpenAI API
Make sure to send the image URL as "type" : "input_text". Currently, the OpenAI API MCP integration cannot handle byte-level image data, so the image must be sent as a public URL.
Appendix
Models
The following models are used in the default version of ViperMCP:
- Grounding DINO
- SegmentAnything (SAM)
- GPT-4o-mini LLM
- GPT-4o-mini VLM
- GPT-4.1
- X-VLM
- Midas
- BERT
Warnings
This package generates and executes code on the machine in which it is run. We do not have any direct control over the code that is executed, and thus the prompting mechanism may be used to expose sensitive data. We have included basic injection prevention tools; however, this will not be sufficient to protect your data in a production environment.
If a production-level environment is your goal, we strongly suggest modifying the src/entrypoint.py
to define separate client wrappers using the same naming convention (i.e., find, simple_query, etc.) that forward requests to a backend server. Then, the mcp/server.py
should be modified to push requests to this client server, which then makes requests of the backend server. An example flow would be like the following:
Citations
Thank you to the team behind ViperGPT! Your framework and subsequent empirical successes have been invaluable in the creation of this project.
Contributions
If you'd like to contribute to the project, please pass the necessary tests (found in /tests
) and create a pull request.
This server cannot be installed
hybrid server
The server is able to function both locally and remotely, depending on the configuration or use case.
A mixture-of-experts visual question-answering server that enables visual grounding, compositional image question answering, and external knowledge-dependent image question answering through code generation and execution. Built on the ViperGPT framework with support for multiple computer vision models including Grounding DINO, SegmentAnything, and GPT-4o.
Related MCP Servers
- -securityAlicense-qualityA powerful server that integrates the Moondream vision model to enable advanced image analysis, including captioning, object detection, and visual question answering, through the Model Context Protocol, compatible with AI assistants like Claude and Cline.Last updated -17JavaScriptApache 2.0
- AsecurityAlicenseAqualityA MCP server that enables Claude and other MCP-compatible assistants to generate images from text prompts using Together AI's image generation models.Last updated -14TypeScriptMIT License
- -securityFlicense-qualityA server that connects to the xAI/Grok image generation API, allowing users to generate images from text prompts with support for multiple image generation and different response formats.Last updated -8JavaScript
- -securityFlicense-qualityThis server enables interaction with Google's Video Intelligence API for advanced video analysis, auto-generated using AG2's MCP builder to provide a standardized multi-agent interface.Last updated -Python