Enables AI-powered image and video analysis through Vertex AI, Google Cloud's machine learning platform for running Gemini models.
Provides storage integration for uploading and managing video files used in AI analysis operations.
Provides AI-powered image and video analysis using Google Gemini models, supporting multimodal content analysis through the Google AI Studio API.
Supports image analysis from Unsplash image URLs as demonstrated in the documentation examples.
Supports video analysis from YouTube URLs, enabling AI-powered content analysis of YouTube videos.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@AI Vision MCP Server analyze this image and tell me what objects you see".
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
AI Vision MCP Server
A powerful Model Context Protocol (MCP) server that provides AI-powered image and video analysis using Google Gemini and Vertex AI models.
Features
- Dual Provider Support: Choose between Google Gemini API and Vertex AI
- Multimodal Analysis: Support for both image and video content analysis
- Flexible File Handling: Upload via multiple methods (URLs, local files, base64)
- Storage Integration: Built-in Google Cloud Storage support
- Comprehensive Validation: Zod-based data validation throughout
- Error Handling: Robust error handling with retry logic and circuit breakers
- TypeScript: Full TypeScript support with strict type checking
Quick Start
Pre-requisites
You can choose either the google or vertex_ai provider. For simplicity, the google provider is recommended.
Below are the environment variables you need to set based on your selected provider. (Note: It’s recommended to set the timeout configuration to more than 5 minutes for your MCP client).
(i) Using Google AI Studio Provider
Get your Google AI Studio API key here.
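As a minimal sketch, the environment for this provider might look like the following (the PROVIDER and GOOGLE_API_KEY names are assumptions for illustration; confirm the exact names in the Environment Variable Guide):

```bash
# Assumed variable names -- confirm against the Environment Variable Guide
PROVIDER=google              # select the Google AI Studio provider
GOOGLE_API_KEY=your-api-key  # API key from Google AI Studio
```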
(ii) Using Vertex AI Provider
Refer to the guide here on how to set this up.
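A minimal sketch for Vertex AI (GOOGLE_APPLICATION_CREDENTIALS is the standard Google Cloud auth variable; the other names are assumptions for illustration):

```bash
PROVIDER=vertex_ai                                            # assumed name
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json  # standard Google Cloud ADC variable
GOOGLE_CLOUD_PROJECT=your-project-id                          # assumed name
GOOGLE_CLOUD_LOCATION=us-central1                             # assumed name
```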
Installation
Below are installation guides for this MCP server on different MCP clients, such as Claude Desktop, Claude Code, Cursor, and Cline.
Add to your Claude Desktop configuration:
(i) Using Google AI Studio Provider
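The original configuration block is not reproduced here; a sketch, assuming the server is published to npm as ai-vision-mcp and using the assumed variable names from the Pre-requisites section:

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["-y", "ai-vision-mcp"],
      "env": {
        "PROVIDER": "google",
        "GOOGLE_API_KEY": "your-api-key"
      }
    }
  }
}
```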
(ii) Using Vertex AI Provider
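The Vertex AI variant uses the same shape with the Vertex environment variables swapped in (same assumptions as above):

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["-y", "ai-vision-mcp"],
      "env": {
        "PROVIDER": "vertex_ai",
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/service-account.json",
        "GOOGLE_CLOUD_PROJECT": "your-project-id"
      }
    }
  }
}
```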
Add to your Claude Code configuration (the JSON shape matches the Claude Desktop examples above):
(i) Using Google AI Studio Provider
(ii) Using Vertex AI Provider
Note: Increase the MCP startup timeout to 1 minute and the MCP tool execution timeout to about 5 minutes by updating ~\.claude\settings.json as follows:
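A sketch of that settings change, assuming Claude Code's MCP_TIMEOUT (startup, in milliseconds) and MCP_TOOL_TIMEOUT (tool execution, in milliseconds) environment variables:

```json
{
  "env": {
    "MCP_TIMEOUT": "60000",
    "MCP_TOOL_TIMEOUT": "300000"
  }
}
```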
Go to: Settings -> Cursor Settings -> MCP -> Add new global MCP server
Pasting the following configuration into your ~/.cursor/mcp.json file is the recommended approach. You may also install the server in a specific project by creating .cursor/mcp.json in your project folder. See the Cursor MCP docs for more info.
(i) Using Google AI Studio Provider
(ii) Using Vertex AI Provider
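Both provider variants use the same JSON shape as the Claude Desktop examples above; only the env block differs. A sketch for ~/.cursor/mcp.json (package and variable names assumed, as before):

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["-y", "ai-vision-mcp"],
      "env": {
        "PROVIDER": "google",
        "GOOGLE_API_KEY": "your-api-key"
      }
    }
  }
}
```

For the Vertex AI variant, swap in the Vertex environment variables shown earlier.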
Cline uses a JSON configuration file to manage MCP servers. To integrate the provided MCP server configuration:
Open Cline and click on the MCP Servers icon in the top navigation bar.
Select the Installed tab, then click Advanced MCP Settings.
In the cline_mcp_settings.json file, add the following configuration:
(i) Using Google AI Studio Provider
(ii) Using Vertex AI Provider
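As with the other clients, both provider variants share the same JSON shape and differ only in the env block. A sketch (package and variable names assumed):

```json
{
  "mcpServers": {
    "ai-vision-mcp": {
      "command": "npx",
      "args": ["-y", "ai-vision-mcp"],
      "env": {
        "PROVIDER": "vertex_ai",
        "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/service-account.json",
        "GOOGLE_CLOUD_PROJECT": "your-project-id"
      }
    }
  }
}
```

For the Google AI Studio variant, use PROVIDER=google and GOOGLE_API_KEY instead.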
The server uses stdio transport and follows the standard MCP protocol. It can be integrated with any MCP-compatible client by running:
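For example (assuming the package is published to npm as ai-vision-mcp, as in the configuration sketches above):

```bash
npx -y ai-vision-mcp
```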
MCP Tools
The server provides four main MCP tools:
1) analyze_image
Analyzes an image using AI and returns a detailed description.
Parameters:
- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Question or instruction for the AI
- `options` (object, optional): Analysis options including temperature and max tokens
Examples:
Analyze image from URL:
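The original example block is not shown; a minimal call sketch using the documented parameters (URL is a placeholder):

```json
{
  "imageSource": "https://images.unsplash.com/photo-xxxx",
  "prompt": "What objects do you see in this image?"
}
```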
Analyze local image file:
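A sketch with a local file path (note the escaped backslashes required in JSON):

```json
{
  "imageSource": "C:\\Users\\username\\Pictures\\photo.png",
  "prompt": "Describe this image in detail"
}
```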
2) compare_images
Compares multiple images using AI and returns a detailed comparison analysis.
Parameters:
- `imageSources` (array): Array of image sources (URLs, base64 data, or file paths); minimum 2, maximum 4 images
- `prompt` (string): Question or instruction for comparing the images
- `options` (object, optional): Analysis options including temperature and max tokens
Examples:
Compare images from URLs:
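A minimal call sketch (URLs are placeholders):

```json
{
  "imageSources": [
    "https://example.com/before.jpg",
    "https://example.com/after.jpg"
  ],
  "prompt": "What differences do you see between these two images?"
}
```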
Compare mixed sources:
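A sketch mixing a URL and a local file path (both values are placeholders):

```json
{
  "imageSources": [
    "https://example.com/reference.jpg",
    "C:\\Users\\username\\Pictures\\local.png"
  ],
  "prompt": "Compare the colors and composition of these images"
}
```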
3) detect_objects_in_image
Detects objects in an image using AI vision models and generates an annotated image with bounding boxes. Returns the detected objects with coordinates, and saves the annotated image either to a specified file or to a temporary directory.
Parameters:
- `imageSource` (string): URL, base64 data, or file path to the image
- `prompt` (string): Custom detection prompt describing what to detect or recognize in the image
- `outputFilePath` (string, optional): Explicit output path for the annotated image
Configuration:
This function uses optimized default parameters for object detection and does not accept a runtime options parameter. To customize the AI parameters (temperature, topP, topK, maxTokens), use environment variables:
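For example (the names below are assumptions, following the documented TEMPERATURE_FOR_ANALYZE_IMAGE naming pattern; confirm against the Environment Variable Guide):

```bash
TEMPERATURE_FOR_DETECT_OBJECTS_IN_IMAGE=0.1  # assumed name
TOP_K_FOR_DETECT_OBJECTS_IN_IMAGE=32         # assumed name
```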
File Handling Logic:
- Explicit outputFilePath provided → saves to the exact path specified
- No explicit outputFilePath → automatically saves to a temporary directory
Response Types:
- Returns a `file` object when an explicit outputFilePath is provided
- Returns a `tempFile` object when no explicit outputFilePath is provided; the annotated image is auto-saved to a temporary folder
- Always includes a `detections` array with detected objects and coordinates
- Includes a `summary` with percentage-based coordinates for browser automation
Examples:
Basic object detection:
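A minimal call sketch (URL is a placeholder):

```json
{
  "imageSource": "https://example.com/street.jpg",
  "prompt": "Detect all vehicles and pedestrians"
}
```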
Save annotated image to specific path:
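The same call with an explicit output path, so the annotated image is written to that exact location:

```json
{
  "imageSource": "https://example.com/street.jpg",
  "prompt": "Detect all vehicles and pedestrians",
  "outputFilePath": "C:\\Users\\username\\Pictures\\annotated.png"
}
```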
Custom detection prompt:
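A sketch of a more targeted detection prompt, e.g. for UI automation (values are placeholders):

```json
{
  "imageSource": "C:\\Users\\username\\Pictures\\screenshot.png",
  "prompt": "Locate all input fields and buttons on this screenshot"
}
```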
4) analyze_video
Analyzes a video using AI and returns a detailed description.
Parameters:
- `videoSource` (string): YouTube URL, GCS URI, or local file path to the video
- `prompt` (string): Question or instruction for the AI
- `options` (object, optional): Analysis options including temperature and max tokens
Supported video sources:
- YouTube URLs (e.g., https://www.youtube.com/watch?v=...)
- Local file paths (e.g., C:\Users\username\Downloads\video.mp4)
Examples:
Analyze video from YouTube URL:
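A minimal call sketch (VIDEO_ID is a placeholder):

```json
{
  "videoSource": "https://www.youtube.com/watch?v=VIDEO_ID",
  "prompt": "Summarize the key events in this video"
}
```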
Analyze local video file:
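A sketch using the local path format shown above:

```json
{
  "videoSource": "C:\\Users\\username\\Downloads\\video.mp4",
  "prompt": "Describe what happens in this video"
}
```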
Note: Only YouTube URLs are supported for public video URLs. Other public video URLs are not currently supported.
Environment Configuration
For basic setup, you only need to configure the provider selection and required credentials:
Google AI Studio Provider (Recommended)
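A minimal sketch, reusing the assumed variable names from the Quick Start section:

```bash
PROVIDER=google
GOOGLE_API_KEY=your-api-key
```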
Vertex AI Provider (Production)
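A minimal sketch for Vertex AI (same assumptions as in the Quick Start section):

```bash
PROVIDER=vertex_ai
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
GOOGLE_CLOUD_PROJECT=your-project-id
```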
📖 Detailed Configuration Guide
For comprehensive environment variable documentation, including:
- Complete configuration reference (60+ environment variables)
- Function-specific optimization examples
- Advanced configuration patterns
- Troubleshooting guidance
👉 See Environment Variable Guide
Configuration Priority Overview
The server uses a hierarchical configuration system where more specific settings override general ones:
1. LLM-assigned values (runtime parameters in tool calls)
2. Function-specific variables (`TEMPERATURE_FOR_ANALYZE_IMAGE`, etc.)
3. Task-specific variables (`TEMPERATURE_FOR_IMAGE`, etc.)
4. Universal variables (`TEMPERATURE`, etc.)
5. System defaults
Basic Optimization:
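For example (TEMPERATURE is documented above; MAX_TOKENS is an assumed name following the parameter list):

```bash
TEMPERATURE=0.7    # universal default for all functions
MAX_TOKENS=8192    # assumed name
```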
Function-specific Optimization:
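For example, using the documented override hierarchy:

```bash
TEMPERATURE_FOR_ANALYZE_IMAGE=0.2  # overrides task-level and universal values
TEMPERATURE_FOR_IMAGE=0.4          # applies to all image tasks
TEMPERATURE=0.7                    # universal fallback
```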
Model Selection:
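A sketch; the variable name is an assumption (see the Environment Variable Guide for the real one):

```bash
GEMINI_MODEL=gemini-2.5-flash  # assumed variable name
```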
Development
Prerequisites
- Node.js 18+
- npm or yarn
Setup
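The setup commands from the original are not shown; a typical flow (the repository URL is a placeholder):

```bash
git clone https://github.com/<owner>/ai-vision-mcp.git  # placeholder URL
cd ai-vision-mcp
npm install
npm run build
```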
Scripts
- `npm run build` - Build the TypeScript project
- `npm run dev` - Start the development server in watch mode
- `npm run lint` - Run ESLint
- `npm run format` - Format code with Prettier
- `npm start` - Start the built server
Architecture
The project follows a modular architecture:
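The original layout diagram is not reproduced here; an illustrative sketch based on the features described above (directory names are assumptions):

```
src/
├── index.ts      # MCP server entry point (stdio transport)
├── tools/        # analyze_image, compare_images, detect_objects_in_image, analyze_video
├── providers/    # Google AI Studio and Vertex AI integrations
├── validation/   # Zod schemas
└── utils/        # retry logic, circuit breakers, file handling
```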
Error Handling
The server includes comprehensive error handling:
- Validation Errors: Input validation using Zod schemas
- Network Errors: Automatic retries with exponential backoff
- Authentication Errors: Clear error messages for API key issues
- File Errors: Handling for file size limits and format restrictions
Contributing
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Google for the Gemini and Vertex AI APIs
- The Model Context Protocol team for the MCP framework
- All contributors and users of this project