mcp-see
An MCP server that gives AI agents eyes: the ability to observe and understand images without stuffing raw pixels into their context window.
Features
Multi-provider vision: Describe images using Gemini, OpenAI, or Claude
Object detection: Find objects with bounding boxes (Gemini)
Hierarchical analysis: Detect regions, then zoom in for detail
Precise color extraction: K-Means clustering in LAB color space
Color naming: Human-readable color names via color.pizza API
Installation
Run directly from GitHub with npx github:simen/mcp-see.
Or clone and build locally:
MCP Client Configuration
Claude Desktop
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
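As a rough illustration, the entry could be merged into that file along the following lines. This is a sketch, not the project's official snippet: the server key "see" and the helper name add_mcp_see are placeholders chosen here, and only the npx github:simen/mcp-see command comes from this README. Provider credentials would normally go in an env object on the same entry; they are left out of the sketch.

```python
import json


def add_mcp_see(config, name="see"):
    """Return a copy of a Claude Desktop config dict with an mcp-see entry added.

    `name` is a hypothetical server key; any label works.
    """
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {
        "command": "npx",
        "args": ["github:simen/mcp-see"],
    }
    return {**config, "mcpServers": servers}


# Print the JSON that would be written to claude_desktop_config.json.
print(json.dumps(add_mcp_see({}), indent=2))
```

Existing entries under mcpServers are preserved, so the helper can be applied to a config that already lists other servers.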
Other MCP Clients
The server runs on stdio transport. Configure your client to spawn npx github:simen/mcp-see.
Tools
describe
Get an AI-generated description of an image.
Input:
Example Output:
detect
Detect objects and return bounding boxes. Uses Gemini for native bbox support.
Input:
Example Output:
Coordinates are [ymin, xmin, ymax, xmax] normalized 0-1000.
describe_region
Crop to a bounding box and describe that region in detail.
Input:
Example Output:
analyze_colors
Extract dominant colors from a region using K-Means clustering in LAB color space.
Input:
Example Output:
The confidence field indicates color precision:
high: Flat colors (UI elements) - clusters are tight
medium: Mixed content
low: Photographs/gradients - colors are approximate
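One way such a label could be derived from cluster tightness is sketched below. It is an assumption, not the server's actual rule: the function name cluster_confidence, the use of mean Euclidean distance in LAB as the spread measure, and both thresholds are illustrative.

```python
import math


def cluster_confidence(members, centroid, tight=2.0, loose=10.0):
    """Label a cluster by the mean LAB distance of its members to the centroid.

    `tight` and `loose` are hypothetical thresholds, not values from mcp-see.
    """
    spread = sum(math.dist(m, centroid) for m in members) / len(members)
    if spread <= tight:
        return "high"    # near-identical pixels, e.g. a flat UI fill
    if spread <= loose:
        return "medium"  # some variation around the centroid
    return "low"         # wide spread, e.g. a gradient or photograph
```

A flat button fill would land in the first branch, while a sky gradient with a wide lightness range would fall through to the last.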
Workflows
Hierarchical Image Understanding
The power of mcp-see is in combining tools for progressive analysis:
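The progressive flow can be sketched as a small driver that calls the four tools in order. Only the tool names come from this README. The call_tool callable, the argument names image and bbox, and the objects/label/bbox response shape are placeholders and may not match the server's real schema.

```python
def analyze_hierarchically(call_tool, image_path):
    """Describe the whole image, detect objects, then zoom into each region.

    `call_tool(name, arguments)` is a hypothetical stand-in for whatever
    your MCP client exposes; argument names below are assumptions.
    """
    overview = call_tool("describe", {"image": image_path})
    detections = call_tool("detect", {"image": image_path})

    regions = []
    for obj in detections["objects"]:  # assumed response shape
        bbox = obj["bbox"]             # [ymin, xmin, ymax, xmax], 0-1000
        regions.append({
            "label": obj["label"],
            "bbox": bbox,
            "detail": call_tool("describe_region", {"image": image_path, "bbox": bbox}),
            "colors": call_tool("analyze_colors", {"image": image_path, "bbox": bbox}),
        })
    return {"overview": overview, "regions": regions}
```

Only the short text and color results of each step enter the agent's context, which is the point of splitting the analysis across tools.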
Design Reference Analysis
Extract implementation-ready specs from design mockups:
Environment Variables
| Variable | Description | Required |
|---|---|---|
| | GCP project ID for Vertex AI | For Gemini |
| | OpenAI API key | For OpenAI provider |
| | Anthropic API key | For Claude provider |
Gemini uses Google Cloud Application Default Credentials (ADC). Run gcloud auth application-default login to authenticate.
Technical Details
Color Extraction Algorithm
The analyze_colors tool uses K-Means clustering in LAB color space:
Convert pixels from RGB to LAB (perceptually uniform)
Subsample to 50k pixels for performance
K-Means++ initialization for better convergence
Cluster centroids become dominant colors
Convert back to RGB, name via color.pizza API
This approach groups perceptually similar colors together, working well for both flat UI colors and noisy photographs.
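The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the server's implementation: the function names, the default k, the fixed iteration count, and the D65 conversion constants are choices made here, and the color.pizza naming step is omitted.

```python
import random


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE LAB (D65 white point)."""
    def to_linear(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (to_linear(c) for c in rgb)
    x = (0.4124564 * r + 0.3575761 * g + 0.1804375 * b) / 0.95047
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = (0.0193339 * r + 0.1191920 * g + 0.9503041 * b) / 1.08883
    def f(t):
        return t ** (1 / 3) if t > 216 / 24389 else (24389 / 27 * t + 16) / 116
    fx, fy, fz = f(x), f(y), f(z)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def lab_to_srgb(lab):
    """Convert CIE LAB (D65) back to a clamped 8-bit sRGB triple."""
    L, a, b = lab
    fy = (L + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200
    def finv(t):
        return t ** 3 if t ** 3 > 216 / 24389 else (116 * t - 16) / (24389 / 27)
    x, y, z = 0.95047 * finv(fx), finv(fy), 1.08883 * finv(fz)
    linear = (
        3.2404542 * x - 1.5371385 * y - 0.4985314 * z,
        -0.9692660 * x + 1.8760108 * y + 0.0415560 * z,
        0.0556434 * x - 0.2040259 * y + 1.0572252 * z,
    )
    def to_8bit(c):
        c = min(max(c, 0.0), 1.0)
        c = 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
        return round(c * 255)
    return tuple(to_8bit(c) for c in linear)


def dist2(p, q):
    """Squared Euclidean distance between two LAB points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def dominant_colors(pixels, k=3, max_samples=50_000, iterations=20, seed=0):
    """Return [(rgb, share), ...] sorted by share, largest first."""
    rng = random.Random(seed)
    if len(pixels) > max_samples:          # subsample for performance
        pixels = rng.sample(pixels, max_samples)
    points = [srgb_to_lab(p) for p in pixels]

    # K-Means++ seeding: pick each new centroid with probability
    # proportional to its squared distance from the nearest existing one.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [min(dist2(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:              # fewer distinct colors than k
            break
        centroids.append(rng.choices(points, weights=weights)[0])

    # Lloyd iterations: assign each point to its nearest centroid, then
    # move each centroid to the mean of its members.
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]

    result = [(lab_to_srgb(c), len(m) / len(points))
              for c, m in zip(centroids, clusters) if m]
    return sorted(result, key=lambda item: item[1], reverse=True)
```

Calling dominant_colors(pixels, k=5) on a list of (r, g, b) tuples returns each centroid color with the fraction of sampled pixels assigned to it. A production version would more likely use a vectorized library than pure Python loops.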
Bounding Box Format
All bounding boxes use [ymin, xmin, ymax, xmax] format with coordinates normalized to 0-1000. To convert to pixel coordinates:
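A minimal sketch of that conversion, assuming the image width and height in pixels are known. The function name bbox_to_pixels and the (left, top, right, bottom) return order are choices made here, not part of the server's API.

```python
def bbox_to_pixels(bbox, width, height):
    """Convert [ymin, xmin, ymax, xmax] on a 0-1000 scale to pixel edges.

    Returns (left, top, right, bottom), the order most crop APIs expect.
    """
    ymin, xmin, ymax, xmax = bbox
    left = round(xmin * width / 1000)
    top = round(ymin * height / 1000)
    right = round(xmax * width / 1000)
    bottom = round(ymax * height / 1000)
    return left, top, right, bottom
```

For a 1920x1080 image, the box [100, 200, 500, 800] maps to left 384, top 108, right 1536, bottom 540. Note that y comes first in the normalized format, so the values are reordered on the way out.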
License
MIT