Provides AI-powered image analysis using the GPT-4o Vision API and image generation using the DALL-E 2, DALL-E 3, and GPT-Image-1 models. Supports image description, content analysis, comparison, generation from text prompts, image editing, and image variations.
AI Image MCP Server
A comprehensive Model Context Protocol (MCP) server that provides both AI-powered image analysis and AI image generation capabilities using OpenAI's Vision API and image generation models.
System Requirements
Tested on:
macOS 14.3.0 (Darwin 23.3.0, ARM64)
Python 3.13.0
uv 0.7.13
OpenAI API access
Features
Image Analysis & Description
Smart Image Analysis: Analyze images using OpenAI's GPT-4o Vision model
Targeted Analysis: Analyze specific aspects (objects, text, colors, composition, emotions)
Image Comparisons: Compare two images and highlight similarities/differences
Metadata Extraction: Get technical information about image files
Intelligent Caching: Cache analysis results to avoid repeated API calls
Multiple Formats: Support for PNG, JPEG, GIF, and WebP formats
Image Generation & Editing
Text-to-Image Generation: Create images from text prompts using DALL-E 2, DALL-E 3, or GPT-Image-1
Image Editing: Edit existing images with text prompts using GPT-Image-1 or DALL-E 2
Image Variations: Create variations of existing images using DALL-E 2
Flexible Output: Save generated images locally with custom naming and directories
Model Support: Full support for all OpenAI image generation models with their specific features
MCP Tools
describe_image(image_path, prompt) - Get detailed image descriptions
analyze_image_content(image_path, analysis_type) - Analyze specific aspects
compare_images(image1_path, image2_path, comparison_focus) - Compare two images
get_image_metadata(image_path) - Extract technical metadata
get_cache_info() - View cache statistics
clear_image_cache() - Clear cached results
Installation
Install dependencies:
Set your OpenAI API key:
Run the server:
Running the Server
MCP Integration
Claude Desktop
Cursor
Configure MCP in Cursor settings:
Analysis Types
general: Overall image description
objects: Object detection and identification
text: Text extraction and OCR
colors: Color analysis and palette
composition: Visual composition and layout
emotions: Emotional content and mood
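As a minimal sketch of how the analysis types above could map to specialized prompts: the prompt wording and the names ANALYSIS_PROMPTS and build_analysis_prompt are assumptions made for illustration, not the server's actual implementation.

```python
# Hypothetical prompt table: the keys come from the list above,
# but the prompt wording is an assumption, not the server's real text.
ANALYSIS_PROMPTS = {
    "general": "Describe this image overall.",
    "objects": "List and identify the objects visible in this image.",
    "text": "Extract any text that appears in this image.",
    "colors": "Describe the colors and overall palette of this image.",
    "composition": "Describe the visual composition and layout of this image.",
    "emotions": "Describe the emotional content and mood of this image.",
}


def build_analysis_prompt(analysis_type: str = "general") -> str:
    """Return the prompt for a supported analysis type, or raise ValueError."""
    key = analysis_type.strip().lower()
    if key not in ANALYSIS_PROMPTS:
        supported = ", ".join(sorted(ANALYSIS_PROMPTS))
        raise ValueError(
            f"Unknown analysis_type {analysis_type!r}; expected one of: {supported}"
        )
    return ANALYSIS_PROMPTS[key]
```

Rejecting unknown types before any API call keeps a typo such as "colours" from producing a billed request with a generic prompt.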
Project Structure
Caching
Automatic file change detection via SHA-256 hashes
30-day cache expiration
Separate cache entries for different prompts/analysis types
Significant speedup on cache hits (1000x+ faster than a fresh API call)
Available Tools
Image Analysis Tools
describe_image
Analyze an image and provide a detailed description.
Parameters:
image_path (str): Path to the image file
prompt (str, optional): Custom analysis prompt
Supports: PNG, JPEG, GIF, WebP
Features: Caching, file validation, comprehensive error handling
analyze_image_content
Perform targeted analysis of specific image aspects.
Parameters:
image_path (str): Path to the image file
analysis_type (str): Type of analysis - "general", "objects", "text", "colors", "composition", "emotions"
Features: Specialized prompts for different analysis types
compare_images
Compare two images and highlight similarities and differences.
Parameters:
image1_path (str): Path to first image
image2_path (str): Path to second image
comparison_focus (str): What to focus on in comparison
get_image_metadata
Get technical metadata about an image file.
Returns: File size, dimensions, format, color mode, aspect ratio, etc.
Image Generation Tools
generate_image
Generate images from text prompts using OpenAI's image generation models.
Parameters:
prompt (str): Text description of desired image
model (str): "dall-e-2", "dall-e-3", or "gpt-image-1" (default: dall-e-3)
size (str, optional): Image dimensions (varies by model)
quality (str, optional): Quality setting (varies by model)
style (str, optional): "vivid" or "natural" (DALL-E 3 only)
n (int, optional): Number of images (1-10, DALL-E 3 only supports 1)
output_dir (str): Directory to save images (default: "./generated_images")
filename_prefix (str): Prefix for filenames (default: "generated")
Model-Specific Features:
DALL-E 2: Basic generation, sizes: 256x256, 512x512, 1024x1024
DALL-E 3: High quality, styles (vivid/natural), sizes: 1024x1024, 1792x1024, 1024x1792
GPT-Image-1: Advanced features, transparency support, compression control
edit_image
Edit existing images using text prompts.
Parameters:
image_path (str): Path to image to edit
prompt (str): Description of desired edit
mask_path (str, optional): Path to mask image (PNG with transparent edit areas)
model (str): "gpt-image-1" or "dall-e-2" (default: gpt-image-1)
size, quality, n: Model-specific options
output_dir, filename_prefix: Output configuration
Supported Models: GPT-Image-1 (up to 16 images, 50MB each) and DALL-E 2 (1 square PNG, 4MB max)
create_image_variations
Create variations of existing images using DALL-E 2.
Parameters:
image_path (str): Path to source image (must be square PNG, <4MB)
n (int): Number of variations (1-10, default: 2)
size (str): Variation size - "256x256", "512x512", "1024x1024"
output_dir, filename_prefix: Output configuration
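The source-image constraints above (square PNG under 4MB) can be checked before any API call. The following is a sketch using only the standard library. The function name check_variation_source is hypothetical, "4MB" is assumed to mean 4 MiB, and the server itself may well use Pillow for this check instead.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_BYTES = 4 * 1024 * 1024  # assumption: "4MB" is read as 4 MiB


def check_variation_source(image_path: str) -> tuple:
    """Return (width, height) if the file is a square PNG under the size limit."""
    path = Path(image_path)
    if path.stat().st_size >= MAX_BYTES:
        raise ValueError("source image must be smaller than 4MB")
    with path.open("rb") as fh:
        header = fh.read(24)
    # A PNG starts with an 8-byte signature, then the IHDR chunk:
    # 4-byte length, 4-byte type "IHDR", 4-byte width, 4-byte height.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("source image must be a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    if width != height:
        raise ValueError(f"source image must be square, got {width}x{height}")
    return width, height
```

Reading only the first 24 bytes avoids decoding the whole image just to learn its dimensions.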
list_generated_images
List all generated images in a directory with metadata.
Parameters:
directory (str): Directory to scan (default: "./generated_images")
Returns: File listing with sizes, dimensions, modification dates
Cache Management Tools
get_cache_info
Get information about the analysis cache (file count, size, location).
clear_image_cache
Clear all cached analysis results.
Model Comparison
Feature | DALL-E 2 | DALL-E 3 | GPT-Image-1 |
Generation | ✅ Basic | ✅ High Quality | ✅ Advanced |
Editing | ✅ Limited | ❌ | ✅ Advanced |
Variations | ✅ | ❌ | ❌ |
Max Images | 10 | 1 | 10 |
Sizes | 256x256, 512x512, 1024x1024 | 1024x1024, 1792x1024, 1024x1792 | 1024x1024, 1536x1024, 1024x1536 |
Styles | ❌ | vivid, natural | ❌ |
Quality | standard | standard, hd | auto, high, medium, low |
Transparency | ❌ | ❌ | ✅ |
Max Prompt | 1000 chars | 4000 chars | 32000 chars |
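The limits in the table lend themselves to a small pre-flight check. The sketch below encodes the sizes, image counts, prompt lengths, and style support listed above. The names ModelLimits, MODEL_LIMITS, and validate_generation_request are hypothetical and only illustrate the idea of model-specific parameter validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    sizes: tuple
    max_images: int
    max_prompt_chars: int
    supports_style: bool


# Values copied from the comparison table above.
MODEL_LIMITS = {
    "dall-e-2": ModelLimits(("256x256", "512x512", "1024x1024"), 10, 1000, False),
    "dall-e-3": ModelLimits(("1024x1024", "1792x1024", "1024x1792"), 1, 4000, True),
    "gpt-image-1": ModelLimits(("1024x1024", "1536x1024", "1024x1536"), 10, 32000, False),
}


def validate_generation_request(model, prompt, size=None, n=1, style=None):
    """Return a list of problems; an empty list means the request fits the table."""
    limits = MODEL_LIMITS.get(model)
    if limits is None:
        return [f"unknown model {model!r}"]
    problems = []
    if len(prompt) > limits.max_prompt_chars:
        problems.append(f"prompt exceeds {limits.max_prompt_chars} characters")
    if size is not None and size not in limits.sizes:
        problems.append(f"size {size!r} is not supported by {model}")
    if not 1 <= n <= limits.max_images:
        problems.append(f"n must be between 1 and {limits.max_images} for {model}")
    if style is not None and not limits.supports_style:
        problems.append(f"style is not supported by {model}")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a client fix a request in a single round trip.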
Usage Examples
Generate a Basic Image
Edit an Existing Image
Create Image Variations
Analyze Generated Images
File Organization
Generated images are automatically organized in separate directories:
./generated_images/ - Text-to-image generations
./edited_images/ - Image edits
./image_variations/ - Image variations
Files are named with timestamps to avoid conflicts:
generated_1234567890_1.png
edited_1234567890_1.png
variation_1234567890_1.png
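The naming pattern above looks like prefix, Unix timestamp, and a 1-based index. A sketch under that assumption follows. The helper build_output_paths is hypothetical, and the exact timestamp source used by the server is not stated here.

```python
import time
from pathlib import Path


def build_output_paths(output_dir, filename_prefix, n, timestamp=None):
    """Build n paths shaped like <prefix>_<unix timestamp>_<index>.png."""
    # Assumption: the number in the example filenames is whole seconds since the epoch.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return [
        Path(output_dir) / f"{filename_prefix}_{ts}_{index}.png"
        for index in range(1, n + 1)
    ]
```

For example, build_output_paths("./generated_images", "generated", 2, timestamp=1234567890) yields paths whose file names are generated_1234567890_1.png and generated_1234567890_2.png.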
Error Handling
The server includes comprehensive error handling for:
Invalid image formats and file paths
Model-specific parameter validation
File size and dimension limits
API quota and rate limiting
Network connectivity issues
Malformed prompts and parameters
Cache System
The analysis tools use an intelligent caching system:
File Change Detection: Uses SHA-256 hashes to detect file changes
30-Day Expiration: Automatically expires old cache entries
Safe Operation: Cache failures don't affect main functionality
Efficient Storage: Uses MD5 hashes for safe cache key generation
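The cache behaviour described above can be sketched as follows: a SHA-256 content hash detects file changes, an MD5 digest of the hash plus prompt and analysis type forms the key, and entries older than 30 days count as stale. All function names and the exact key layout are assumptions for illustration. The server's real cache format is not documented here.

```python
import hashlib
import time
from pathlib import Path
from typing import Optional

CACHE_TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day expiration


def file_sha256(path: str) -> str:
    """Hash the file contents so that any edit to the image changes the hash."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_key(file_hash: str, prompt: str, analysis_type: str = "general") -> str:
    """Derive a filename-safe key; MD5 is used for naming here, not for security."""
    raw = "\n".join((file_hash, analysis_type, prompt)).encode("utf-8")
    return hashlib.md5(raw, usedforsecurity=False).hexdigest()


def is_fresh(created_at: float, now: Optional[float] = None) -> bool:
    """Return True while a cache entry is younger than the 30-day TTL."""
    current = time.time() if now is None else now
    return (current - created_at) < CACHE_TTL_SECONDS
```

Because the prompt and analysis type are part of the key, the same image analyzed two different ways gets two separate cache entries.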
Requirements
Python 3.13+
OpenAI API key with access to Vision API and Image Generation
Required packages:
mcp[cli]>=1.9.4
openai>=1.90.0
pillow>=11.2.1
requests>=2.32.4
License
This project is licensed under the MIT License - see the LICENSE file for details.