# Gemini Image MCP Server
A Model Context Protocol (MCP) server for image generation and editing using Google Gemini AI. It supports optional context images to guide results and includes a dedicated edit workflow. Output defaults to a square (1:1) format, which suits eye‑catching social media images.
## Features
- ✨ Image generation with Google Gemini AI
- 🎨 Multiple aspect ratios (1:1, 16:9, 9:16, 4:3, 3:4)
- 📱 Optimized for social media with 1:1 format by default
- 🎯 Custom style support
- 🧩 Context images to guide generation
- ✏️ Dedicated edit tool for modifying existing assets without juggling extra options
- 🏷️ **Watermark support** - Overlay watermark images on generated results
- 💾 Automatic saving of images to local files
- 📁 Flexible output path configuration
- 🛡️ Customizable safety settings
## Installation
1. Clone this repository
2. Install dependencies:
```bash
npm install
```
3. Build the project:
```bash
npm run build
```
## Configuration
### Environment Variables
You need to configure your Google AI API key:
```bash
export GOOGLE_API_KEY="your-api-key-here"
```
### Getting a Google AI API Key
1. Go to [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a new API key
3. Copy the key and set it as an environment variable
## Client Configuration
```json
{
  "servers": {
    "gemini-image": {
      "command": "node",
      "args": ["/full/path/to/project/dist/index.js"],
      "env": {
        "GOOGLE_API_KEY": "your-api-key-here"
      }
    }
  }
}
```
## Command Line Interface
In addition to the MCP server, the project now ships with a CLI for quick terminal-friendly workflows.
1. Build the project once:
   ```bash
   npm run build
   ```
2. Make sure `GOOGLE_API_KEY` is set in your environment.
3. Explore the CLI:
   ```bash
   node dist/cli.js --help
   # or, after publishing/packing:
   gemini-image --help
   ```
### Commands
- `gemini-image generate`: Create new imagery from a text prompt.
  ```bash
  gemini-image generate --prompt "A banana astronaut on Mars" --output ./images/
  ```
- `gemini-image edit`: Apply instructions to an existing image.
  ```bash
  gemini-image edit --prompt "Add neon lights to the skyline" --input ./images/city.png
  ```
Both commands support `--help` for detailed, friendly option descriptions. CLI option names are intentionally concise (for example `--prompt`, `--context`, `--input`) so they are easier to memorize than the MCP tool identifiers.
## Available Tools
### `generate_image`
Creates a brand-new image from a text description, optionally using one or more images as visual context. Use this tool when you want to generate fresh content.
**Parameters:**
- `description` (string, required): Detailed description of the desired image.
- `images` (string[], optional): Array of image paths used as context (absolute or relative). Use this to “edit” or guide style/content.
- `aspectRatio` (string, optional): Orientation preset (`square`, `landscape`, `portrait`). Default: `square`.
- `style` (string, optional): Additional style (e.g., "minimalist", "colorful", "professional", "artistic").
- `outputPath` (string, optional): Where to save the image. If omitted, saves in current directory.
- `watermarkPath` (string, optional): Path to watermark image to overlay.
- `watermarkPosition` (string, optional): One of `top-left`, `top-right`, `bottom-left`, `bottom-right`. Default: `bottom-right`.
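The parameter list above can be expressed as a TypeScript type with its documented defaults applied. This is only a sketch: the names `GenerateImageArgs`, `ResolvedGenerateImageArgs`, and `withDefaults` are hypothetical and are not taken from the server's source, whose actual definitions may differ.

```typescript
// Hypothetical types mirroring the parameters documented in this README.
type AspectRatio = "square" | "landscape" | "portrait";
type WatermarkPosition = "top-left" | "top-right" | "bottom-left" | "bottom-right";

interface GenerateImageArgs {
  description: string;
  images?: string[];
  aspectRatio?: AspectRatio;
  style?: string;
  outputPath?: string;
  watermarkPath?: string;
  watermarkPosition?: WatermarkPosition;
}

// Same shape, but the two parameters with documented defaults are always set.
type ResolvedGenerateImageArgs = GenerateImageArgs &
  Required<Pick<GenerateImageArgs, "aspectRatio" | "watermarkPosition">>;

// Apply the documented defaults and reject an empty description.
function withDefaults(args: GenerateImageArgs): ResolvedGenerateImageArgs {
  if (args.description.trim() === "") {
    throw new Error("description is required");
  }
  return {
    ...args,
    aspectRatio: args.aspectRatio ?? "square",
    watermarkPosition: args.watermarkPosition ?? "bottom-right",
  };
}
```

Calling `withDefaults({ description: "a mountain at sunset" })` yields an object with `aspectRatio: "square"` and `watermarkPosition: "bottom-right"`.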
**Usage Examples:**
```
# Basic - saves to current directory
Generate an image of a mountain landscape at sunset with warm, minimalist style
```
```
# With context image to guide composition
Generate an image: "Create a futuristic city skyline inspired by this photo", images: ["./reference-skyline.jpg"], aspectRatio: "landscape"
```
```
# Multiple context images
Generate an image combining style of a logo and a photo, images: ["./photo.jpg", "./logo.png"], style: "professional"
```
When you request a specific orientation (`square`, `landscape`, or `portrait`), the server automatically appends an invisible helper image (`assets/square.png`, `assets/landscape.png`, or `assets/portrait.png`) so Gemini respects the target dimensions.
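As a rough illustration of the helper-image behavior described above, the mapping from orientation to helper file could look like the following. The asset paths come from this README; the function name `withOrientationHelper` and the choice to append the helper last are assumptions, not the server's confirmed implementation.

```typescript
type Orientation = "square" | "landscape" | "portrait";

// Helper image paths as listed in this README, relative to the project root.
const HELPER_IMAGES: Record<Orientation, string> = {
  square: "assets/square.png",
  landscape: "assets/landscape.png",
  portrait: "assets/portrait.png",
};

// Return a new list of context images with the orientation helper appended.
// The caller's array is left unmodified.
function withOrientationHelper(
  images: string[],
  orientation: Orientation = "square",
): string[] {
  return [...images, HELPER_IMAGES[orientation]];
}
```

For example, `withOrientationHelper(["./photo.jpg"], "landscape")` returns `["./photo.jpg", "assets/landscape.png"]`.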
### `edit_image`
Modifies an existing image using a focused text instruction. This tool keeps the original framing unless you explicitly ask for structural changes.
**Parameters:**
- `description` (string, required): Instructions describing the edits to apply to the provided image.
- `image` (string, required): Path to the image file you want to edit (absolute or relative).
- `outputPath` (string, optional): Where to save the edited result. If omitted, the server uses the working directory and an auto-generated filename.
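MCP clients normally build tool calls for you, but it can help to see what an `edit_image` call looks like on the wire. The sketch below constructs a JSON-RPC 2.0 `tools/call` request, the MCP method for invoking a tool. The `EditImageArgs` type and `buildEditImageRequest` helper are hypothetical names used for illustration only.

```typescript
// Hypothetical type mirroring the edit_image parameters documented above.
interface EditImageArgs {
  description: string;
  image: string;
  outputPath?: string;
}

// Build a JSON-RPC 2.0 request that asks an MCP server to run edit_image.
function buildEditImageRequest(id: number, args: EditImageArgs) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call" as const,
    params: { name: "edit_image" as const, arguments: args },
  };
}

const request = buildEditImageRequest(1, {
  description: "Turn the product label red",
  image: "./product-shot.jpg",
});

// Print the payload a client would send over the MCP transport.
console.log(JSON.stringify(request, null, 2));
```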
**Usage Examples:**
```
# Simple edit
Edit image: "Soften skin tones and remove flyaway hairs", image: "./headshot.png"
```
```
# Heavier retouch
Edit image: "Turn the product label red and add subtle sparkle highlights", image: "./product-shot.jpg"
```
```
# Custom path and watermark (top-left), using generate_image
Generate an image of a space cat, outputPath: "./images/epic_pizza.png", watermarkPath: "./my_logo.png", watermarkPosition: "top-left"
```
## Watermark Functionality
The `generate_image` tool supports adding watermarks to your images:
**Features:**
- 🏷️ Add image watermarks to any generated output
- 📍 Position in any corner (`watermarkPosition`)
- 📏 Smart sizing (25% of image width, maintaining aspect ratio)
- 🎯 Consistent spacing (3% padding from edges)
- 🖼️ Supports PNG, JPG, WebP watermark files
- ⚡ Only applied when `watermarkPath` parameter is provided
**Usage:**
```
# For image generation
watermarkPath: "./my-brand-logo.png"
# With context images
watermarkPath: "./watermark.jpg"
```
**Watermark Specifications:**
- Position: Configurable corner via `watermarkPosition`
- Size: 25% of image width (maintains watermark aspect ratio)
- Padding: 3% of image width from the selected edges
- Blend mode: Over (watermark appears on top of image)
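The sizing and padding rules above reduce to a small amount of arithmetic. The following sketch computes where a watermark would land under those rules; it assumes rounding to whole pixels and uses hypothetical names (`placeWatermark`, `Size`, `Placement`), so the server's actual results may differ by a pixel or so.

```typescript
type Corner = "top-left" | "top-right" | "bottom-left" | "bottom-right";

interface Size {
  width: number;
  height: number;
}

interface Placement extends Size {
  left: number;
  top: number;
}

// Scale the watermark to 25% of the image width (keeping its aspect ratio)
// and offset it from the chosen corner by 3% of the image width.
function placeWatermark(
  image: Size,
  watermark: Size,
  corner: Corner = "bottom-right",
): Placement {
  const width = Math.round(image.width * 0.25);
  const height = Math.round(width * (watermark.height / watermark.width));
  const padding = Math.round(image.width * 0.03);
  const left = corner.endsWith("left") ? padding : image.width - width - padding;
  const top = corner.startsWith("top") ? padding : image.height - height - padding;
  return { width, height, left, top };
}
```

For a 1024×1024 image and a 400×200 watermark, this gives a 256×128 overlay with 31 px of padding: `left: 737, top: 865` for `bottom-right`, and `left: 31, top: 31` for `top-left`.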
**Save Functionality:**
- Default: Images are saved in the directory from which the MCP client is executed
- Automatic naming: Generated based on description, date and time
- Supported formats: PNG, JPG, WebP (depending on what Gemini returns)
- Automatic creation: Creates necessary folders if they don't exist
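To make the naming and path rules above concrete, here is one way they could be implemented. The exact filename pattern is not specified in this README, so the `<slug>-<YYYYMMDD>-<HHMMSS>.<ext>` scheme (in UTC) and the helper names `slugify`, `autoFileName`, and `resolveOutputPath` are assumptions for illustration only.

```typescript
// Turn a description into a short, filesystem-safe slug.
function slugify(description: string, maxLength = 40): string {
  const slug = description
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into "-"
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return slug.slice(0, maxLength).replace(/-+$/, "") || "image";
}

// Assumed naming scheme: <slug>-<YYYYMMDD>-<HHMMSS>.<ext>, using UTC time.
function autoFileName(description: string, now: Date, ext = "png"): string {
  const iso = now.toISOString(); // e.g. 2025-01-02T03:04:05.000Z
  const date = iso.slice(0, 10).replace(/-/g, "");
  const time = iso.slice(11, 19).replace(/:/g, "");
  return `${slugify(description)}-${date}-${time}.${ext}`;
}

// No path: auto-named file in the current directory.
// Path ending in "/": treated as a folder, auto-named file inside it.
// Anything else: treated as the full file path.
function resolveOutputPath(
  outputPath: string | undefined,
  description: string,
  now: Date,
): string {
  const fileName = autoFileName(description, now);
  if (outputPath === undefined || outputPath === "") return fileName;
  return outputPath.endsWith("/") ? outputPath + fileName : outputPath;
}
```

Treating only paths that end in `/` as folders matches the troubleshooting advice in this README to end folder paths with `/`.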
## Development
### Available Scripts
- `npm run build`: Compiles TypeScript to JavaScript
- `npm run dev`: Development mode with automatic reload
- `npm start`: Runs the compiled server
- `npm run cli`: Runs the CLI entry directly (`node dist/cli.js`)
### Project Structure
```
gemini-image-mcp-server/
├── src/
│   ├── index.ts          # Main server entry point
│   ├── cli.ts            # CLI entry point (generate/edit commands)
│   ├── services/
│   │   ├── gemini.ts         # Gemini AI calls
│   │   ├── imageService.ts   # File system + watermark handling
│   │   └── serviceFactory.ts # Shared initialization helpers
│   ├── tools/
│   │   ├── index.ts      # Tools exports
│   │   ├── generateImage.ts  # Tool for creating new images
│   │   └── editImage.ts      # Tool for editing existing images
│   └── types/
│       └── index.ts      # Type definitions
├── dist/                 # Compiled files
├── package.json
├── tsconfig.json
└── README.md
```
## Troubleshooting
### Error: "GOOGLE_API_KEY environment variable is required"
Make sure you have configured the `GOOGLE_API_KEY` environment variable with your Google AI API key.
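A startup check along these lines would produce that error. This is a minimal sketch, not the server's actual code: `requireApiKey` is a hypothetical helper, and it takes the environment as a parameter so it can be exercised without touching real environment variables.

```typescript
// Return the API key, or throw the error quoted above if it is missing or blank.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["GOOGLE_API_KEY"]?.trim();
  if (!key) {
    throw new Error("GOOGLE_API_KEY environment variable is required");
  }
  return key;
}
```

In a Node.js entry point this would be called as `requireApiKey(process.env)`. If the key is set in your shell but the error persists, check that the `env` block in your MCP client configuration also passes it through, since the client may launch the server with a different environment.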
### Error: "Could not generate image"
- Verify that your API key is valid and has permissions for the `gemini-2.5-flash-image-preview` model
- Ensure the description doesn't contain content that might be blocked by safety filters
### File saving error
- Verify you have write permissions in the specified path
- Make sure the path is valid and accessible
- If specifying a folder, end it with `/`
### Server not responding
- Verify the server is running correctly
- Check logs in stderr for error messages
- Make sure the MCP client is configured correctly
## License
MIT
## Contributing
Contributions are welcome. Please open an issue before making significant changes.