gemini-image-mcp
This server lets you generate, edit, and process images using Google Gemini AI, plus perform free local image transformations — all from an MCP client.
generate_image (AI-powered, uses Gemini API)
Text-to-image generation — describe an image in a prompt and get a saved file
Image editing — provide reference image paths plus an instruction to edit or remix them
Multi-turn iterative editing — pass a sessionId to refine images across multiple calls, keeping prior turns in context
Multi-image input — use up to 14 reference images at once
Background removal in one call — get a transparent PNG cutout using AI matte, chroma key, or white-threshold keying
Aspect ratio & resolution control — supports common ratios (1:1, 16:9, 9:16, etc.) and 1K/2K/4K output
Reproducible generation — use an integer seed for consistent results
Google Search grounding — real-world accuracy on supported models
Cost & usage reporting — token counts, estimated USD cost, and session/hour totals per response
Rate limiting — configurable per-hour caps on requests and cost
Automatic model discovery — detects available image models from your API key at startup
Organized output — custom filenames, subfolders, auto-versioning, and a generations.jsonl manifest log
process_image (Local, free, no API calls)
Crop — pixel-exact dimensions, aspect ratio (center), or smart focal point (attention/entropy strategies)
Resize — to a target width, height, or both while maintaining aspect ratio
Background removal — AI semantic matte, white/light threshold keying, or chroma key (green screen / any solid color with HSV keying, feathering, and spill suppression)
Trim — auto-remove whitespace or transparent borders
Format conversion — convert between PNG, JPEG, and WebP with quality control (1–100)
Provides tools for generating, editing, and processing images using Google Gemini's API, including text-to-image, multi-turn editing, and local image processing.
gemini-image-mcp
A simple, focused MCP server for Google Gemini's native image generation — the "Nano Banana" models. Generate, edit, and locally process images from Claude Code, Claude Desktop, or any stdio-based MCP client. Two tools, no bloat.
Built for agents: a single call returns a saved image — or, with one-call background removal, a ready-to-use transparent PNG — without streaming image data through your agent's context. Uses Gemini's generateContent API (not the deprecated Imagen API).
Install
npm install -g @jimothy-snicket/gemini-image-mcp
Or use directly with npx:
npx -y @jimothy-snicket/gemini-image-mcp
Claude Code (one command):
claude mcp add gemini-image -- npx -y @jimothy-snicket/gemini-image-mcp
Requires a GEMINI_API_KEY environment variable — see Setup for details.
Set up a config file (optional):
npx @jimothy-snicket/gemini-image-mcp --init
Creates ~/.gemini-image-mcp.json with commented defaults. For project-specific overrides:
npx @jimothy-snicket/gemini-image-mcp --init --local
Features
generate_image — AI-powered
Text-to-image — describe what you want, get an image
Image editing — provide reference images and an editing instruction
Transparent assets in one call — removeBackground returns a clean transparent PNG: a local AI matte (works on any subject; optional add-on, see below) by default, or built-in green-screen / white-threshold keying. No extra API cost
Multi-turn edits — pass a sessionId to refine an image across calls, with prior turns kept as context
Multi-image input — up to ~14 reference images on gemini-3.1-flash-image (~11 on gemini-3-pro-image)
Cost reporting — every response includes token counts, estimated USD cost, and session totals
Rate limiting — configurable per-hour caps on requests and cost to prevent runaway agents
Auto model discovery — detects available image models from your API key at startup
Seed — reproducible generation with integer seeds
Google Search grounding — real-world accuracy on the gemini-3.x image models
process_image — Local (free, no API calls)
Crop — pixel-exact, aspect ratio (center), or focal point (attention/entropy)
Resize — to width, height, or both (maintains aspect ratio)
Background removal — threshold-based (white backgrounds) or chroma key (green screen, any solid colour)
Chroma key pipeline — HSV keying with smoothstep feather, spill suppression, and edge anti-aliasing
Trim — auto-remove whitespace borders
Format conversion — PNG, JPEG, WebP with quality control
Both tools
Output organization — meaningful filenames with auto-versioning, subfolders
Generation manifest — generations.jsonl logs every generation with prompt, params, cost
Full aspect ratio support — 1:1, 16:9, 9:16, 3:2, 2:3, 4:3, 3:4, 21:9
Resolution control — 1K, 2K, 4K
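To make the auto-versioning idea concrete, here is a minimal sketch that picks the first filename not already taken. The `nextFreeName` helper and the `-2`, `-3` suffix scheme are assumptions made for illustration; this README does not document the package's actual naming pattern.

```typescript
// Hypothetical helper: choose a filename that does not collide with existing ones.
// The "-2", "-3" suffix scheme is an assumption, not the package's documented format.
function nextFreeName(base: string, ext: string, existing: ReadonlySet<string>): string {
  const first = `${base}.${ext}`;
  if (!existing.has(first)) return first;
  // Start at 2 so the unsuffixed file reads as "version 1".
  for (let n = 2; ; n++) {
    const candidate = `${base}-${n}.${ext}`;
    if (!existing.has(candidate)) return candidate;
  }
}
```

The point of the sketch is only that an existing file is never overwritten; a repeated request with the same base name lands in a new file.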
Setup
1. Get a Gemini API Key
Go to Google AI Studio and create an API key. It's free to start with generous rate limits.
2. Set the API Key
The server reads your key from the GEMINI_API_KEY environment variable. Set it once so it's available in every session:
Windows (PowerShell — run as admin):
[System.Environment]::SetEnvironmentVariable('GEMINI_API_KEY', 'your-key-here', 'User')
Then restart your terminal.
macOS / Linux:
echo 'export GEMINI_API_KEY="your-key-here"' >> ~/.bashrc
source ~/.bashrc
(Use ~/.zshrc if you're on zsh.)
Verify it's set:
echo $GEMINI_API_KEY
(In PowerShell, use echo $env:GEMINI_API_KEY instead.)
3. Connect to Your MCP Client
Pick the method that matches how you use MCP:
Claude Code (one-liner)
claude mcp add gemini-image -- npx -y @jimothy-snicket/gemini-image-mcp
Claude Code will pick up GEMINI_API_KEY from your environment automatically.
Claude Code (manual .mcp.json)
Add to .mcp.json in your project root or ~/.claude/.mcp.json for global access:
{
"mcpServers": {
"gemini-image": {
"command": "npx",
"args": ["-y", "@jimothy-snicket/gemini-image-mcp"],
"env": {
"GEMINI_API_KEY": "${GEMINI_API_KEY}"
}
}
}
}
The ${GEMINI_API_KEY} syntax reads the value from your shell environment, so your actual key never gets written into config files.
Claude Desktop
Edit claude_desktop_config.json:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"gemini-image": {
"command": "npx",
"args": ["-y", "@jimothy-snicket/gemini-image-mcp"],
"env": {
"GEMINI_API_KEY": "${GEMINI_API_KEY}"
}
}
}
}
Restart Claude Desktop after saving.
Other MCP Clients
Any client that supports stdio transport works. Point it at npx -y @jimothy-snicket/gemini-image-mcp and pass GEMINI_API_KEY in the environment.
Security Notes
Never commit your API key to version control. The ${GEMINI_API_KEY} syntax in config files references your environment, so the key itself stays in your shell profile.
If your .mcp.json is in a project repo, add it to .gitignore or use the global config at ~/.claude/.mcp.json instead.
For extra security, you can use a wrapper script that reads the key from your OS keychain (macOS Keychain, Windows Credential Manager) and launches the server with it injected.
Configuration
All optional. The only required setup is GEMINI_API_KEY (covered above).
Variable | Default | Description |
| | Default directory for saved images |
| | Default Gemini model |
| | API request timeout in milliseconds |
MAX_REQUESTS_PER_HOUR | | Max image generations per rolling hour |
MAX_COST_PER_HOUR | | Max estimated cost (USD) per rolling hour |
| | Multi-turn session expiry |
GEMINI_IMAGE_AUTO_INSTALL | | Auto-install the AI matte engine on first auto call |
Set these the same way as GEMINI_API_KEY, or pass them in the env block of your MCP config.
Rate limiting is recommended when agents have access to this tool. An agent in a loop can generate images quickly — set MAX_REQUESTS_PER_HOUR=20 and MAX_COST_PER_HOUR=5 as sensible defaults.
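A rolling-hour cap can be pictured with the sketch below. It is an illustration of the general technique, not the server's code: the `RollingHourLimiter` class, its method names, and the rule that a request is allowed while both totals are still under their caps are all assumptions.

```typescript
// Illustrative rolling-hour limiter. Names and exact cap semantics are assumptions.
interface Generation {
  at: number; // epoch milliseconds
  costUsd: number;
}

class RollingHourLimiter {
  private history: Generation[] = [];
  private readonly maxRequests: number;
  private readonly maxCostUsd: number;

  constructor(maxRequests: number, maxCostUsd: number) {
    this.maxRequests = maxRequests;
    this.maxCostUsd = maxCostUsd;
  }

  // Forget generations that are more than one hour older than `now`.
  private prune(now: number): void {
    const cutoff = now - 60 * 60 * 1000;
    this.history = this.history.filter((g) => g.at > cutoff);
  }

  // True if another generation fits under both the request cap and the cost cap.
  canGenerate(now: number): boolean {
    this.prune(now);
    const spent = this.history.reduce((sum, g) => sum + g.costUsd, 0);
    return this.history.length < this.maxRequests && spent < this.maxCostUsd;
  }

  // Record a completed generation and its estimated cost.
  record(now: number, costUsd: number): void {
    this.history.push({ at: now, costUsd });
  }
}
```

Because the window rolls, capacity comes back gradually as old generations age out rather than all at once at the top of the hour.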
Config File
Instead of environment variables, you can use a JSON config file. Create one with:
npx @jimothy-snicket/gemini-image-mcp --init
This creates ~/.gemini-image-mcp.json with all defaults and inline documentation. Edit it to set your preferences.
Priority: env vars > local config (.gemini-image-mcp.json in CWD) > global config (~/.gemini-image-mcp.json) > defaults.
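The precedence order above can be sketched as a layered merge. Only the ordering comes from this README; the `resolveSettings` function is hypothetical, and `maxRequestsPerHour` is a placeholder key used purely for illustration.

```typescript
// Setting names are placeholders; only the precedence order is taken from the README.
interface Settings {
  defaultModel: string;
  maxRequestsPerHour: number;
}

// `layers` must be ordered from lowest to highest priority:
// global config file, local config file, then environment variables.
function resolveSettings(defaults: Settings, layers: Partial<Settings>[]): Settings {
  const resolved: Settings = { ...defaults };
  for (const layer of layers) {
    // Only keys a layer actually sets override what came before.
    if (layer.defaultModel !== undefined) resolved.defaultModel = layer.defaultModel;
    if (layer.maxRequestsPerHour !== undefined) {
      resolved.maxRequestsPerHour = layer.maxRequestsPerHour;
    }
  }
  return resolved;
}
```

A key that a higher-priority layer leaves unset falls through to the next layer down, so a local config can override a single value without repeating the rest.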
You can also set per-tool defaults so every request uses your preferred settings:
{
"defaultModel": "gemini-3.1-flash-image",
"defaults": {
"generate": {
"aspectRatio": "16:9",
"resolution": "2K"
},
"process": {
"removeBackground": { "color": "#00FF00" },
"trim": true
}
}
}
Per-request parameters always override config defaults.
Custom pricing. Cost estimates come from a built-in per-token rate table (there's no pricing API to fetch live). If you use a model the table doesn't know yet — or Google changes a rate before this package updates — add pricingOverrides so cost reporting stays accurate without waiting for a release:
{
"pricingOverrides": {
"some-new-image-model": {
"inputPerMillion": 0.5,
"textOutputPerMillion": 60,
"imageOutputPerMillion": 60,
"thinkingPerMillion": 60
}
}
}
Models with no entry (built-in or override) still generate; their cost is reported as unknown rather than guessed.
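As a worked illustration of per-token pricing, the sketch below multiplies token counts by per-million rates and returns null when no rate is known. The `Rates` fields mirror the pricingOverrides entry shown above; the `estimateCost` function, the `TokenUsage` field names, and the assumption that an override takes precedence over a built-in rate are illustrative, not taken from the package source.

```typescript
// Per-million-token rates, matching the shape of a pricingOverrides entry.
interface Rates {
  inputPerMillion: number;
  textOutputPerMillion: number;
  imageOutputPerMillion: number;
  thinkingPerMillion: number;
}

// Token counts for one request; these field names are illustrative.
interface TokenUsage {
  promptTokens: number;
  textOutputTokens: number;
  imageTokens: number;
  thinkingTokens: number;
}

// Returns the estimated USD cost, or null when no rate is known for the model.
function estimateCost(
  model: string,
  usage: TokenUsage,
  builtIn: Record<string, Rates>,
  overrides: Record<string, Rates>,
): number | null {
  // Assumption: an override wins over the built-in table.
  const rates = overrides[model] ?? builtIn[model];
  if (rates === undefined) return null;
  const total =
    usage.promptTokens * rates.inputPerMillion +
    usage.textOutputTokens * rates.textOutputPerMillion +
    usage.imageTokens * rates.imageOutputPerMillion +
    usage.thinkingTokens * rates.thinkingPerMillion;
  return total / 1000000;
}
```

Returning null for an unknown model keeps the "unknown rather than guessed" behaviour explicit for callers, who can then display the cost as unavailable.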
Tool: generate_image
Parameters
Parameter | Required | Description |
| Yes | Text description or editing instruction |
images | No | Array of file paths to input/reference images |
| No | Gemini model ID |
| No | |
| No | |
| No | Override output directory for this request |
filename | No | Base name for saved file (e.g. |
subfolder | No | Subfolder within output directory (e.g. |
sessionId | No | Continue a multi-turn editing session from a previous response |
seed | No | Integer seed for reproducible generation |
| No | Enable Google Search grounding (gemini-3.x image models) |
removeBackground | No | Return a transparent PNG cutout. |
Example Response
{
"imagePath": "/home/user/gemini-images/hero-banner.png",
"mimeType": "image/png",
"model": "gemini-2.5-flash-image",
"sessionId": "session-1711929600000-a1b2c3",
"sessionTurn": 1,
"usage": {
"promptTokens": 5,
"outputTokens": 1295,
"imageTokens": 1290,
"thinkingTokens": 412,
"totalTokens": 1712,
"estimatedCost": "$0.0390",
"pricingVerifiedDate": "2026-06-15"
},
"session": {
"generationsThisSession": 3,
"totalCostThisSession": "$0.1161",
"generationsThisHour": 5,
"limit": {
"maxPerHour": 20,
"maxCostPerHour": 5,
"remainingThisHour": 15
}
}
}
Usage Examples
Text-to-image:
"Generate a hero image for a SaaS landing page, modern gradient style, 16:9"
Image editing:
"Take this screenshot and redesign the header with a dark theme" (with image paths)
Iterative editing (multi-turn):
Generate an image, then call again with the returned sessionId and a refinement like "make it more minimal". The prior image stays in context.
Organized output:
"Generate a hero banner" with filename: "hero", subfolder: "landing-page" → saves to ~/gemini-images/landing-page/hero.png
High quality:
"A photorealistic product shot of headphones on marble, 4K" (using gemini-3-pro-image)
Transparent asset (one call):
"A glossy red sneaker, product shot" with removeBackground: { "mode": "auto" } → a ready-to-place transparent PNG. The local AI matte works on any subject, so no green screen is needed.
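For clients that talk to the server directly, the usage examples above translate into MCP tools/call requests. The sketch below builds one such JSON-RPC 2.0 message. The `buildToolCall` helper is hypothetical, and `prompt` is an assumed parameter name; `removeBackground` and `filename` appear elsewhere in this README.

```typescript
// Shape of a JSON-RPC 2.0 "tools/call" request as used by MCP clients.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper that wraps a tool name and its arguments in a request.
function buildToolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

const sneakerRequest = buildToolCall(1, "generate_image", {
  prompt: "A glossy red sneaker, product shot", // "prompt" is an assumed parameter name
  removeBackground: { mode: "auto" },
  filename: "sneaker",
});

// Serialized form, ready to write to the server's stdin by a stdio client.
const sneakerRequestJson = JSON.stringify(sneakerRequest);
```

In practice an MCP client library handles this framing; the sketch only shows where the tool name and arguments end up.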
Tool: process_image
Local image processing via sharp. Free, fast, no API calls.
Parameters
Parameter | Required | Description |
| Yes | Path to the image file to process |
crop | No | Crop by pixel dimensions, aspect ratio, or focal point strategy |
resize | No | Resize to width/height (maintains aspect ratio) |
removeBackground | No | Remove background: |
trim | No | Auto-remove whitespace/transparent borders |
format | No | Convert to |
quality | No | Output quality for JPEG/WebP (1-100) |
| No | Base name for saved file. Auto-versioned if duplicate. |
| No | Subfolder within output directory |
| No | Override output directory |
Crop Options
// Pixel-exact
{"width": 500, "height": 300, "left": 100, "top": 50}
// Aspect ratio (center crop)
{"aspectRatio": "16:9"}
// Focal point — shifts crop to the most interesting region
{"aspectRatio": "16:9", "strategy": "attention"}
// Detail-based — shifts crop to the most detailed region
{"aspectRatio": "16:9", "strategy": "entropy"}
Background Removal Options
// AI semantic matte — best quality, works on ANY subject
{"mode": "auto"}
// White/light background (threshold)
{"mode": "threshold", "threshold": 240}
// Green screen (chroma key)
{"mode": "chroma", "color": "#00FF00"}
// Any solid colour
{"mode": "chroma", "color": "#0000FF", "tolerance": 60}
mode: "auto" runs a local BiRefNet matte that isolates the subject semantically, so it handles hair, glass, and green/yellow subjects that chroma key can't. The matte engine isn't bundled (keeps the base install ~65 MB). On your first auto call the server auto-installs it (@huggingface/transformers, ~340 MB) plus the fp16 model (~109 MB). That is a one-time pause of a minute or two; after that it runs locally with no extra API cost. Set GEMINI_IMAGE_AUTO_INSTALL=0 to disable auto-install (then auto falls back to returning the image with instructions to install it manually). chroma and threshold need nothing extra.
Chroma key (mode: "chroma") uses HSV keying with smoothstep feathering, spill suppression, and 5-pass edge anti-aliasing (default tolerance 80). Use #00FF00 for AI-generated green screens — it works better than matching the exact shade Gemini produces.
Note: Chroma key destroys subjects that share the key colour (green/yellow) and transparent/reflective subjects (glass) — the green parrot vanishes. For those, use mode: "auto" (the AI matte preserves them), or the canvas approach: feed a solid-colour background image to generate_image and let Gemini place the subject with correct lighting. The canvas approach is still best for truly transparent objects like glass, which should transmit the final background rather than be cut out.
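To show what HSV keying with a smoothstep feather means in practice, here is a minimal per-pixel sketch. It is not the package's pipeline: the function names, the use of hue distance in degrees for `tolerance`, the `feather` width, and the saturation/value cut-offs are all assumptions, and spill suppression and edge anti-aliasing are left out.

```typescript
type Rgb = [number, number, number]; // 0-255 per channel

// Convert RGB to hue (degrees, 0-360), saturation (0-1) and value (0-1).
function rgbToHsv(r: number, g: number, b: number): [number, number, number] {
  const rn = r / 255, gn = g / 255, bn = b / 255;
  const max = Math.max(rn, gn, bn);
  const min = Math.min(rn, gn, bn);
  const delta = max - min;
  let h = 0;
  if (delta > 0) {
    if (max === rn) h = 60 * (((gn - bn) / delta) % 6);
    else if (max === gn) h = 60 * ((bn - rn) / delta + 2);
    else h = 60 * ((rn - gn) / delta + 4);
  }
  if (h < 0) h += 360;
  const s = max === 0 ? 0 : delta / max;
  return [h, s, max];
}

// Standard smoothstep: 0 at or below edge0, 1 at or above edge1, smooth in between.
function smoothstep(edge0: number, edge1: number, x: number): number {
  const t = Math.min(1, Math.max(0, (x - edge0) / (edge1 - edge0)));
  return t * t * (3 - 2 * t);
}

// Alpha for one pixel: 0 = fully transparent, 1 = fully opaque.
function chromaAlpha(pixel: Rgb, key: Rgb, tolerance: number, feather: number): number {
  const [ph, ps, pv] = rgbToHsv(...pixel);
  const [kh] = rgbToHsv(...key);
  // Grey or very dark pixels have no reliable hue, so keep them opaque (assumed cut-offs).
  if (ps < 0.2 || pv < 0.2) return 1;
  // Shortest angular distance between the two hues, in degrees.
  const raw = Math.abs(ph - kh);
  const hueDist = Math.min(raw, 360 - raw);
  // Inside the tolerance the pixel is keyed out; the feather band fades it back in.
  return smoothstep(tolerance, tolerance + feather, hueDist);
}
```

Working in hue rather than raw RGB distance is what lets a single key colour cover the range of shades a green screen produces, and it is also why subjects that share the key hue disappear.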
Common Pipelines
Subject on a specific background (canvas approach):
generate_image → "Place a [subject] on this background" with images: [solid colour canvas]
One API call. Best for yellow, green, or glass subjects where chroma key struggles.
Transparent asset (one call):
generate_image → "A product photo of <subject>" with removeBackground: {mode: "auto"}
One API call → a transparent PNG. The local AI matte works on any subject. (For truly transparent/reflective objects like glass, the canvas approach above is still best.)
Transparent asset from green screen (zero-dependency):
generate_image → "A product photo on a bright green background"
process_image → removeBackground {mode: "chroma"} + trim
Avoids the matte model entirely. Best for high-contrast subjects on locked-down/offline machines.
Favicon from a generated logo:
process_image → removeBackground {threshold: 230} + trim + resize {width: 192, height: 192}
Social card from a photo:
process_image → crop {aspectRatio: "16:9", strategy: "attention"} + resize {width: 1200}
WebP conversion for web:
process_image → format: "webp" + quality: 85
Models
Model | Strengths | Resolution | Notes |
gemini-2.5-flash-image | Cheapest (~$0.04/image) | 1K | Default. Shuts down 2026-10-02 |
gemini-3.1-flash-image | Speed + quality, Google Search grounding | 512, 1K, 2K, 4K | ~$0.07/1K image. ~14 reference images |
gemini-3-pro-image | Best quality, text rendering | 1K, 2K, 4K | ~$0.13/1K image. ~11 reference images |
The -preview IDs (gemini-3-pro-image-preview, gemini-3.1-flash-image-preview) are still accepted during Google's cutover but retire 2026-06-25 — use the GA IDs above. The server discovers whichever image models your API key supports at startup and validates each request against that live list, so new models work without an update.
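If you have scripts that still pass the -preview IDs, a small client-side helper can migrate them and check availability before calling the tool. This is a hypothetical sketch: `resolveModel` and the `discovered` list are stand-ins, and the server's own validation logic is not shown in this README.

```typescript
// Maps the retiring -preview IDs named above to their GA equivalents.
const PREVIEW_TO_GA: Record<string, string> = {
  "gemini-3-pro-image-preview": "gemini-3-pro-image",
  "gemini-3.1-flash-image-preview": "gemini-3.1-flash-image",
};

// Returns the model ID to request, or throws if it is not in the discovered list.
// `discovered` stands in for the models your API key can access.
function resolveModel(requested: string, discovered: readonly string[]): string {
  const canonical = PREVIEW_TO_GA[requested] ?? requested;
  if (discovered.indexOf(canonical) === -1) {
    throw new Error(`Model "${canonical}" is not available for this API key`);
  }
  return canonical;
}
```

Checking against a discovered list instead of a hard-coded one means the helper keeps working when new model IDs appear.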
Development
bun install
bun run build # TypeScript -> dist/
bun run dev # Run directly with Bun
License
MIT