Glama

Generate Image

generate_image

Generate images from text prompts, edit images with reference files, and remove backgrounds for transparent PNGs using Google Gemini.

Instructions

Generate or edit images using Google Gemini. Provide just a prompt for text-to-image generation. Add image file paths to edit or use reference images. Set removeBackground to get a transparent PNG cutout in one call (local AI matte; works on any subject, no extra API cost). Returns the saved file path, model used, token counts, and estimated cost.
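As a rough illustration of the two modes described above, the sketch below assembles a JSON-RPC 2.0 `tools/call` request of the kind an MCP client sends. The helper name `build_generate_image_call` and the pass-through `**options` are hypothetical; only `prompt`, `images`, and the tool name come from this page, and a real client library would normally build this envelope for you.

```python
import json
from typing import Any, Optional


def build_generate_image_call(
    prompt: str,
    images: Optional[list[str]] = None,
    request_id: int = 1,
    **options: Any,
) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call request for generate_image.

    Hypothetical helper: extra keyword arguments (for example
    aspectRatio or seed) are passed through unchanged.
    """
    arguments: dict[str, Any] = {"prompt": prompt}
    if images:
        # Supplying file paths switches from text-to-image to editing.
        arguments["images"] = list(images)
    arguments.update(options)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_image", "arguments": arguments},
    }


# Text-to-image: only a prompt is required.
text_only = build_generate_image_call("A lighthouse at dusk, watercolor")

# Editing: add one or more reference image paths (example path only).
edit = build_generate_image_call(
    "Make the sky stormy",
    images=["/tmp/lighthouse.png"],
    aspectRatio="16:9",
)

print(json.dumps(edit, indent=2))
```

Omitting `images` keeps the call in text-to-image mode; everything else about the request shape stays the same.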

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| prompt | Yes | Text description of the image to generate, or the editing instruction when images are provided. | |
| images | No | File paths to input/reference images for editing. Omit for text-to-image generation. Maximum reference count varies by model (gemini-3.1-flash-image ~14, gemini-3-pro-image ~11). | |
| model | No | Gemini image model ID. Defaults to the configured default (gemini-2.5-flash-image). Validated at request time against the models your API key supports (discovered at startup). Common: gemini-3.1-flash-image (fast, grounding, 512-4K), gemini-3-pro-image (best quality, up to 4K), gemini-2.5-flash-image (cheapest, 1K; shuts down 2026-10-02). | |
| aspectRatio | No | Image aspect ratio. The value is passed through to the API, so Gemini rejects unsupported values. Defaults to the config value or 1:1. Current models support: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9, plus 1:4, 4:1, 1:8, 8:1 on gemini-3.1-flash-image. | |
| resolution | No | Image resolution. Defaults to the config value or 1K. 512 is available only on gemini-3.1-flash-image; 1K/2K/4K on gemini-3.x image models; gemini-2.5-flash-image is 1K only. | |
| outputDir | No | Directory to save the image. Defaults to the config file outputDir, the OUTPUT_DIR env var, or ~/gemini-images. | |
| filename | No | Base name for the saved file (e.g. 'hero-banner'). The extension is added automatically. Duplicates get a version suffix (hero-banner-v2). Omit for an auto-generated name. | |
| subfolder | No | Subfolder within the output directory (e.g. 'landing-page'). Created automatically. | |
| sessionId | No | Continue a multi-turn edit. Pass the sessionId from a previous response to refine that image across calls; the server keeps the prior turns as context. | |
| seed | No | Seed for reproducible generation. Same seed + prompt + model = same image. | |
| useSearchGrounding | No | Enable Google Search grounding for real-world accuracy. Supported on the gemini-3.x image models; the API rejects it on models that don't support it. | |
| removeBackground | No | Return a transparent PNG cutout in one call. Omit for a normal opaque image. Default mode 'auto' runs a local AI matte (no extra API cost; first use triggers a one-time model download). Supplying `color` implies chroma and `threshold` implies threshold; these override the 'auto' default. | |
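
The model, aspectRatio, and resolution rows interact, so a client may want a quick pre-flight check before spending an API call. The sketch below encodes only the combinations listed in the table; the function name `check_request`, the string spellings of the resolutions, and the idea of checking client-side at all are assumptions. The server itself defers to the Gemini API, which remains the authority, and the supported values may change.

```python
# Values copied from the parameter table; treat them as a snapshot.
COMMON_RATIOS = {
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}
# Extra ratios listed only for gemini-3.1-flash-image.
EXTREME_RATIOS = {"1:4", "4:1", "1:8", "8:1"}

# Assumption: resolutions are passed as the strings shown in the table.
RESOLUTIONS = {
    "gemini-3.1-flash-image": {"512", "1K", "2K", "4K"},
    "gemini-3-pro-image": {"1K", "2K", "4K"},
    "gemini-2.5-flash-image": {"1K"},
}
DEFAULT_MODEL = "gemini-2.5-flash-image"


def check_request(
    model: str = DEFAULT_MODEL,
    aspect_ratio: str = "1:1",
    resolution: str = "1K",
) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    if model not in RESOLUTIONS:
        # Unknown to this sketch only; the API key may still support it.
        return [f"model {model!r} is not covered by this sketch"]

    allowed_ratios = set(COMMON_RATIOS)
    if model == "gemini-3.1-flash-image":
        allowed_ratios |= EXTREME_RATIOS

    problems: list[str] = []
    if aspect_ratio not in allowed_ratios:
        problems.append(f"aspect ratio {aspect_ratio!r} not listed for {model}")
    if resolution not in RESOLUTIONS[model]:
        problems.append(f"resolution {resolution!r} not listed for {model}")
    return problems


print(check_request("gemini-2.5-flash-image", "8:1", "4K"))
```

A check like this only saves a round trip for obviously invalid combinations; a request that passes here can still be rejected by the API.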
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses return values (saved file path, model, tokens, cost), removeBackground behavior (first-use model download, fallbacks, mode effects), and runtime validation. Missing auth/rate limits, but still transparent for a complex tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively long but well-structured: starts with core purpose, then specific capabilities (removeBackground), then parameter details. Each sentence adds value. Slightly verbose in removeBackground section, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 12 parameters, nested object, no output schema, and 100% schema coverage, the description is thorough. It explains return values and parameter interactions (e.g., seed reproducibility, session continuation, model support). Could add error handling details, but still highly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, baseline 3. Description adds significant value: explains sessionId for multi-turn edits, removeBackground sub-parameters (auto, chroma, threshold) with defaults and dependencies, and default resolution/aspect-ratio derivation. This goes well beyond the schema.
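
The removeBackground dependencies mentioned here (default 'auto', `color` implying chroma, `threshold` implying threshold) can be sketched as a small resolution function. This is an illustration under assumptions, not the server's code: the `mode` key name, the precedence of an explicit mode, and the error raised when both `color` and `threshold` are supplied are all guesses that the page does not confirm.

```python
from typing import Any, Mapping


def infer_matte_mode(options: Mapping[str, Any]) -> str:
    """Pick a background-removal mode from removeBackground options.

    Hypothetical helper: the page documents only that 'auto' is the
    default and that `color` / `threshold` override it.
    """
    explicit = options.get("mode")  # assumed key name
    if explicit is not None:
        # Assumption: an explicitly requested mode wins.
        return str(explicit)

    has_color = "color" in options
    has_threshold = "threshold" in options
    if has_color and has_threshold:
        # The page does not say which one wins, so this sketch refuses to guess.
        raise ValueError("supply either color or threshold, not both")
    if has_color:
        return "chroma"
    if has_threshold:
        return "threshold"
    return "auto"


print(infer_matte_mode({}))                    # auto
print(infer_matte_mode({"color": "#00ff00"}))  # chroma
print(infer_matte_mode({"threshold": 0.5}))    # threshold
```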

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool generates or edits images using Google Gemini and lists two main modes: text-to-image and editing with reference images. It does not explicitly differentiate the sibling tool process_image, but its focus on generation and editing implies the separation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear guidance: 'Provide just a prompt for text-to-image generation. Add image file paths to edit or use reference images.' Also gives detailed usage instructions for removeBackground modes (auto, chroma, threshold) with pros and cons. Lacks explicit exclusions or when-not-to-use, but context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/JimothySnicket/gemini-image-mcp'
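
The same request can be made from Python's standard library. This is a minimal sketch: the helper names `server_url` and `fetch_server` are hypothetical, and it assumes the endpoint returns a JSON body, which this page does not state.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server (path segments escaped)."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> Any:
    """GET the server record; assumes the response body is JSON."""
    request = urllib.request.Request(
        server_url(owner, name),
        headers={"Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


print(server_url("JimothySnicket", "gemini-image-mcp"))
```

Calling `fetch_server("JimothySnicket", "gemini-image-mcp")` performs the network request; it raises `urllib.error.HTTPError` on a non-2xx response.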

If you have feedback or need assistance with the MCP directory API, please join our Discord server.