Generate Image
generate_image
Generate images from text prompts, edit images with reference files, and remove backgrounds for transparent PNGs using Google Gemini.
Instructions
Generate or edit images using Google Gemini. Provide just a prompt for text-to-image generation. Add image file paths to edit or use reference images. Set removeBackground to get a transparent PNG cutout in one call (local AI matte; works on any subject, no extra API cost). Returns the saved file path, model used, token counts, and estimated cost.
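
The three modes described above differ only in which arguments are sent. The sketch below builds one argument object per mode so the differences are easy to see. `build_args` is a hypothetical helper, not part of the tool, and the `{"mode": "auto"}` shape for `removeBackground` is an assumption inferred from the parameter description; check the tool's actual JSON schema before relying on it. How the arguments are delivered (for example, through an MCP client) is left out.

```python
import json
from typing import Any, Optional, Sequence


def build_args(prompt: str,
               images: Optional[Sequence[str]] = None,
               remove_background: bool = False) -> dict[str, Any]:
    """Assemble a generate_image argument object (hypothetical helper)."""
    args: dict[str, Any] = {"prompt": prompt}
    if images:
        # Supplying file paths turns the prompt into an editing instruction.
        args["images"] = list(images)
    if remove_background:
        # ASSUMPTION: removeBackground takes an object with a "mode" field.
        args["removeBackground"] = {"mode": "auto"}
    return args


# Text-to-image: only a prompt.
text_to_image = build_args("A lighthouse at dusk, watercolor style")

# Edit: prompt plus one or more input/reference image paths (example path).
edit = build_args("Replace the sky with a starry night",
                  images=["./photos/lighthouse.png"])

# Transparent cutout in a single call.
cutout = build_args("A red sneaker on a plain backdrop",
                    remove_background=True)

for payload in (text_to_image, edit, cutout):
    print(json.dumps(payload))
```

Omitting `images` rather than sending an empty list matches the schema's guidance to leave the field out for text-to-image generation.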
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Text description of the image to generate, or editing instruction when images are provided | |
| images | No | File paths to input/reference images for editing. Omit for text-to-image generation. Max references vary by model (gemini-3.1-flash-image ~14, gemini-3-pro-image ~11). | |
| model | No | Gemini image model ID. Defaults to the configured default (gemini-2.5-flash-image). Validated at request time against the models your API key supports (discovered at startup). Common: gemini-3.1-flash-image (fast, grounding, 512-4K), gemini-3-pro-image (best quality, up to 4K), gemini-2.5-flash-image (cheapest, 1K; shuts down 2026-10-02). | |
| aspectRatio | No | Image aspect ratio (defers to the API — unsupported values are rejected by Gemini). Defaults to config value or 1:1. Current models support: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9, plus 1:4, 4:1, 1:8, 8:1 on gemini-3.1-flash-image. | |
| resolution | No | Image resolution. Defaults to config value or 1K. 512 only on gemini-3.1-flash-image; 1K/2K/4K on gemini-3.x image models; gemini-2.5-flash-image is 1K. | |
| outputDir | No | Directory to save the image. Defaults to config file outputDir, OUTPUT_DIR env var, or ~/gemini-images | |
| filename | No | Base name for the saved file (e.g. 'hero-banner'). Extension added automatically. Duplicates get a version suffix (hero-banner-v2). Omit for auto-generated name. | |
| subfolder | No | Subfolder within the output directory (e.g. 'landing-page'). Created automatically. | |
| sessionId | No | Continue a multi-turn edit. Pass the sessionId from a previous response to refine that image across calls — the server keeps the prior turns as context. | |
| seed | No | Seed for reproducible generation. Same seed + prompt + model = same image. | |
| useSearchGrounding | No | Enable Google Search grounding for real-world accuracy. Supported on the gemini-3.x image models; the API rejects it on models that don't support it. | |
| removeBackground | No | Return a transparent PNG cutout in one call. Omit for a normal opaque image. The default mode 'auto' runs a local AI matte (no extra API cost; first use triggers a one-time model download). Supplying `color` implies chroma mode and `threshold` implies threshold mode; either overrides the 'auto' default. | |
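
Two rows in the table lend themselves to a client-side pre-flight check: the per-model resolution limits and the `outputDir` fallback chain. The following is a minimal sketch, not the server's implementation. The resolution map is transcribed from the `resolution` row, assuming the values are passed as the literal strings `512`, `1K`, `2K`, and `4K`. The fallback order (explicit argument, then config file, then `OUTPUT_DIR`, then `~/gemini-images`) is an assumption based on the order in which the `outputDir` row lists them. Both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Transcribed from the resolution row; other model IDs are not checked here.
RESOLUTIONS_BY_MODEL = {
    "gemini-3.1-flash-image": {"512", "1K", "2K", "4K"},
    "gemini-3-pro-image": {"1K", "2K", "4K"},
    "gemini-2.5-flash-image": {"1K"},
}


def resolution_supported(model: str, resolution: str) -> Optional[bool]:
    """True/False for models in the map, None when the model is unknown."""
    allowed = RESOLUTIONS_BY_MODEL.get(model)
    if allowed is None:
        return None  # Defer to the API, which performs the real validation.
    return resolution in allowed


def resolve_output_dir(explicit: Optional[str],
                       config_dir: Optional[str],
                       env: Mapping[str, str] = os.environ) -> Path:
    """Pick the first available directory in the ASSUMED precedence order."""
    for candidate in (explicit, config_dir, env.get("OUTPUT_DIR")):
        if candidate:
            return Path(candidate).expanduser()
    return Path("~/gemini-images").expanduser()


if __name__ == "__main__":
    print(resolution_supported("gemini-2.5-flash-image", "4K"))
    print(resolve_output_dir(None, None))
```

Returning `None` for unknown models keeps the check advisory: the server validates models against what the API key supports, so a local table should not reject IDs it has never seen.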