generate_image
Create custom images using AI with options for transparency, multiple aspect ratios, and style reference images to generate visual content from text prompts.
Instructions
Generate image assets using Gemini AI with optional transparency and reference images.
[Model Guidance]
Flash3.1 (recommended): High quality, very fast, supports grounding and advanced features.
Pro3: Higher fidelity, but more costly and slower.
Flash2.5: Legacy, maintained for compatibility. Does not support 0.5K, 2K, or 4K resolutions.
[Aspect Ratios & Pixel Sizes] Gemini supports the following aspect ratios (model-dependent):
Common to all models: 1:1 (e.g. 512x512, 1024x1024), 2:3 (424x632, 848x1264), 3:2 (632x424, 1264x848), 3:4 (448x600, 896x1200), 4:3 (600x448, 1200x896), 4:5 (410x512, 820x1024), 5:4 (512x410, 1024x820), 9:16 (360x640, 720x1280), 16:9 (688x384, 1376x768), 21:9 (896x384, 1792x768)
Flash3.1 only: 1:4 (128x512, 256x1024), 4:1 (512x128, 1024x256), 1:8 (64x512, 128x1024), 8:1 (512x64, 1024x128)
Each pair above lists the 0.5K and 1K sizes; for 2K/4K, double these sizes.
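As a rough illustration of how a requested size maps onto the ratio lists above, the sketch below picks the nearest listed ratio for a given width and height. The helper names (`parse_ratio`, `closest_ratio`) and the log-scale distance are assumptions made for this example; the tool's actual selection logic is not documented here and may differ.

```python
import math
from fractions import Fraction

# Ratios listed above as common to all models.
COMMON_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]
# Ratios listed above as Flash3.1 only.
FLASH31_ONLY_RATIOS = ["1:4", "4:1", "1:8", "8:1"]


def parse_ratio(label: str) -> Fraction:
    """Turn a label such as '16:9' into an exact fraction."""
    w, h = label.split(":")
    return Fraction(int(w), int(h))


def closest_ratio(width: int, height: int, flash31: bool = True) -> str:
    """Return the listed ratio label nearest to width/height (hypothetical helper)."""
    candidates = COMMON_RATIOS + (FLASH31_ONLY_RATIOS if flash31 else [])
    target = Fraction(width, height)
    # Compare on a log scale so that 2:1 and 1:2 are equally far from 1:1.
    return min(
        candidates,
        key=lambda label: abs(math.log(float(parse_ratio(label) / target))),
    )


if __name__ == "__main__":
    print(closest_ratio(1920, 1080))
    print(closest_ratio(256, 1024, flash31=False))
```

A nearest-match search is used rather than an exact comparison because several listed pixel sizes (848x1264 for 2:3, for instance) only approximate their nominal ratio.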
To avoid cropping or padding, set width and height to match a supported aspect ratio. If the requested size does not match, the image will be center-cropped or padded after generation. To control the resizing behavior explicitly, use the 'resizeMode' parameter: 'crop' (default, center crop), 'letterbox' (fit with padding), 'contain' (trim transparent margins then fit), or 'stretch' (distort to fit).
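The geometry usually implied by the 'crop' and 'letterbox' mode names can be sketched as follows. This is a minimal illustration under the assumption that the modes behave the conventional way; `crop_box` and `letterbox_size` are hypothetical helpers, not functions exposed by the tool.

```python
def crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Center-crop box (left, top, right, bottom) in source pixels
    that has the destination aspect ratio."""
    # Cross-multiply to compare aspect ratios without floating-point error.
    if src_w * dst_h > src_h * dst_w:
        # Source is wider than the target: trim the left and right edges.
        new_w = round(src_h * dst_w / dst_h)
        left = (src_w - new_w) // 2
        return (left, 0, left + new_w, src_h)
    # Source is taller than (or equal to) the target: trim top and bottom.
    new_h = round(src_w * dst_h / dst_w)
    top = (src_h - new_h) // 2
    return (0, top, src_w, top + new_h)


def letterbox_size(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[int, int, int, int]:
    """Scaled size that fits entirely inside the destination,
    followed by the total padding needed on each axis."""
    scale = min(dst_w / src_w, dst_h / src_h)
    fit_w, fit_h = round(src_w * scale), round(src_h * scale)
    return (fit_w, fit_h, dst_w - fit_w, dst_h - fit_h)


if __name__ == "__main__":
    # A 16:9 source (1376x768) requested as a 1024x1024 square.
    print(crop_box(1376, 768, 1024, 1024))
    print(letterbox_size(1376, 768, 1024, 1024))
```

In this example the crop keeps the central 768x768 region of the source, while the letterbox keeps the whole frame and pads the vertical axis instead.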
[IMPORTANT] Always preserve the user's prompt as-is, including language and nuance. Do not translate or summarize.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | User-provided image prompt. Preserve the original wording and detail; do not summarize or translate. Only append transparency-related hints if needed. | |
| outputFileName | Yes | Output filename (extension auto-added if missing) | |
| outputType | No | Output format: file=file only, base64=base64 only, combine=both | combine |
| model | No | Model tier to use for generation (see tool description for details; "flash" and "pro" are aliases for Flash2.5 and Pro3) | Flash3.1 |
| output_resolution | No | Source resolution for Gemini generation. Normally auto-calculated from the pixel size; set only to override. The final image is resized to the requested pixel size. | |
| outputWidth | Yes | Output image width in pixels. The image will be generated using the closest supported Gemini aspect ratio and resolution, then resized to this width. To avoid cropping or padding, set width and height to match a supported aspect ratio (see tool description). | |
| outputHeight | Yes | Output image height in pixels. The image will be generated using the closest supported Gemini aspect ratio and resolution, then resized to this height. To avoid cropping or padding, set width and height to match a supported aspect ratio (see tool description). | |
| output_format | No | Output image format | png |
| outputPath | No | Output directory path (MUST be an absolute path when outputType is file or combine) | |
| transparent | No | Request transparent background (PNG or WebP only). Background color is selected by histogram analysis. | |
| transparentColor | No | Color to make transparent. Hex (e.g. #FF00FF). null defaults to #FF00FF when transparent=true. | |
| colorTolerance | No | Tolerance for color matching (0-255). Higher values are more permissive for transparent color selection and keying. | |
| fringeMode | No | Fringe reduction mode: auto (size-based), crisp (binary alpha), hd (force-clear 1px boundary for large images). | auto |
| resizeMode | No | Resize mode: crop=center crop, stretch=distort, letterbox=fit with padding, contain=trim transparent margins then fit | crop |
| grounding_type | No | Grounding tool usage (Flash3.1 only) | none |
| thinking_mode | No | Thinking mode (Flash3.1 only) | minimal |
| include_thoughts | No | Request thought fields from Gemini (Flash3.1 only). Thought content is returned in the MCP response only when include_metadata=true. | false |
| include_metadata | No | Include grounding and reasoning metadata in JSON output (optional, may increase payload size). | |
| referenceImages | No | Reference images for style guidance (Flash2.5: max 3, others: max 14) | |
| debug | No | Debug mode: output intermediate processing images and prompt | |
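The constraints stated in the table can be checked before a call is sent. The sketch below assembles an argument dictionary using the parameter names from the schema and applies the documented rules. The function `build_generate_image_args` is a hypothetical client-side helper, and the lowercase format strings `"png"` and `"webp"` and the list-of-strings shape of `referenceImages` are assumptions; how the arguments are actually sent depends on the MCP client in use.

```python
import os

# Aliases documented for the `model` parameter.
MODEL_ALIASES = {"flash": "Flash2.5", "pro": "Pro3"}


def build_generate_image_args(prompt, output_file_name, width, height, **options):
    """Assemble and validate arguments for generate_image (hypothetical helper)."""
    args = {
        "prompt": prompt,  # passed through unchanged, as the instructions require
        "outputFileName": output_file_name,
        "outputWidth": width,
        "outputHeight": height,
        **options,
    }

    # outputPath must be absolute when outputType is file or combine (the default).
    path = args.get("outputPath")
    if args.get("outputType", "combine") in ("file", "combine"):
        if path is not None and not os.path.isabs(path):
            raise ValueError("outputPath must be an absolute path")

    # Transparency is documented for PNG or WebP only.
    image_format = str(args.get("output_format", "png")).lower()
    if args.get("transparent") and image_format not in ("png", "webp"):
        raise ValueError("transparent requires output_format png or webp")

    # colorTolerance is documented as 0-255.
    tolerance = args.get("colorTolerance")
    if tolerance is not None and not 0 <= tolerance <= 255:
        raise ValueError("colorTolerance must be between 0 and 255")

    # Reference image limit: 3 for Flash2.5, 14 for the other models.
    model = args.get("model", "Flash3.1")
    model = MODEL_ALIASES.get(model, model)
    limit = 3 if model == "Flash2.5" else 14
    if len(args.get("referenceImages", [])) > limit:
        raise ValueError(f"{model} accepts at most {limit} reference images")

    return args


if __name__ == "__main__":
    example = build_generate_image_args(
        "a red fox on a plain background",
        "fox",
        1024,
        1024,
        outputPath=os.path.abspath("out"),
        transparent=True,
    )
    print(example)
```

Validating locally keeps obvious mistakes, such as a relative `outputPath`, from costing a generation request.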