Generate or edit images (Multi-Model: Flash & Pro)
generate_image
Generate or edit images using natural language descriptions. Supports creation from scratch, multi-image conditioning, and editing via file IDs or file paths. Controls aspect ratio, resolution, and model tier for optimized results.
Instructions
Generate new images or edit existing images using natural language instructions.
Supports multiple input modes:
- Pure generation: Provide only a prompt to create new images
- Multi-image conditioning: Provide up to 3 input images using the input_image_path_1/2/3 parameters
- File ID editing: Edit previously uploaded images using a Files API file ID
- File path editing: Edit a local image by providing a single input image path
The mode is detected automatically from the parameters supplied, or can be set explicitly with the mode parameter. Input images are read from the local filesystem to avoid massive token usage. The tool returns both MCP image content blocks and structured JSON with metadata.
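The four input modes above can be sketched as MCP `tools/call` requests. This is a minimal illustration, not the server's own client code: `build_call` is a hypothetical helper, the file paths are placeholders, and using input_image_path_1 for single-path editing is an assumption, since the description does not name the parameter for that mode.

```python
import json


def build_call(arguments, request_id=1):
    """Wrap tool arguments in an MCP `tools/call` JSON-RPC request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "generate_image", "arguments": arguments},
    }


# Pure generation: prompt only, mode left to auto-detection.
generate = build_call({"prompt": "A red fox in fresh snow, soft morning light"})

# Multi-image conditioning: up to three local paths (placeholder files).
condition = build_call({
    "prompt": "Place the product from image 1 on the table from image 2",
    "input_image_path_1": "/tmp/product.png",
    "input_image_path_2": "/tmp/table.png",
})

# File ID editing: reference a previously uploaded image, mode set explicitly.
edit_by_id = build_call({
    "prompt": "Remove the background",
    "file_id": "files/abc123",
    "mode": "edit",
})

# File path editing: a single local image (assumed to go in input_image_path_1).
edit_by_path = build_call({
    "prompt": "Make the sky overcast",
    "input_image_path_1": "/tmp/photo.jpg",
    "mode": "edit",
})

if __name__ == "__main__":
    print(json.dumps(edit_by_id, indent=2))
```

Setting mode explicitly in the two edit requests avoids relying on auto-detection, whose exact rules are not spelled out here.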
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Clear, detailed image prompt. Include subject, composition, action, location, style, and any text to render. Use the aspect_ratio parameter to pin a specific canvas shape when needed. | |
| n | No | Requested image count (model may return fewer). | |
| negative_prompt | No | Things to avoid (style, objects, text). | |
| system_instruction | No | Optional system tone/style guidance. | |
| input_image_path_1 | No | Path to first input image for composition/conditioning | |
| input_image_path_2 | No | Path to second input image for composition/conditioning | |
| input_image_path_3 | No | Path to third input image for composition/conditioning | |
| file_id | No | Files API file ID to use as input/edit source (e.g., 'files/abc123'). If provided, this takes precedence over input_image_path_* parameters for the primary input. | |
| mode | No | Operation mode: 'generate' for new image creation, 'edit' for modifying existing images. Auto-detected based on input parameters if not specified. | auto |
| model_tier | No | Model tier: 'flash' (legacy, 1024px), 'nb2' (4K at Flash speed), 'pro' (max quality, 4K), or 'auto' (automatically selects nb2 or pro based on the prompt). Default: 'auto'. | auto |
| resolution | No | Output resolution: 'high', '4k', '2k', '1k'. 4K and 2K available with 'nb2' and 'pro' models. Default: 'high'. | high |
| thinking_level | No | Reasoning depth hint: 'low' (faster), 'high' (better quality). Applied to the 'nb2' model; 'high' also biases auto-selection toward Pro. Default: None (auto). | |
| enable_grounding | No | Enable Google Search grounding for factual accuracy (NB2 and Pro models). Useful for real-world subjects. Default: true. | true |
| aspect_ratio | No | Optional output aspect ratio (e.g., '16:9'). Standard: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9. Extreme (nb2 only): 4:1, 1:4, 8:1, 1:8. | |
| output_path | No | Output path for generated image(s). If a file path with extension (e.g., '/path/image.png'), saves directly to that path. If a directory path (e.g., '/path/to/dir/'), uses default filename in that directory. If None, uses IMAGE_OUTPUT_DIR environment variable or ~/nanobanana-images. | |
| return_full_image | No | Return full-resolution images in MCP response instead of thumbnails. Warning: full images can be large (3-7MB each for 4K). Default: uses RETURN_FULL_IMAGE env var, or false if not set. | |
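The aspect_ratio and output_path rules in the table can be mirrored on the client side before a request is sent. The sketch below is an illustration under stated assumptions, not the server's implementation: both function names and the `image.png` default filename are hypothetical (the server's actual default filename is not documented here), and because the table does not say how 'auto' treats extreme ratios, the check accepts them only when model_tier is explicitly 'nb2'.

```python
import os
from pathlib import Path

STANDARD_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
EXTREME_RATIOS = {"4:1", "1:4", "8:1", "1:8"}  # documented as nb2 only


def check_aspect_ratio(aspect_ratio, model_tier="auto"):
    """Return None if the combination looks valid, else a short error string."""
    if aspect_ratio is None or aspect_ratio in STANDARD_RATIOS:
        return None
    if aspect_ratio in EXTREME_RATIOS:
        # Assumption: only an explicit 'nb2' is accepted for extreme ratios.
        if model_tier == "nb2":
            return None
        return f"{aspect_ratio} is documented for model_tier='nb2' only"
    return f"unknown aspect ratio: {aspect_ratio}"


def resolve_output_path(output_path, default_name="image.png", env=None):
    """Approximate the documented output_path rules.

    default_name is a placeholder for the server's undocumented default filename.
    """
    env = os.environ if env is None else env
    if output_path is None:
        # Fall back to IMAGE_OUTPUT_DIR, then ~/nanobanana-images.
        base = env.get("IMAGE_OUTPUT_DIR") or str(Path.home() / "nanobanana-images")
        return Path(base).expanduser() / default_name
    path = Path(output_path).expanduser()
    if path.suffix:
        # A path with an extension is treated as a file. Note that a directory
        # whose name contains a dot (e.g. 'v1.2') would be misread by this test.
        return path
    return path / default_name
```

For example, `check_aspect_ratio("8:1", "pro")` returns an error string, while `resolve_output_path("/path/to/dir/")` yields `/path/to/dir/image.png` under the placeholder filename.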