Gemini Multi-modal Image Generation
gemini-imagine
Generate images by combining text prompts with existing images as inputs. This tool processes multi-modal content to create visual outputs.
Instructions
Generate images using multi-modal inputs (text + images)
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| contents | Yes | Array of content items (text and/or images) | |
| model | No | Model to use | gemini-2.5-flash-image-preview |
| ref | No | Optional reference ID | |
| webhookOverride | No | Optional webhook URL | |
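
As a rough illustration of the input schema above, the following sketch assembles an arguments object in Python. The top-level keys (`contents`, `model`, `ref`, `webhookOverride`) come from the table; everything else is an assumption. In particular, the schema only says that `contents` is an array of text and/or image items, so the `{"type": ..., "text"/"url": ...}` item shape used here is hypothetical, as is the helper name `build_arguments`. Check the tool's JSON Schema for the real item format before relying on it.

```python
import json
from typing import Any, Optional


def build_arguments(
    prompt: str,
    image_urls: list[str],
    model: Optional[str] = None,
    ref: Optional[str] = None,
    webhook_override: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an arguments object for the gemini-imagine tool.

    Only `contents` is required; optional fields are left out when unset
    so that the tool's own defaults (such as the default model) apply.
    """
    # ASSUMPTION: the item shape below is hypothetical. The documented schema
    # does not specify how text and image items are encoded.
    contents: list[dict[str, str]] = [{"type": "text", "text": prompt}]
    contents += [{"type": "image", "url": url} for url in image_urls]

    args: dict[str, Any] = {"contents": contents}
    optional = {"model": model, "ref": ref, "webhookOverride": webhook_override}
    # Drop any optional field the caller did not provide.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


if __name__ == "__main__":
    example = build_arguments(
        "Turn this sketch into a watercolor painting",
        ["https://example.com/sketch.png"],  # placeholder URL
        ref="job-1",  # placeholder reference ID
    )
    print(json.dumps(example, indent=2))
```

Omitting unset optional fields, rather than sending them as `null`, keeps the request minimal and avoids guessing how the tool treats explicit nulls.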
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| error | No | | |
| status | No | | |
| success | Yes | | |
| imageUrl | No | | |
| progress | No | | |
| messageId | No | | |
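
Because the output schema lists only `success` as required and gives no descriptions or types, a caller has to treat every other field as possibly absent. The sketch below shows one defensive way to read a result in Python. It assumes `success` is a boolean, that `imageUrl` is present once an image is available, and that `status` and `progress` describe an unfinished job; none of this is confirmed by the schema, and `summarize_result` is a hypothetical helper name.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> str:
    """Return a one-line summary of a gemini-imagine result object."""
    # `success` is the only field the schema marks as required.
    if "success" not in result:
        raise ValueError("missing required field 'success'")

    # ASSUMPTION: a truthy `success` plus an `imageUrl` means the image is ready.
    if result["success"] and result.get("imageUrl"):
        return f"image ready: {result['imageUrl']}"

    # ASSUMPTION: `error` carries a human-readable failure message.
    if result.get("error"):
        return f"failed: {result['error']}"

    # Otherwise fall back to whatever status information is present.
    status = result.get("status", "unknown")
    progress = result.get("progress")
    suffix = "" if progress is None else f" ({progress})"
    return f"pending: status={status}{suffix}"
```

Using `dict.get` for every optional field means a response that omits `imageUrl`, `error`, `status`, `progress`, or `messageId` is handled without raising a `KeyError`.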