image
Create or modify images by describing them in text, optionally using reference images. Automatically selects the best OpenAI model and saves the output to a file path.
Instructions
Generate, edit, or compose images via OpenAI's gpt-image family.
BEFORE the FIRST call in a conversation, read the MCP resource
image-guide://full for the full prompting guide (structure,
realism rules, edit/compose modes, when to set quality/fidelity).
You only need to read it once per conversation.
Mode is selected by references_paths:
omitted/empty -> generate from text alone
1 path -> edit that image
2..16 paths -> generate using them as labeled references
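The mode rule above can be sketched as a small helper. This only illustrates the counting logic and is not the server's actual code. The function name `select_mode` and the returned labels are hypothetical, and the 16-path ceiling is taken from the schema below.

```python
from typing import Optional, Sequence

MAX_REFERENCES = 16  # upper bound stated in the tool description


def select_mode(references_paths: Optional[Sequence[str]] = None) -> str:
    """Return the mode implied by the number of reference paths.

    The labels "generate", "edit" and "compose" are hypothetical names
    used only in this sketch.
    """
    count = len(references_paths or [])
    if count == 0:
        return "generate"  # text-to-image, no input images
    if count == 1:
        return "edit"      # edit the single supplied image
    if count <= MAX_REFERENCES:
        return "compose"   # use 2..16 images as labeled references
    raise ValueError(
        f"at most {MAX_REFERENCES} reference paths are allowed, got {count}"
    )
```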
Model routing is automatic and reported in the response:
background='transparent' -> gpt-image-1.5 (gpt-image-2 rejects alpha)
everything else -> gpt-image-2 (flagship)
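As a rough illustration of the routing rule, assuming `background` is the only input that affects the choice (the function name `route_model` is hypothetical and the real server may consider more):

```python
VALID_BACKGROUNDS = {"auto", "opaque", "transparent"}


def route_model(background: str = "auto") -> str:
    """Pick a model name from the background setting, per the rule above."""
    if background not in VALID_BACKGROUNDS:
        raise ValueError(f"unknown background value: {background!r}")
    if background == "transparent":
        # Transparency needs an alpha channel, which gpt-image-2 rejects.
        return "gpt-image-1.5"
    return "gpt-image-2"
```

Since routing is reported in the response, a caller never needs to do this itself. The sketch only shows which model to expect for a given request.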
Returns metadata only — the file is written to output_path.
Read the file with the Read tool only if you need to verify the result.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Structured English description of the desired image. When references_paths has more than one entry, label each one explicitly in the prompt (e.g. "Image 1: subject. Image 2: style reference."). See image-guide://full for the full prompting guide. | |
| output_path | Yes | ABSOLUTE filesystem path where the result will be saved. Extension determines format: .png / .jpg / .jpeg / .webp. Parent directory MUST already exist (create it via Bash mkdir -p before retrying). File MUST NOT already exist. | |
| size | Yes | Output resolution. REQUIRED. Pick deliberately based on the use case: 1024x1024 (generic single subject, avatar, icon); 1536x1024 / 1024x1536 (landscape / portrait composition); 2048x2048 (high-res square: hero blocks, album art); 2048x1152 / 1152x2048 (16:9 / 9:16 banners, video thumbnails); 3840x2160 / 2160x3840 (4K, only when text/UI must be crisp). | |
| references_paths | No | Optional. Up to 16 ABSOLUTE paths to existing PNG/JPG/WebP files (each ≤50 MB) used as input images. Omit for pure text-to-image generation. | |
| quality | No | Optional. OMIT in most cases — the default ('auto') already produces excellent quality. Pass 'low' for cheap drafts. Pass 'high' only when text legibility (UI mockups), photorealism, or final-output quality is critical. | |
| input_fidelity | No | Optional. Only relevant when references_paths is set. Pass 'high' when faces/identity must be preserved exactly (portrait edits, virtual try-on, product placement). Otherwise omit — defaults to 'low' on the OpenAI side, which is cheaper and faster. | |
| background | No | Background handling. Use 'transparent' for logos, icons, isolated products, or anything you'll composite later (only valid with .png/.webp). 'opaque' forces a solid background. 'auto' lets the model decide. | auto |
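The `output_path` constraints in the table can be checked before calling the tool. The following is a minimal client-side sketch. The helper name `check_output_path` and its messages are hypothetical, and the server may enforce these rules differently or apply further ones.

```python
import os
import tempfile
from typing import List

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}
ALPHA_EXTENSIONS = {".png", ".webp"}  # formats allowed with a transparent background


def check_output_path(output_path: str, background: str = "auto") -> List[str]:
    """Return a list of problems; an empty list means the path looks usable."""
    problems = []
    ext = os.path.splitext(output_path)[1].lower()
    if not os.path.isabs(output_path):
        problems.append("output_path must be absolute")
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if background == "transparent" and ext not in ALPHA_EXTENSIONS:
        problems.append("transparent background requires .png or .webp")
    if not os.path.isdir(os.path.dirname(output_path)):
        problems.append("parent directory does not exist")
    if os.path.exists(output_path):
        problems.append("file already exists")
    return problems


# Example arguments for a transparent logo; the prompt text is illustrative.
with tempfile.TemporaryDirectory() as tmp:
    arguments = {
        "prompt": "Flat vector logo of a paper crane, centered, no text",
        "output_path": os.path.join(tmp, "crane.png"),
        "size": "1024x1024",
        "background": "transparent",
    }
    print(check_output_path(arguments["output_path"], arguments["background"]))
```

Collecting every problem in one list, rather than raising on the first, lets the caller fix the path in a single retry.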
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |