Glama

Generate image (Grok Imagine, GPT Image 2, Seedream V4, Wan, Imagen 4, Nano Banana, Ideogram V3, Z-Image Turbo)

aetherwave_generate_image

Generate images from text prompts or combine with reference images for style transfer and editing. The tool submits a job, polls until done, and returns the resulting image URLs.

Instructions

Generates one or more images from a text prompt (T2I) or a text prompt + reference image(s) (I2I). Submits the job, polls until terminal, and returns the final image URLs. Default model is 'grok-imagine-t2i' (fast, 6 images per generation, 5 credits). Use list_image_models to see the full lineup with pricing. For I2I, pass referenceImages as an array of public image URLs and pick a model with I2I support (e.g. 'grok-imagine-i2i', 'wan-2.5-spicy-i2i').
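The submit, poll, return flow described above can be sketched as a small loop. This is only an illustration of the pattern, not the server's implementation: `submit` and `get_status` are hypothetical callables, and the status names (`completed`, `failed`) and the `imageUrls` field are assumptions, since the listing does not document the job payload.

```python
import time
from typing import Callable

# Assumed terminal states; the real service's status names are not documented here.
TERMINAL_STATES = {"completed", "failed"}


def run_until_terminal(
    submit: Callable[[dict], str],
    get_status: Callable[[str], dict],
    arguments: dict,
    poll_interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Submit a job, poll until it is terminal, and return the image URLs."""
    job_id = submit(arguments)  # hypothetical: returns a job identifier
    for _ in range(max_polls):
        job = get_status(job_id)  # hypothetical: returns a dict with a "status" key
        if job["status"] in TERMINAL_STATES:
            if job["status"] == "failed":
                reason = job.get("error", "unknown error")
                raise RuntimeError(f"job {job_id} failed: {reason}")
            return list(job.get("imageUrls", []))
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Bounding the loop with `max_polls` is a design choice of this sketch; the listing itself does not state a timeout.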

Model selection guide (when the user does not specify a model)

Default: grok-imagine-t2i (5 cr, 6 outputs per call, fast, general purpose).

Strong recommendation: when a single high-quality output is what's wanted (most agent / one-shot workflows), prefer gpt-image-2-t2i (9 cr @ 1K / higher @ 2K, single deterministic image, best general quality across realism, illustration, typography, and composition; supports up to 2K resolution and most aspect ratios including auto). This is the front-runner for serious creative output where you don't need to pick from 6 variations.

Pick a different model when the prompt has these signals:

  • "single best result" / "one image" / production / no time to pick from variations -> gpt-image-2-t2i (9 cr, 1 output, top general quality)

  • "photoreal" / "photo of" / "realistic" -> gpt-image-2-t2i (9 cr, best general realism) or imagen-4 (12 cr, very high quality) or z-image-turbo (3 cr, fastest)

  • "highest quality" / "premium" / no budget -> gpt-image-2-t2i at 2K, or grok-imagine-quality-t2i (16 cr @ 1K, 22 cr @ 2K), or imagen-4-ultra

  • Text inside the image (signs, posters, typography) -> ideogram-v3-t2i (best in class) or gpt-image-2-t2i (also strong)

  • Artistic / painterly / stylized -> midjourney-t2i

  • Album art / cover art -> gpt-image-2-t2i for one strong image; grok-imagine-t2i for 6 variations to choose from; seedream-v4-t2i if 4K wanted

  • Logo or design with embedded text -> ideogram-v3-t2i

  • NSFW / adult / explicit -> wan-2.5-spicy-t2i (auto-tags creation as 18+; routes to adult gallery)

  • Cheapest possible / quick test -> z-image-turbo (3 cr)

  • Multiple variations to compare -> keep grok-imagine-t2i (6 outputs default) or use numImages on a multi-output model
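The signals listed above could be approximated with a simple keyword lookup. This is a rough sketch: the model IDs come from the guide, but the keyword lists, the whole-word matching, and the order in which rules are checked are assumptions made for illustration. A real agent would weigh these signals with more judgment than a lookup table.

```python
import re

DEFAULT_MODEL = "grok-imagine-t2i"

# (keywords, model ID) pairs checked in order; the ordering is an assumption.
RULES = [
    (("nsfw", "adult", "explicit"), "wan-2.5-spicy-t2i"),
    (("cheapest", "quick test"), "z-image-turbo"),
    (("sign", "signs", "poster", "posters", "typography", "logo"), "ideogram-v3-t2i"),
    (("artistic", "painterly", "stylized"), "midjourney-t2i"),
    (("highest quality", "premium"), "gpt-image-2-t2i"),
    (("photoreal", "photo of", "realistic"), "gpt-image-2-t2i"),
    (("single best result", "one image", "production"), "gpt-image-2-t2i"),
    (("variations",), DEFAULT_MODEL),
]


def pick_model(prompt: str) -> str:
    """Return a model ID for the prompt, falling back to the default model."""
    text = prompt.lower()
    for keywords, model in RULES:
        for keyword in keywords:
            # Whole-word match so "sign" does not fire on "design".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return model
    return DEFAULT_MODEL
```

For example, `pick_model("A photo of a red fox at dawn")` returns `gpt-image-2-t2i`, while a prompt with none of the signals falls back to `grok-imagine-t2i`.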

For I2I (reference image provided): prefer the dedicated aetherwave_edit_image tool for "change something in this image" intent. Use aetherwave_generate_image with I2I models only when you specifically want style transfer (midjourney-i2i), premium quality (grok-imagine-quality-i2i), or adult content (wan-2.5-spicy-i2i).

Always pass an explicit aspectRatio (e.g. "1:1" for square album art, "16:9" for video thumbnails, "9:16" for shorts/reels). Some upstream providers reject submissions with no aspect ratio.
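One way to make sure aspectRatio is never omitted is to fill it in from the intended use before calling the tool. The mapping below uses the three ratios named above; the use-case keys (`album_art`, `video_thumbnail`, `short_or_reel`) and the helper itself are hypothetical names for this sketch.

```python
from typing import Optional

# Ratios taken from the guidance above; the dictionary keys are illustrative.
USE_CASE_RATIOS = {
    "album_art": "1:1",
    "video_thumbnail": "16:9",
    "short_or_reel": "9:16",
}


def ensure_aspect_ratio(arguments: dict, use_case: Optional[str] = None) -> dict:
    """Return a copy of arguments that always carries an explicit aspectRatio."""
    if arguments.get("aspectRatio"):
        return dict(arguments)  # caller already chose a ratio; keep it
    if use_case in USE_CASE_RATIOS:
        return {**arguments, "aspectRatio": USE_CASE_RATIOS[use_case]}
    # Refuse to guess: default ratios vary by model.
    raise ValueError("aspectRatio is missing and no known use case was given")
```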

Ask the user only when:

  • The prompt contradicts itself (e.g., "highest quality but cheapest")

  • The user requested "the best model" with no context; in that case, surface 2-3 options with tradeoffs

  • A single generation would cost more than 20 credits and the user has not confirmed
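The three conditions above can be expressed as a small check that returns the reasons to pause, if any. The `Request` fields are hypothetical: how an agent detects a contradiction or missing context is not specified by the listing, so those are modeled as plain booleans here. The 20-credit threshold comes from the guidance.

```python
from dataclasses import dataclass

CREDIT_CONFIRMATION_THRESHOLD = 20  # "more than 20 credits" per the guidance


@dataclass
class Request:
    # Illustrative fields; the listing does not define how these are derived.
    wants_highest_quality: bool = False
    wants_cheapest: bool = False
    asked_for_best_model: bool = False
    has_context: bool = True
    estimated_credits: int = 0
    cost_confirmed: bool = False


def reasons_to_ask(req: Request) -> list[str]:
    """Return the reasons to ask the user first; an empty list means proceed."""
    reasons = []
    if req.wants_highest_quality and req.wants_cheapest:
        reasons.append("prompt contradicts itself")
    if req.asked_for_best_model and not req.has_context:
        reasons.append("'best model' requested without context; offer 2-3 options")
    if req.estimated_credits > CREDIT_CONFIRMATION_THRESHOLD and not req.cost_confirmed:
        reasons.append("generation costs more than 20 credits and is unconfirmed")
    return reasons
```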

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Text description of the image to generate. | |
| model | No | Model ID. Defaults to 'grok-imagine-t2i'. Use list_image_models for the full list. | |
| aspectRatio | No | Aspect ratio (e.g. '1:1', '16:9', '9:16'). Pass this explicitly when possible; some upstream providers reject submissions without an aspect ratio. Default ratios vary by model. | |
| resolution | No | Output resolution. Most models accept '1K' or '2K'; some accept '480p'/'720p'. | |
| referenceImages | No | Array of public image URLs for image-to-image generation. Required when using an I2I model. A single URL string is also accepted (wrapped as a one-element array). | |
| numImages | No | Number of images for models that support multiple outputs. | |
| negative_prompt | No | What to avoid in the output (supported by some models). | |
| seed | No | Seed for deterministic generation (supported by some models). | |
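A client-side helper can assemble the arguments in the shape the schema describes. This is a minimal sketch under stated assumptions: the parameter names match the schema, but `build_arguments` is a hypothetical helper, and detecting I2I models by an `-i2i` suffix is a guess based on the model IDs shown in this listing.

```python
from typing import Optional, Union


def build_arguments(
    prompt: str,
    model: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
    reference_images: Union[str, list[str], None] = None,
    **extra,
) -> dict:
    """Build a tool-call arguments dict, omitting parameters that were not set."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    # Mirror the documented behavior: a single URL becomes a one-element array.
    if isinstance(reference_images, str):
        reference_images = [reference_images]
    # Assumption: I2I model IDs end in "-i2i" (e.g. 'grok-imagine-i2i').
    if model is not None and model.endswith("-i2i") and not reference_images:
        raise ValueError("referenceImages is required when using an I2I model")
    arguments = {
        "prompt": prompt,
        "model": model,
        "aspectRatio": aspect_ratio,
        "referenceImages": reference_images,
        **extra,  # e.g. resolution, numImages, negative_prompt, seed
    }
    return {key: value for key, value in arguments.items() if value is not None}
```

Calling `build_arguments("a fox", aspect_ratio="1:1", seed=7)` yields a dict with only `prompt`, `aspectRatio`, and `seed`, leaving the model to default server-side.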
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes the job submission and polling process, credit costs, and the importance of aspectRatio. Annotations add minimal behavior info (readOnlyHint false, destructiveHint false), so the description carries the burden. It does not explicitly mention timeout behavior or error handling upon failure, which is a minor gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with a main paragraph followed by a bullet-point model selection guide. It is lengthy but appropriate given the tool's complexity (8 parameters, multiple models). Could be slightly more concise, but the organization makes it easy to scan.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers usage, model selection, parameter details, and when to ask the user. No output schema exists, so the description should explain the return format, which it does succinctly ('returns the final image URLs'). Missing details on error handling or pagination, but sufficient for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (all 8 parameters documented). The description adds substantial value beyond the schema: detailed model selection guide, referenceImages usage, aspectRatio criticality, and numImages constraints. For example, the model parameter schema only says 'Defaults to...', while the description provides a comprehensive decision tree.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates images from text or text+reference images, specifies the default model and ability to get multiple outputs, and distinguishes from sibling tools like aetherwave_edit_image for I2I modifications. The verb 'Generates' is specific and the resource 'image' is clearly defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides an extensive model selection guide with explicit when-to-use recommendations based on user intent (e.g., single high-quality output, photoreal, NSFW, cheapest). Also advises when to use aetherwave_edit_image for I2I changes and when to ask the user (contradictory prompts, expensive generations). Covers exclusion criteria and alternatives thoroughly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/AetherWave-Studio/aetherwave-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.