Generate video (Grok Imagine, Wan 2.7, Hailuo 02, Seedance, Kling 2.6, VEO 3.1, Happy Horse)
aetherwave_generate_video
Generates short-form videos from text prompts or text with a starting image. Submits, polls, and returns the final video URL.
Instructions
Generates a short-form video from a text prompt (T2V) or a text prompt plus a starting image (I2V). Submits the job, polls for completion, and returns the final video URL. The default model is 'grok-imagine-t2v' (fast, 4-6 credits per second of video (cr/s), with a built-in KIE -> fal.ai fallback). Use list_video_models for the full lineup with per-second credit costs. I2V models (e.g. 'grok-imagine-i2v', 'seedance-pro-i2v') require a public imageUrl. Video generation can take from 30 seconds to several minutes; this tool polls for up to 8 minutes.
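The submit-then-poll behavior can be pictured as a loop that stops on success, on failure, or when the 8-minute budget runs out. This is a minimal sketch, not the tool's implementation: `poll_for_video`, the `check_status` callable, and its `{"state": ...}` return shape are hypothetical, and the 5-second interval is an arbitrary choice. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

# Hypothetical status shape, not the tool's documented API:
#   {"state": "pending"} | {"state": "done", "url": "..."} | {"state": "failed", "error": "..."}
BUDGET_S = 8 * 60  # the 8-minute polling budget described above


def poll_for_video(
    check_status: Callable[[], dict],
    budget_s: float = BUDGET_S,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job succeeds, fails, or the budget is spent; return the video URL."""
    deadline = clock() + budget_s
    while True:
        status = check_status()
        state = status.get("state")
        if state == "done":
            return status["url"]
        if state == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"no result within {budget_s:.0f} s")
        # Wait one interval, but never sleep past the deadline.
        sleep(min(interval_s, remaining))
```

Capping the last sleep at the remaining time keeps the loop from overshooting the budget by up to one interval.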
Model selection guide for videos (when the user does not specify a model)
Default: grok-imagine-t2v (4-6 cr/s, fast, KIE -> fal.ai fallback for redundancy; the best general-purpose choice).
Pick a different model when the prompt has these signals:
"highest quality" / "premium" / broadcast / commercial -> veo3.1-quality or veo3-quality (Google's flagship; fixed 350-560 cr for 8s, 3-5 min)
"fast premium" / quick high-quality -> veo3-fast or veo3.1-fast (84 cr fixed for 8s)
Cinematic camera moves / dolly / pan -> seedance-pro-t2v (3-10 cr/s) or kling-3.0-pro-t2v (26 cr/s)
Realistic human motion / faces -> hailuo-2.3-pro-i2v (I2V; supply imageUrl)
Talking head / lip sync -> kling-avatar-pro (23 cr/s) or infinitalk (5-17 cr/s)
Anime / stylized / fantasy -> wan-2.7-t2v
NSFW / adult -> wan-22-nsfw-i2v (I2V only; auto-tags adult)
Animate this exact image -> any I2V variant (grok-imagine-i2v, seedance-pro-i2v, hailuo-2.3-pro-i2v)
First + last frame interpolation -> seedance-pro-i2v with both imageUrl + endImageUrl
Cheapest test -> hailuo-2.0-standard @ 512p (3 cr/s, ~18 cr for 6s) or grok-imagine-t2v @ 480p (4 cr/s, ~24 cr for 6s)
Clip 12-15s -> grok-imagine-t2v (accepts up to 15s)
True 4K -> kling-3.0-4k-t2v (94 cr/s; expensive but native 4K)
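The selection guide reads like an ordered set of keyword rules, so a rough first pass could look like the following. This is a hypothetical sketch that covers only some rows: `pick_model`, the keyword lists, the rule order, and the choice of one model per row are illustrative assumptions, and keyword matching is a crude stand-in for actually judging the prompt's intent.

```python
import re
from typing import Optional

DEFAULT_MODEL = "grok-imagine-t2v"

# (keywords, model) pairs for text-only prompts; the first matching rule wins,
# so "fast premium" is checked before the broader "premium".
T2V_RULES = [
    (("fast premium", "quick high-quality"), "veo3.1-fast"),
    (("highest quality", "premium", "broadcast", "commercial"), "veo3.1-quality"),
    (("4k",), "kling-3.0-4k-t2v"),
    (("dolly", "pan", "cinematic"), "seedance-pro-t2v"),
    (("anime", "stylized", "fantasy"), "wan-2.7-t2v"),
]


def _mentions(text: str, keyword: str) -> bool:
    # Whole-word match so "pan" does not fire on "Japan" or "panda".
    return re.search(rf"\b{re.escape(keyword)}\b", text) is not None


def pick_model(prompt: str, image_url: Optional[str] = None,
               end_image_url: Optional[str] = None) -> str:
    text = prompt.lower()
    if image_url and end_image_url:
        return "seedance-pro-i2v"        # first + last frame interpolation
    if image_url:
        if any(_mentions(text, k) for k in ("face", "faces", "realistic")):
            return "hailuo-2.3-pro-i2v"  # realistic human motion / faces
        return "grok-imagine-i2v"        # animate this exact image
    for keywords, model in T2V_RULES:
        if any(_mentions(text, k) for k in keywords):
            return model
    return DEFAULT_MODEL
```

For example, `pick_model("A slow dolly shot over Japan")` selects seedance-pro-t2v because of "dolly", while "Japan" alone would not trigger the "pan" rule.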
Audio in generated video: grok-imagine-t2v, seedance-pro-t2v, and the VEO 3.x family include audio at base cost (no surcharge). Kling 2.6 and Kling 3.0 are the outliers: they charge an audio surcharge of roughly +46% to +100% (Kling 2.6 doubles the cost; Kling 3.0 Pro adds ~46%). Default to Grok / Seedance / VEO when sound matters and you don't want to think about audio pricing.
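The audio note boils down to a per-model multiplier on the base cost. A minimal sketch, assuming model IDs can be matched by prefix: "kling-3.0-pro" matches the kling-3.0-pro-t2v ID from the guide, while "kling-2.6" is a hypothetical prefix for the Kling 2.6 family, whose exact IDs are not listed here. `cost_with_audio` is an illustrative name, not part of the tool.

```python
# Audio multipliers taken from the note above. Prefixes are assumptions:
# "kling-2.6" is a hypothetical prefix for the Kling 2.6 model IDs.
AUDIO_MULTIPLIER = {
    "kling-2.6": 2.0,       # audio doubles the cost
    "kling-3.0-pro": 1.46,  # audio adds ~46%
}


def cost_with_audio(base_cost: float, model: str, audio: bool) -> float:
    """Return the estimated credit cost, applying a Kling audio surcharge if one is listed."""
    if not audio:
        return base_cost
    for prefix, multiplier in AUDIO_MULTIPLIER.items():
        if model.startswith(prefix):
            return base_cost * multiplier
    # No listed surcharge: Grok, Seedance, and VEO 3.x bundle audio at base cost;
    # audio pricing for other models is not stated here.
    return base_cost
```

With kling-3.0-pro-t2v at 26 cr/s for 6 seconds (156 cr), audio would bring the estimate to about 228 cr, while a 24 cr Grok clip stays at 24 cr.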
Cost framing: resolution and duration drive cost more than model choice. A 6-second 480p Grok generation costs ~24 cr; the same prompt on Seedance 2 at 1080p costs ~858 cr, roughly 36x more. Pick the lowest acceptable resolution + duration first.
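For per-second-priced models the arithmetic is rate × duration: 4 cr/s × 6 s = 24 cr, and 858 / 24 = 35.75. The sketch below only models per-second pricing (fixed-price VEO clips are not covered), and the 143 cr/s Seedance 2 rate is back-computed from the ~858 cr figure above rather than taken from list_video_models. `PricedOption`, `estimate_credits`, and `cheapest` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricedOption:
    label: str
    cr_per_s: float  # credits per second for a specific model + resolution


def estimate_credits(option: PricedOption, seconds: float) -> float:
    """Per-second models: cost scales linearly with duration."""
    return option.cr_per_s * seconds


def cheapest(options: list, seconds: float) -> PricedOption:
    """Pick the lowest-cost option for the requested duration."""
    return min(options, key=lambda o: estimate_credits(o, seconds))


GROK_480P = PricedOption("grok-imagine-t2v @ 480p", 4)
# 143 cr/s is back-computed from ~858 cr / 6 s above, not a published rate.
SEEDANCE2_1080P = PricedOption("Seedance 2 @ 1080p", 143)
```

`estimate_credits(GROK_480P, 6)` gives 24 and `estimate_credits(SEEDANCE2_1080P, 6)` gives 858, so `cheapest([SEEDANCE2_1080P, GROK_480P], 6)` returns the Grok option.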
For I2V models: imageUrl is required. For first+last-frame models, pass endImageUrl too.
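A caller can catch missing or malformed image arguments before submitting. This is a sketch under two stated assumptions: I2V models are recognized by the "-i2v" suffix seen in the model IDs above, and the first+last-frame set contains only seedance-pro-i2v, the one model the guide names for it (list_video_models is the authority for both). The http(s) check is only a cheap proxy for "public"; it cannot confirm the URL is reachable. `check_image_args` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Assumption: only the first+last-frame model named in the guide is listed here.
FIRST_LAST_FRAME_MODELS = {"seedance-pro-i2v"}


def _is_http_url(url: str) -> bool:
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_image_args(args: dict) -> list:
    """Return a list of problems with imageUrl/endImageUrl for the chosen model."""
    model = args.get("model", "grok-imagine-t2v")
    image_url = args.get("imageUrl")
    end_image_url = args.get("endImageUrl")
    problems = []
    # Assumption: I2V model IDs end in "-i2v".
    if model.endswith("-i2v") and not image_url:
        problems.append(f"{model} is I2V and requires imageUrl")
    if image_url and not _is_http_url(image_url):
        problems.append("imageUrl must be an http(s) URL")
    if end_image_url:
        if model not in FIRST_LAST_FRAME_MODELS:
            problems.append(f"{model} is not known to support endImageUrl")
        if not _is_http_url(end_image_url):
            problems.append("endImageUrl must be an http(s) URL")
    return problems
```

Returning a list rather than raising on the first problem lets the agent report every issue to the user in one round.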
Ask the user only when:
A single generation would cost more than 100 credits and the user hasn't confirmed it
The user asked for "the best" with no other signal; in that case, surface 2-3 options with cost ranges
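The two conditions for asking the user can be written as a small predicate. A minimal sketch: `should_ask_user` and the `has_other_signal` flag (whether the prompt carries any other selection signal from the guide) are hypothetical, and a literal "the best" substring check stands in for recognizing that request.

```python
ASK_THRESHOLD_CR = 100  # ask before any single generation above 100 credits


def should_ask_user(estimated_cr: float, confirmed: bool, prompt: str,
                    has_other_signal: bool) -> bool:
    # Condition 1: expensive generation that the user has not confirmed.
    over_budget = estimated_cr > ASK_THRESHOLD_CR and not confirmed
    # Condition 2: "the best" with nothing else to go on.
    wants_best_only = "the best" in prompt.lower() and not has_other_signal
    return over_budget or wants_best_only
```

Note the threshold is strict: a generation estimated at exactly 100 credits does not trigger a question.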
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Text description of the video scene. | |
| model | No | Model ID. Use list_video_models for the full list. | 'grok-imagine-t2v' |
| duration | No | Duration in seconds. Grok Imagine accepts 6-15; other models have their own ranges (see list_video_models). | |
| resolution | No | Output resolution. Default depends on model. | |
| aspectRatio | No | Aspect ratio (e.g. '16:9', '9:16', '1:1'). | |
| imageUrl | No | Public URL of starting image. Required for I2V models. | |
| endImageUrl | No | Public URL of ending image. Supported by some I2V models (first+last frame). | |
| mode | No | Moderation mode for Grok Imagine. | 'normal' |
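One way to assemble a call's arguments is a dataclass that mirrors the schema field names and drops unset fields so the tool's defaults apply. This is a sketch, not a client library: `GenerateVideoArgs` is hypothetical, the camelCase field names are kept only to match the schema, and applying the 6-15 second check to every model whose ID starts with "grok-imagine" is an assumption drawn from the duration description.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class GenerateVideoArgs:
    """Arguments mirroring the input schema; None fields are omitted so defaults apply."""
    prompt: str
    model: Optional[str] = None
    duration: Optional[int] = None
    resolution: Optional[str] = None
    aspectRatio: Optional[str] = None  # camelCase kept to match the schema
    imageUrl: Optional[str] = None
    endImageUrl: Optional[str] = None
    mode: Optional[str] = None

    def validate(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt is required")
        model = self.model or "grok-imagine-t2v"
        # Assumption: the 6-15 s range applies to all grok-imagine-* models;
        # other models' ranges come from list_video_models.
        if model.startswith("grok-imagine") and self.duration is not None:
            if not 6 <= self.duration <= 15:
                raise ValueError("Grok Imagine duration must be 6-15 seconds")

    def to_json(self) -> str:
        self.validate()
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})
```

For instance, `GenerateVideoArgs(prompt="A paper boat drifting", duration=6, aspectRatio="16:9").to_json()` serializes only those three fields.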