Server Configuration

Describes the environment variables required to run the server.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| AETHERWAVE_API_KEY | Yes | Your API key. Get one at /profile -> Developer tab. Must start with aw_live_. | |
| AETHERWAVE_BASE_URL | No | Override the API base URL (useful for staging or self-hosted). | https://aetherwavestudio.com |
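
As an illustration of the two variables above, here is a minimal Python sketch of how a launcher script might validate them before starting the server. The function name `load_config` and the returned dictionary shape are assumptions made for this example; only the variable names, the `aw_live_` prefix, and the default URL come from the table.

```python
import os

API_KEY_PREFIX = "aw_live_"
DEFAULT_BASE_URL = "https://aetherwavestudio.com"


def load_config(env=None):
    """Read and validate the two documented environment variables.

    `env` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if env is None else env
    api_key = env.get("AETHERWAVE_API_KEY", "")
    if not api_key.startswith(API_KEY_PREFIX):
        raise ValueError("AETHERWAVE_API_KEY must be set and start with 'aw_live_'")
    # Fall back to the documented default when no override is given.
    base_url = env.get("AETHERWAVE_BASE_URL") or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}
```

Failing early on a malformed key avoids spending time on a job that the API would reject anyway.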

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | { "listChanged": true } |

Tools

Functions exposed to the LLM to take actions

Name | Description

aetherwave_balance

Returns the current AetherWave credit balance for the API key. Use this BEFORE a generation to confirm sufficient credits, especially for video which can cost 30-300+ credits depending on model/duration/resolution.

aetherwave_list_image_models

Returns every image-generation model AetherWave supports, with its credit cost, default aspect ratio, supported inputs (T2I vs I2I), and any model-specific options. Call this before generate_image when you don't know the right model ID. The model key (e.g. 'grok-imagine-t2i') is what you pass as model to generate_image.

aetherwave_list_master_presets

Returns every AI mastering preset AetherWave supports, with target LUFS, tags, descriptions, and difficulty level. Call this before master_audio when you don't know which preset fits the track. There are 12 presets in total, covering streaming, hip hop, EDM, pop, rock, lo-fi, R&B, acoustic, cinematic, podcast, gentle, and loud-and-punchy mastering styles. Each preset has a target LUFS value (e.g. -14 for streaming, -9 for loud) so you can match the user's distribution target.

aetherwave_list_video_models

Returns every video-generation model AetherWave supports (Grok Imagine, Wan 2.7, Hailuo 02, Seedance Pro/Lite, Kling 2.6 with audio, VEO 3.1, Happy Horse, etc.) with per-second credit cost, supported durations, resolutions, aspect ratios, and whether the model needs an input image (I2V). Call this before generate_video when you don't know the right model ID.

aetherwave_generate_image

Generates one or more images from a text prompt (T2I) or a text prompt + reference image(s) (I2I). Submits the job, polls until terminal, and returns the final image URLs. Default model is 'grok-imagine-t2i' (fast, 6 images per generation, 5 credits). Use list_image_models to see the full lineup with pricing. For I2I, pass referenceImages as an array of public image URLs and pick a model with I2I support (e.g. 'grok-imagine-i2i', 'wan-2.5-spicy-i2i').

Model selection guide (when the user does not specify a model)

Default: grok-imagine-t2i (5 cr, 6 outputs per call, fast, general purpose).

Strong recommendation: when a single high-quality output is what's wanted (most agent / one-shot workflows), prefer gpt-image-2-t2i (9 cr @ 1K / higher @ 2K, single deterministic image, best general quality across realism, illustration, typography, and composition; supports up to 2K resolution and most aspect ratios including auto). This is the front-runner for serious creative output where you don't need to pick from 6 variations.

Pick a different model when the prompt has these signals:

  • "single best result" / "one image" / production / no time to pick from variations -> gpt-image-2-t2i (9 cr, 1 output, top general quality)

  • "photoreal" / "photo of" / "realistic" -> gpt-image-2-t2i (9 cr, best general realism) or imagen-4 (12 cr, very high quality) or z-image-turbo (3 cr, fastest)

  • "highest quality" / "premium" / no budget -> gpt-image-2-t2i at 2K, or grok-imagine-quality-t2i (16 cr @ 1K, 22 cr @ 2K), or imagen-4-ultra

  • Text inside the image (signs, posters, typography) -> ideogram-v3-t2i (best in class) or gpt-image-2-t2i (also strong)

  • Artistic / painterly / stylized -> midjourney-t2i

  • Album art / cover art -> gpt-image-2-t2i for one strong image; grok-imagine-t2i for 6 variations to choose from; seedream-v4-t2i if 4K wanted

  • Logo or design with embedded text -> ideogram-v3-t2i

  • NSFW / adult / explicit -> wan-2.5-spicy-t2i (auto-tags creation as 18+; routes to adult gallery)

  • Cheapest possible / quick test -> z-image-turbo (3 cr)

  • Multiple variations to compare -> keep grok-imagine-t2i (6 outputs default) or use numImages on a multi-output model

For I2I (reference image provided): prefer the dedicated aetherwave_edit_image tool for "change something in this image" intent. Use aetherwave_generate_image with I2I models only when you specifically want style transfer (midjourney-i2i), premium quality (grok-imagine-quality-i2i), or adult content (wan-2.5-spicy-i2i).

Always pass an explicit aspectRatio (e.g. "1:1" for square album art, "16:9" for video thumbnails, "9:16" for shorts/reels). Some upstream providers reject submissions with no aspect ratio.

Ask the user only when:

  • The prompt contradicts itself (e.g., "highest quality but cheapest")

  • The user requested "the best model" with no context; in that case, surface 2-3 options with tradeoffs

  • A single generation would cost more than 20 credits and the user has not confirmed
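
The selection guide above can be read as a small rule table. The following Python sketch shows one way an agent-side helper might encode a subset of it. The helper names, the keyword lists, and the naive substring matching are assumptions for illustration; the model IDs and the 20-credit confirmation threshold come from the guide.

```python
# Signal keywords -> model IDs, taken from the selection guide.
# Order matters: earlier rules win when several match.
# Matching is naive substring search, which is an assumption of this sketch.
IMAGE_RULES = [
    (("nsfw", "adult", "explicit"), "wan-2.5-spicy-t2i"),
    (("logo", "poster", "typography"), "ideogram-v3-t2i"),
    (("painterly", "artistic", "stylized"), "midjourney-t2i"),
    (("cheapest", "quick test"), "z-image-turbo"),
    (("photoreal", "photo of", "realistic", "one image"), "gpt-image-2-t2i"),
]
DEFAULT_IMAGE_MODEL = "grok-imagine-t2i"
CONFIRM_ABOVE_CREDITS = 20


def pick_image_model(prompt):
    """Return the first model whose keywords appear in the prompt."""
    text = prompt.lower()
    for keywords, model in IMAGE_RULES:
        if any(keyword in text for keyword in keywords):
            return model
    return DEFAULT_IMAGE_MODEL


def needs_confirmation(cost_credits, user_confirmed=False):
    """True when a single generation exceeds 20 credits without confirmation."""
    return cost_credits > CONFIRM_ABOVE_CREDITS and not user_confirmed
```

For example, `pick_image_model("a photo of a red fox")` selects `gpt-image-2-t2i`, while a prompt with no recognised signal keeps the default.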

aetherwave_edit_image

Edits an existing image guided by a text prompt. Pass a public imageUrl plus a prompt describing the change ("add a moon to the sky", "swap the background for a neon city", "make it look like a comic panel"). Submits, polls, and returns the edited image URL(s). Default model is 'grok-imagine-i2i' (6 cr per call, returns 2 variations, ~30s, best cost-to-quality on standard edits). Other I2I-capable models: 'seedream-v4-edit', 'wan-2.5-spicy-i2i', 'flux-kontext-pro', 'qwen-image-edit', 'gpt-image-1.5-i2i' (slow, ~5min). Use list_image_models for full lineup. Note: source URLs with spaces or parentheses may fail upstream; prefer clean URLs.

Model selection guide for edits

Default: grok-imagine-i2i (6 cr per call, returns 2 variations = 3 cr/image effective, fast ~30s, strong general-purpose edit quality).

Pick a different model when:

  • Need a single deterministic output, or 4K resolution -> seedream-v4-edit (7 cr per image, supports 1K/2K/4K, multi-image up to 6)

  • Subtle edits / preserve composition / character consistency -> flux-kontext-pro or flux-kontext-max

  • NSFW edits -> wan-2.5-spicy-i2i

  • Highest quality, time is not a concern (~5 min OK) -> gpt-image-1.5-i2i or grok-imagine-quality-i2i (16 cr @ 1K, 22 cr @ 2K)

  • Stylized / artistic transformation -> midjourney-i2i

If the user simply says "edit this image" with no other signal, default to grok-imagine-i2i.
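
A minimal Python sketch of the edit guidance, assuming the tool accepts `imageUrl`, `prompt`, and `model` arguments (`imageUrl` and `prompt` are named in the description; `model` is assumed by analogy with generate_image). The keyword lists and the helper name `build_edit_args` are hypothetical.

```python
# Keyword -> model rules drawn from the edit guide; matching is naive
# substring search and is only an illustration.
EDIT_RULES = [
    (("nsfw", "adult"), "wan-2.5-spicy-i2i"),
    (("4k",), "seedream-v4-edit"),
    (("subtle", "preserve", "consistent"), "flux-kontext-pro"),
    (("stylized", "artistic"), "midjourney-i2i"),
]
DEFAULT_EDIT_MODEL = "grok-imagine-i2i"


def build_edit_args(image_url, prompt):
    """Return (arguments, warnings) for a hypothetical edit_image call."""
    text = prompt.lower()
    model = DEFAULT_EDIT_MODEL
    for keywords, candidate in EDIT_RULES:
        if any(keyword in text for keyword in keywords):
            model = candidate
            break
    warnings = []
    # The description notes that spaces or parentheses in the source URL
    # may fail upstream, so flag them before submitting.
    if any(ch in image_url for ch in " ()"):
        warnings.append("source URL contains spaces or parentheses")
    return {"imageUrl": image_url, "prompt": prompt, "model": model}, warnings
```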

aetherwave_upscale_image

Upscales a source image using Topaz's high-fidelity upscaler. Pass a public imageUrl and an upscaleFactor. Credit cost depends on the source resolution × factor; small images cost less than large ones at the same factor. Returns the upscaled image URL.

aetherwave_remove_background

Strips the background from an image, returning a PNG with transparent alpha. Pass a public imageUrl. Useful for product shots, character cutouts, logo isolation, or compositing onto a new background. ~5 credits per image. Recraft is the primary provider; on outage the tool auto-falls back to fal.ai BiRefNet v2 so single-image calls never silently fail. Works best on photographic subjects (people, products, animals); transparent-PNG inputs have no foreground to segment.

aetherwave_reframe_image

Reframes an image to a new aspect ratio by intelligently outpainting the edges. Pass a public imageUrl and the target aspectRatio ('16:9', '9:16', '1:1', '4:3', '3:4', etc.). Three speed tiers: 'turbo' (5 cr, fast), 'balanced' (10 cr, default), 'quality' (14 cr, slowest, best edges). Returns the reframed image URL.

aetherwave_upscale_video

Upscales a source video to 1080p or 2K using Atlas. Pass a public videoUrl and the target resolution. Cost is per-second (7 cr/s @ 1080p, 9 cr/s @ 2K). Atlas-side limits: clips up to 53s at 1080p, 23s at 2K, source must be <=30fps. Returns the upscaled video URL (R2-hosted).
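
To make the per-second pricing and the Atlas-side limits concrete, here is a small Python sketch of a pre-flight check. The rates and limits come from the description above; the function name and the assumption that cost is simply duration times rate (with no rounding rule) are illustrative only.

```python
UPSCALE_TIERS = {
    "1080p": {"credits_per_second": 7, "max_seconds": 53},
    "2K": {"credits_per_second": 9, "max_seconds": 23},
}
MAX_SOURCE_FPS = 30


def estimate_upscale_cost(duration_s, resolution, fps):
    """Validate the documented limits and return an estimated credit cost."""
    tier = UPSCALE_TIERS[resolution]
    if duration_s > tier["max_seconds"]:
        raise ValueError(f"clip exceeds {tier['max_seconds']}s limit at {resolution}")
    if fps > MAX_SOURCE_FPS:
        raise ValueError("source must be <= 30fps")
    # Assumption: billing is linear in duration; the real rounding is not documented.
    return duration_s * tier["credits_per_second"]
```

Under these assumptions a 10-second clip costs about 70 credits at 1080p and 90 credits at 2K.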

aetherwave_remove_background_video

Strips the background from a video frame-by-frame using rembg (u2netp) on AetherWave's Python service. Pass a public videoUrl. Choose bgType: "transparent" for an alpha-channel WebM output (compositing) or bgType: "color" with a customColor hex for a solid replacement. 2 credits per second. Slowest tool in the surface (per-frame processing); a 6s clip takes ~4 min, a 30s clip ~15-20 min. Works best on subjects with clear edges (people, products). Returns the processed video URL (R2-hosted).

aetherwave_reframe_video

Reframes a video to a new aspect ratio by intelligently outpainting/cropping the edges. Pass a public videoUrl and target reframeAspectRatio. 17 credits per second. Optional reframePrompt lets you steer the new edge content (e.g. 'extend the sky with sunset clouds'). Returns the reframed video URL (R2-hosted).

aetherwave_generate_video

Generates a short-form video from a text prompt (T2V) or a text prompt + starting image (I2V). Submits, polls, and returns the final video URL. Default model is 'grok-imagine-t2v' (fast, 4-6 cr/s, with built-in KIE -> fal.ai fallback). Use list_video_models for the full lineup with credit cost per second. I2V models (e.g. 'grok-imagine-i2v', 'seedance-pro-i2v') require a public imageUrl. Video generation can take 30s to several minutes; this tool polls with up to an 8-minute budget.

Model selection guide for videos (when the user does not specify a model)

Default: grok-imagine-t2v (4-6 cr/s, fast, with a KIE -> fal.ai fallback for redundancy; the best general-purpose choice).

Pick a different model when the prompt has these signals:

  • "highest quality" / "premium" / broadcast / commercial -> veo3.1-quality or veo3-quality (Google's flagship, fixed 350-560 cr for 8s, 3-5 min)

  • "fast premium" / quick high-quality -> veo3-fast or veo3.1-fast (84 cr fixed for 8s)

  • Cinematic camera moves / dolly / pan -> seedance-pro-t2v (3-10 cr/s) or kling-3.0-pro-t2v (26 cr/s)

  • Realistic human motion / faces -> hailuo-2.3-pro-i2v (I2V, supply imageUrl)

  • Talking head / lip sync -> kling-avatar-pro (23 cr/s) or infinitalk (5-17 cr/s)

  • Anime / stylized / fantasy -> wan-2.7-t2v

  • NSFW / adult -> wan-22-nsfw-i2v (I2V only; auto-tags adult)

  • Animate this exact image -> any I2V variant (grok-imagine-i2v, seedance-pro-i2v, hailuo-2.3-pro-i2v)

  • First + last frame interpolation -> seedance-pro-i2v with both imageUrl + endImageUrl

  • Cheapest test -> hailuo-2.0-standard @ 512p (3 cr/s, ~18 cr for 6s) or grok-imagine-t2v @ 480p (4 cr/s, ~24 cr for 6s)

  • Clip 12-15s -> grok-imagine-t2v (accepts up to 15s)

  • True 4K -> kling-3.0-4k-t2v (94 cr/s, expensive but native 4K)

Audio in generated video: grok-imagine-t2v, seedance-pro-t2v, and the VEO 3.x family include audio at base cost (no surcharge). Kling 2.6 and Kling 3.0 are the outliers: they price audio as a surcharge of roughly 50-100% (Kling 2.6 doubles the cost, Kling 3.0 Pro adds ~46%). Default to Grok / Seedance / VEO when sound matters and you don't want to think about audio pricing.

Cost framing: resolution and duration drive cost more than model choice. A 6-second 480p Grok generation costs ~24 cr; the same prompt at 1080p Seedance 2 is ~858 cr (roughly 36x as much). Pick the lowest acceptable resolution + duration first.

For I2V models: imageUrl is required. For first+last-frame models, pass endImageUrl too.

Ask the user only when:

  • Single generation would cost more than 100 credits and they haven't confirmed

  • They asked for "the best" with no other signal; surface 2-3 options with cost ranges
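
The cost framing and the confirmation rule above reduce to simple arithmetic. The Python sketch below assumes cost is duration multiplied by the per-second rate, which matches the quoted examples (6s at 4 cr/s is 24 cr); the helper names are hypothetical, and the balance would come from aetherwave_balance.

```python
CONFIRM_ABOVE_CREDITS = 100  # threshold from the "ask the user" rule


def estimate_video_cost(duration_s, credits_per_second):
    """Estimated credits for a per-second-priced model."""
    return duration_s * credits_per_second


def should_ask_user(estimated_cost, confirmed=False):
    """True when the job costs more than 100 credits and is unconfirmed."""
    return estimated_cost > CONFIRM_ABOVE_CREDITS and not confirmed


def can_afford(balance, estimated_cost):
    """Compare against the balance before submitting a generation."""
    return balance >= estimated_cost
```

Fixed-price models such as the VEO family would bypass `estimate_video_cost` and pass their flat cost straight to `should_ask_user`.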

aetherwave_generate_music

Generates AI music via Suno. Returns two tracks per submission. Default model is V5.5 (newest, best quality). For instrumental output set instrumental: true. Music gen typically takes 30-90s - this tool polls with up to a 6-minute budget. Note: the title param is advisory for instrumentals - Suno often writes its own title from the prompt content for instrumental generations. Transient GENERATE_AUDIO_FAILED errors are common; retry once before degrading the model version.
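
The "retry once before degrading the model version" advice could be sketched in Python as follows. The exception class, the `generate` callable, and the optional `fallback_model` argument are all hypothetical stand-ins; the source does not name a fallback version, so none is hard-coded here.

```python
class TransientAudioError(Exception):
    """Hypothetical stand-in for a GENERATE_AUDIO_FAILED response."""


def generate_with_one_retry(generate, model, fallback_model=None):
    """Try `model` twice, then `fallback_model` once if one is given.

    `generate` is any callable that takes a model name and returns tracks,
    raising TransientAudioError on a transient failure.
    """
    attempts = [model, model]
    if fallback_model is not None:
        attempts.append(fallback_model)
    last_error = None
    for candidate in attempts:
        try:
            return generate(candidate)
        except TransientAudioError as err:
            last_error = err
    raise last_error
```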

aetherwave_master_audio

Submits an audio file for AI mastering and returns the mastered URL synchronously (route polls the Python service internally; expect 30s-5min). Useful as a final polish step after music generation. Cost: 20 credits per track. Producer, Mogul, and Ultimate plans get mastering free. Output is WAV (~50MB per 3-minute track, lossless for redistribution). Pick a preset to steer the mastering style; call aetherwave_list_master_presets for the full live list (12 presets including streaming, loud, gentle, hip_hop, edm, pop, rock, lofi, rnb, acoustic, cinematic, podcast). Each preset has a target LUFS value so you can match the distribution target.
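
As a rough illustration of matching a preset to a distribution target, the Python sketch below picks the preset whose target LUFS is closest to a requested value and applies the documented pricing. The field names `key` and `targetLufs`, the lowercase plan identifiers, and the two-item sample list are assumptions; only the -14 (streaming) and -9 (loud) values, the 20-credit cost, and the free plans come from the descriptions.

```python
FREE_MASTERING_PLANS = {"producer", "mogul", "ultimate"}
MASTERING_COST_CREDITS = 20


def closest_preset(presets, target_lufs):
    """Return the preset whose target LUFS is nearest to the requested one."""
    return min(presets, key=lambda p: abs(p["targetLufs"] - target_lufs))


def mastering_cost(plan):
    """20 credits per track, free on Producer, Mogul, and Ultimate plans."""
    return 0 if plan.lower() in FREE_MASTERING_PLANS else MASTERING_COST_CREDITS


# Sample data in an assumed shape; the live list comes from
# aetherwave_list_master_presets.
SAMPLE_PRESETS = [
    {"key": "streaming", "targetLufs": -14},
    {"key": "loud", "targetLufs": -9},
]
```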

aetherwave_list_my_creations

Returns items from the authenticated user's gallery — images, videos, audio tracks they've generated on AetherWave. Useful for agent workflows like 'find my last 5 images and reframe them all to 9:16' or 'list my recent songs and master each one'. Supports pagination and type filtering. Each item includes id, type, prompt, model, contentUrl, thumbnailUrl, createdAt, isFavorite, visibility, rating, and type-specific fields (duration for audio/video, width/height for images).
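
The "find my last 5 images and reframe them all to 9:16" workflow could be planned along these lines. This Python sketch assumes that `type` is the string "image" for images and that `createdAt` is an ISO 8601 timestamp that sorts lexicographically; the field names themselves and the `imageUrl` / `aspectRatio` arguments come from the descriptions above, while the returned call structure is hypothetical.

```python
def plan_reframes(items, limit=5, aspect_ratio="9:16"):
    """Build reframe calls for the most recent images in a gallery listing."""
    images = [item for item in items if item["type"] == "image"]
    # Newest first; assumes ISO 8601 timestamps.
    images.sort(key=lambda item: item["createdAt"], reverse=True)
    return [
        {
            "tool": "aetherwave_reframe_image",
            "arguments": {"imageUrl": item["contentUrl"], "aspectRatio": aspect_ratio},
        }
        for item in images[:limit]
    ]
```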

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/AetherWave-Studio/aetherwave-mcp'
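
For readers who prefer Python to curl, the same request can be sketched with the standard library. That the endpoint returns JSON is an assumption here, as are the helper names; the URL is the one shown in the curl command.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/AetherWave-Studio/aetherwave-mcp"


def build_request(url=SERVER_URL):
    """Construct the GET request without sending it."""
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_server_info(url=SERVER_URL, timeout=10):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```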

If you have feedback or need assistance with the MCP directory API, please join our Discord server.