Glama

Server Details

One tool surface for music, image, video, and audio generation across Suno, Grok Imagine, Seedance, Kling, Hailuo, Wan, VEO, Ideogram, and GPT Image 2. Generate, edit, upscale, reframe, and master through one credit pool. Connect in one click with OAuth, no API key required.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client -> Glama -> MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.6/5 across 16 of 16 tools scored.

Server Coherence: A

Disambiguation: 5/5

Each tool targets a distinct media operation (image, video, audio, listing, mastering, etc.) with clear boundaries. Even similar tools like generate_image and edit_image are differentiated by their primary intent (creation vs. modification) and model selection guidance.

Naming Consistency: 5/5

All tools follow the 'aetherwave_verb_noun' pattern consistently, using snake_case. Verbs and nouns are descriptive and predictable (e.g., generate_image, list_video_models, remove_background_video).

Tool Count: 4/5

16 tools is slightly above the ideal range (3-15) but remains well-scoped for a multimedia generation platform covering image, video, audio, and user management. Each tool serves a distinct purpose, and no obvious bloat exists.

Completeness: 4/5

The tool surface covers core creation, editing, listing, and enhancement workflows for images, videos, and audio. Minor gaps exist (e.g., no delete tool, no get-single-creation tool), but the essential lifecycle is well-covered.

Available Tools

16 tools

aetherwave_balance: Check credit balance (score: A, read-only)

Returns the current AetherWave credit balance for the API key. Use this BEFORE a generation to confirm sufficient credits, especially for video which can cost 30-300+ credits depending on model/duration/resolution.

Parameters

No parameters
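As a rough illustration of what invoking this tool might look like on the wire, the sketch below builds an MCP `tools/call` JSON-RPC request for `aetherwave_balance`. The helper name `build_tool_call` and the request `id` are assumptions made for illustration; transport, session handling, and authentication are left out.

```python
import json
from typing import Optional


def build_tool_call(name: str, arguments: Optional[dict] = None, request_id: int = 1) -> dict:
    """Build an MCP JSON-RPC 2.0 request that invokes a single tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # arbitrary value chosen by the client
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# aetherwave_balance takes no parameters, so the arguments object stays empty.
request = build_tool_call("aetherwave_balance")
print(json.dumps(request, indent=2))
```

The same helper would apply to the other tools on this page by passing their parameters as `arguments`.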

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description is not required to cover safety. It adds value by specifying that the balance is tied to the API key and explaining the credit cost range for video, but does not elaborate on other behavioral aspects like rate limits or caching.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero wasted words. The first sentence states the core functionality, and the second provides actionable usage guidance. Information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only balance check with no parameters and no output schema, the description is fully complete. It covers what the tool returns, its purpose, and when to use it, integrating cost context for better decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has no parameters, and schema description coverage is 100%. Per scoring guidelines, baseline is 3. The description does not add parameter-specific information but provides context on when to use the tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Returns the current AetherWave credit balance for the API key.' This is a specific verb and resource, and it distinguishes from sibling generation tools by implying it is a prerequisite check.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to use this tool BEFORE a generation to confirm sufficient credits, especially for video generation which can vary in cost. This provides clear context and when-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_edit_image: Edit image with AI (I2I) (score: A)

Edits an existing image guided by a text prompt. Pass a public imageUrl plus a prompt describing the change ("add a moon to the sky", "swap the background for a neon city", "make it look like a comic panel"). Submits, polls, and returns the edited image URL(s). Default model is 'grok-imagine-i2i' (6 cr per call, returns 2 variations, ~30s, best cost-to-quality on standard edits). Other I2I-capable models: 'seedream-v4-edit', 'wan-2.5-spicy-i2i', 'flux-kontext-pro', 'qwen-image-edit', 'gpt-image-1.5-i2i' (slow, ~5min). Use list_image_models for full lineup. Note: source URLs with spaces or parentheses may fail upstream; prefer clean URLs.
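The note about spaces and parentheses in source URLs can be checked before submitting. The sketch below flags such URLs and percent-encodes the path as one possible workaround. The function names are hypothetical, and whether the upstream provider accepts the encoded form is an assumption not confirmed by this page; re-hosting the file under a clean name may be safer.

```python
from urllib.parse import quote, urlsplit, urlunsplit

RISKY_CHARS = (" ", "(", ")")  # characters the description warns about


def has_risky_chars(url: str) -> bool:
    """Return True if the URL contains a space or a parenthesis."""
    return any(ch in url for ch in RISKY_CHARS)


def percent_encode_path(url: str) -> str:
    """Percent-encode the path component, leaving '/' and existing escapes alone."""
    parts = urlsplit(url)
    clean_path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, clean_path, parts.query, parts.fragment))


source = "https://example.com/my image (1).png"
if has_risky_chars(source):
    source = percent_encode_path(source)
print(source)
```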

Model selection guide for edits

Default: grok-imagine-i2i (6 cr per call, returns 2 variations = 3 cr/image effective, fast ~30s, strong general-purpose edit quality).

Pick a different model when:

  • Need a single deterministic output, or 4K resolution -> seedream-v4-edit (7 cr per image, supports 1K/2K/4K, multi-image up to 6)

  • Subtle edits / preserve composition / character consistency -> flux-kontext-pro or flux-kontext-max

  • NSFW edits -> wan-2.5-spicy-i2i

  • Highest quality, time is not a concern (~5 min OK) -> gpt-image-1.5-i2i or grok-imagine-quality-i2i (16 cr @ 1K, 22 cr @ 2K)

  • Stylized / artistic transformation -> midjourney-i2i

If the user simply says "edit this image" with no other signal, default to grok-imagine-i2i.
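The selection guide above can be read as a small decision function. The sketch below is one way to encode it; the flag names and the order in which the signals are checked are assumptions, since the guide does not say which signal wins when several apply. Only model IDs named in the guide are returned.

```python
def pick_edit_model(*, nsfw: bool = False, needs_4k: bool = False,
                    single_output: bool = False, preserve_composition: bool = False,
                    highest_quality: bool = False, stylized: bool = False) -> str:
    """Map edit signals to a model ID. The priority order is an assumption."""
    if nsfw:
        return "wan-2.5-spicy-i2i"
    if needs_4k or single_output:
        return "seedream-v4-edit"
    if preserve_composition:
        return "flux-kontext-pro"  # flux-kontext-max is the listed alternative
    if highest_quality:
        return "gpt-image-1.5-i2i"  # slow (~5 min); grok-imagine-quality-i2i is the listed alternative
    if stylized:
        return "midjourney-i2i"
    return "grok-imagine-i2i"  # default when no other signal is present


# Effective per-image cost of the default: 6 credits per call, 2 variations returned.
default_cost_per_image = 6 / 2

print(pick_edit_model())
print(pick_edit_model(needs_4k=True))
```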

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| model | No | Model ID. Defaults to 'grok-imagine-i2i' (3 cr/image effective, 2 outputs). Other options: 'seedream-v4-edit', 'wan-2.5-spicy-i2i', 'flux-kontext-pro', 'qwen-image-edit', 'gpt-image-1.5-i2i', 'grok-imagine-quality-i2i'. Use list_image_models for the full list. | |
| prompt | Yes | Text description of the edit (e.g. 'replace the sky with sunset clouds'). | |
| quality | No | Quality preset for models that support it (e.g. GPT Image 2). | |
| imageUrl | Yes | Public URL of the source image to edit. Must be a real, fetchable URL. | |
| maxImages | No | Number of variations to return for multi-output models. | |
| resolution | No | Output resolution. Tiered-pricing models accept '1K' / '2K'. | |
| aspectRatio | No | Output aspect ratio (e.g. '1:1', '16:9'). Defaults to the source ratio for most models. | |
| renderingSpeed | No | Rendering speed preset for models that support it. | |
| negative_prompt | No | What to avoid in the output (supported by some models). | |

Behavior: 5/5

Discloses submission and polling behavior, return format (URL(s)), cost (6 cr per call), timing (~30s default), and model-specific behaviors (e.g., slow ~5min for gpt-image-1.5-i2i). Warns about URL pitfalls. No annotation contradiction.

Conciseness: 4/5

Well-structured with front-loaded purpose and detailed guide; slightly verbose but every section adds essential context. Model selection table is excellent but could be trimmed.

Completeness: 5/5

For a complex tool with 9 parameters and no output schema, description fully covers return format, polling, edge cases, and model-specific behaviors. No gaps.

Parameters: 5/5

Schema coverage is 100%, but description adds value by explaining default model behavior, prompt examples, and specific parameter contexts like resolution tiers and maxImages for multi-output models.

Purpose: 5/5

Description uses specific verb 'edits' and resource 'existing image', with example prompts. Distinguishes from siblings like aetherwave_generate_image by focusing on image-to-image editing.

Usage Guidelines: 5/5

Contains an explicit model selection guide covering various use cases (single output, NSFW, high quality), default recommendation, and when to use alternatives. Mentions sibling tool list_image_models for full lineup.

aetherwave_generate_image: Generate image (Grok Imagine, GPT Image 2, Seedream V4, Wan, Imagen 4, Nano Banana, Ideogram V3, Z-Image Turbo) (score: A)

Generates one or more images from a text prompt (T2I) or a text prompt + reference image(s) (I2I). Submits the job, polls until terminal, and returns the final image URLs. Default model is 'grok-imagine-t2i' (fast, 6 images per generation, 5 credits). Use list_image_models to see the full lineup with pricing. For I2I, pass referenceImages as an array of public image URLs and pick a model with I2I support (e.g. 'grok-imagine-i2i', 'wan-2.5-spicy-i2i').

Model selection guide (when the user does not specify a model)

Default: grok-imagine-t2i (5 cr, 6 outputs per call, fast, general purpose).

Strong recommendation: when a single high-quality output is what's wanted (most agent / one-shot workflows), prefer gpt-image-2-t2i (9 cr @ 1K / higher @ 2K, single deterministic image, best general quality across realism, illustration, typography, and composition; supports up to 2K resolution and most aspect ratios including auto). This is the front-runner for serious creative output where you don't need to pick from 6 variations.

Pick a different model when the prompt has these signals:

  • "single best result" / "one image" / production / no time to pick from variations -> gpt-image-2-t2i (9 cr, 1 output, top general quality)

  • "photoreal" / "photo of" / "realistic" -> gpt-image-2-t2i (9 cr, best general realism) or imagen-4 (12 cr, very high quality) or z-image-turbo (3 cr, fastest)

  • "highest quality" / "premium" / no budget -> gpt-image-2-t2i at 2K, or grok-imagine-quality-t2i (16 cr @ 1K, 22 cr @ 2K), or imagen-4-ultra

  • Text inside the image (signs, posters, typography) -> ideogram-v3-t2i (best in class) or gpt-image-2-t2i (also strong)

  • Artistic / painterly / stylized -> midjourney-t2i

  • Album art / cover art -> gpt-image-2-t2i for one strong image; grok-imagine-t2i for 6 variations to choose from; seedream-v4-t2i if 4K wanted

  • Logo or design with embedded text -> ideogram-v3-t2i

  • NSFW / adult / explicit -> wan-2.5-spicy-t2i (auto-tags creation as 18+; routes to adult gallery)

  • Cheapest possible / quick test -> z-image-turbo (3 cr)

  • Multiple variations to compare -> keep grok-imagine-t2i (6 outputs default) or use numImages on a multi-output model

For I2I (reference image provided): prefer the dedicated aetherwave_edit_image tool for "change something in this image" intent. Use aetherwave_generate_image with I2I models only when you specifically want style transfer (midjourney-i2i), premium quality (grok-imagine-quality-i2i), or adult content (wan-2.5-spicy-i2i).

Always pass an explicit aspectRatio (e.g. "1:1" for square album art, "16:9" for video thumbnails, "9:16" for shorts/reels). Some upstream providers reject submissions with no aspect ratio.

Ask the user only when:

  • The prompt contradicts itself (e.g., "highest quality but cheapest")

  • The user requested "the best model" with no context; surface 2-3 options with tradeoffs

  • A single generation would cost more than 20 credits and the user has not confirmed
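The aspect-ratio advice and the 20-credit confirmation rule above could be applied when assembling the tool arguments. The following is a minimal sketch under stated assumptions: the helper names and the use-case labels (`album_art`, `video_thumbnail`, `short`) are made up for illustration, while the parameter names, ratios, default model, and threshold come from this page.

```python
from typing import Sequence, Union

# Ratios come from the guidance above; the dictionary keys are made-up labels.
ASPECT_BY_USE = {"album_art": "1:1", "video_thumbnail": "16:9", "short": "9:16"}
CONFIRM_ABOVE_CREDITS = 20  # threshold from the "ask the user" rule


def build_image_args(prompt: str, use: str, model: str = "grok-imagine-t2i",
                     reference_images: Union[str, Sequence[str], None] = None) -> dict:
    """Assemble arguments for aetherwave_generate_image with an explicit aspectRatio."""
    args = {"prompt": prompt, "model": model, "aspectRatio": ASPECT_BY_USE[use]}
    if reference_images is not None:
        # A single URL string is wrapped into a one-element list.
        if isinstance(reference_images, str):
            reference_images = [reference_images]
        args["referenceImages"] = list(reference_images)
    return args


def needs_confirmation(estimated_credits: float) -> bool:
    """True when a single generation exceeds the confirmation threshold."""
    return estimated_credits > CONFIRM_ABOVE_CREDITS


args = build_image_args("neon city skyline at dusk", "album_art")
print(args)
```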

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| seed | No | Seed for deterministic generation (supported by some models). | |
| model | No | Model ID. Defaults to 'grok-imagine-t2i'. Use list_image_models for the full list. | |
| prompt | Yes | Text description of the image to generate. | |
| numImages | No | Number of images for models that support multiple outputs. | |
| resolution | No | Output resolution. Most models accept '1K' or '2K'; some accept '480p'/'720p'. | |
| aspectRatio | No | Aspect ratio (e.g. '1:1', '16:9', '9:16'). Pass this explicitly when possible; some upstream providers reject submissions without an aspect ratio. Default ratios vary by model. | |
| negative_prompt | No | What to avoid in the output (supported by some models). | |
| referenceImages | No | Array of public image URLs for image-to-image generation. Required when using an I2I model. A single URL string is also accepted (wrapped as a one-element array). | |

Behavior: 5/5

Covers process: submits job, polls until terminal, returns final image URLs. Discloses credit costs per model, default model behavior, and model-specific behaviors like multi-output vs single deterministic. Annotations (readOnlyHint=false, destructiveHint=false) are supplemented with full async workflow and mutation context.

Conciseness: 4/5

Description is long but well-structured with sections (Model selection guide, I2I guidance, when to ask user). Every sentence adds value, but some redundancy could be trimmed (e.g., repeating default model). Still, clarity justifies length.

Completeness: 5/5

Given 8 parameters, no output schema, and complex model ecosystem, the description is very complete. It explains process, costs, model selection, parameters, and edge cases. Output is described as final image URLs. No gaps.

Parameters: 5/5

Schema covers 100% of parameters with basic descriptions. The tool description adds significant meaning: explains default model, how to use referenceImages (array or single URL), which models support I2I, negative_prompt, seed, etc. The model selection guide ties parameters to use cases.

Purpose: 5/5

The description clearly states 'Generates one or more images from a text prompt (T2I) or a text prompt + reference image(s) (I2I).' It distinguishes from sibling aetherwave_edit_image for 'change something' intent and mentions list_image_models for model lineup. The verb+resource+scope is very specific.

Usage Guidelines: 5/5

Explicitly guides when to use this tool vs alternatives: For I2I, prefer aetherwave_edit_image unless specific style transfer, premium, or adult content needed. Provides a comprehensive model selection guide with signals like 'high quality', 'photoreal', 'text inside image', etc. Also advises when to ask user for clarification.

aetherwave_generate_music: Generate music (Suno) (score: A)

Generates AI music via Suno. Returns two tracks per submission. Default model is V5.5 (newest, best quality). For instrumental output set instrumental: true. Music gen typically takes 30-90s - this tool polls with up to a 6-minute budget. Note: the title param is advisory for instrumentals - Suno often writes its own title from the prompt content for instrumental generations. Transient GENERATE_AUDIO_FAILED errors are common; retry once before degrading the model version.
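The advice to retry once on a transient GENERATE_AUDIO_FAILED before degrading the model version might be implemented along the following lines. This is a sketch under assumptions: `call_tool` stands in for whatever function actually invokes the tool, `ToolError` is a hypothetical exception type, and the list of fallback model versions is supplied by the caller because this page names only V5_5.

```python
from typing import Callable, Sequence


class ToolError(Exception):
    """Hypothetical error type carrying the upstream error code."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def generate_with_retry(call_tool: Callable[[dict], dict], args: dict,
                        fallback_models: Sequence[str] = ()) -> dict:
    """Attempt each model up to twice (one try plus one retry) before moving on."""
    models = [args.get("model", "V5_5"), *fallback_models]
    last_error = None
    for model in models:
        for _attempt in range(2):
            try:
                return call_tool({**args, "model": model})
            except ToolError as err:
                if err.code != "GENERATE_AUDIO_FAILED":
                    raise  # only the transient code is retried
                last_error = err
    raise last_error


# Demo with a stand-in caller that fails once, then succeeds.
attempts = []


def flaky_call(tool_args: dict) -> dict:
    attempts.append(tool_args["model"])
    if len(attempts) < 2:
        raise ToolError("GENERATE_AUDIO_FAILED")
    return {"ok": True}


result = generate_with_retry(flaky_call, {"prompt": "Lo-fi ambient track", "instrumental": True})
print(result, attempts)
```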

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| model | No | Suno model version. Defaults to V5_5 (current best). | |
| title | No | Optional title for the generated tracks. | |
| lyrics | No | Custom lyrics. If omitted, Suno will generate lyrics from the prompt (unless instrumental=true). | |
| prompt | Yes | Style/mood/topic description. E.g. 'Lo-fi ambient track, rain sounds, warm pads' or 'High-energy synthwave with driving bass'. | |
| instrumental | No | If true, no vocals. Default false. | |

Behavior: 5/5

Adds significant behavioral context beyond annotations: returns two tracks, polling mechanism, common transient errors with retry advice, model version behavior. No contradiction with annotations.

Conciseness: 4/5

Dense single paragraph with all key information, though could benefit from bullet points for improved scannability. Still efficient and front-loaded.

Completeness: 3/5

Lacks return value details despite no output schema. Mentions 'returns two tracks' but doesn't specify format (URLs, IDs). Polling is described but not how results are delivered. Slight gap in completeness.

Parameters: 5/5

Schema coverage is 100%, but description adds crucial context: instrumental activation, title advisory for instrumentals, lyrics generation fallback, model default, and prompt examples. This meaningfully supplements the schema.

Purpose: 5/5

Clearly states 'Generates AI music via Suno' with a specific verb and resource. Distinguished from sibling tools like aetherwave_generate_image and aetherwave_generate_video by focusing on music generation.

Usage Guidelines: 4/5

Provides detailed guidance on when to use: default model, instrumental option, timing and polling budget, error handling. While no explicit alternatives are named, the context of music generation vs other tools is clear.

aetherwave_generate_video: Generate video (Grok Imagine, Wan 2.7, Hailuo 02, Seedance, Kling 2.6, VEO 3.1, Happy Horse) (score: A)

Generates a short-form video from a text prompt (T2V) or a text prompt + starting image (I2V). Submits, polls, and returns the final video URL. Default model is 'grok-imagine-t2v' (fast, 4-6 cr/s, with built-in KIE -> fal.ai fallback). Use list_video_models for the full lineup with credit cost per second. I2V models (e.g. 'grok-imagine-i2v', 'seedance-pro-i2v') require a public imageUrl. Video generation can take 30s to several minutes; this tool polls with up to an 8-minute budget.
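The "submits, polls, and returns" behavior with an 8-minute budget is handled inside the tool, but the general pattern can be sketched as below. Everything here is illustrative: the status field name `state`, the terminal state names, and the 5-second interval are assumptions, not details of the AetherWave implementation. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Optional

TERMINAL_STATES = {"succeeded", "failed"}  # assumed status names


def poll_until_terminal(get_status: Callable[[], dict],
                        budget_s: float = 8 * 60,
                        interval_s: float = 5.0,
                        clock: Callable[[], float] = time.monotonic,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Poll get_status until it reports a terminal state or the time budget runs out."""
    deadline = clock() + budget_s
    while True:
        status = get_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if clock() + interval_s > deadline:
            return None  # budget exhausted before the job finished
        sleep(interval_s)


# Demo with a fake clock so nothing actually sleeps.
now = [0.0]
states = iter([{"state": "running"}, {"state": "running"},
               {"state": "succeeded", "url": "https://example.com/video.mp4"}])
final = poll_until_terminal(lambda: next(states), budget_s=60, interval_s=5,
                            clock=lambda: now[0],
                            sleep=lambda s: now.__setitem__(0, now[0] + s))
print(final)
```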

Model selection guide for videos (when the user does not specify a model)

Default: grok-imagine-t2v (4-6 cr/s, fast, has KIE -> fal.ai fallback for redundancy. Best general-purpose).

Pick a different model when the prompt has these signals:

  • "highest quality" / "premium" / broadcast / commercial -> veo3.1-quality or veo3-quality (Google's flagship, fixed 350-560 cr for 8s, 3-5 min)

  • "fast premium" / quick high-quality -> veo3-fast or veo3.1-fast (84 cr fixed for 8s)

  • Cinematic camera moves / dolly / pan -> seedance-pro-t2v (3-10 cr/s) or kling-3.0-pro-t2v (26 cr/s)

  • Realistic human motion / faces -> hailuo-2.3-pro-i2v (I2V, supply imageUrl)

  • Talking head / lip sync -> kling-avatar-pro (23 cr/s) or infinitalk (5-17 cr/s)

  • Anime / stylized / fantasy -> wan-2.7-t2v

  • NSFW / adult -> wan-22-nsfw-i2v (I2V only; auto-tags adult)

  • Animate this exact image -> any I2V variant (grok-imagine-i2v, seedance-pro-i2v, hailuo-2.3-pro-i2v)

  • First + last frame interpolation -> seedance-pro-i2v with both imageUrl + endImageUrl

  • Cheapest test -> hailuo-2.0-standard @ 512p (3 cr/s, ~18 cr for 6s) or grok-imagine-t2v @ 480p (4 cr/s, ~24 cr for 6s)

  • Clip 12-15s -> grok-imagine-t2v (accepts up to 15s)

  • True 4K -> kling-3.0-4k-t2v (94 cr/s, expensive but native 4K)

Audio in generated video: grok-imagine-t2v, seedance-pro-t2v, and the VEO 3.x family include audio at base cost (no surcharge). Kling 2.6 and Kling 3.0 are the outliers: they price audio as a surcharge of roughly 46-100% (Kling 2.6 doubles the cost, Kling 3.0 Pro adds ~46%). Default to Grok / Seedance / VEO when sound matters and you don't want to think about audio pricing.

Cost framing: resolution and duration drive cost more than model choice. A 6-second 480p Grok generation costs ~24 cr; the same prompt at 1080p Seedance 2 is ~858 cr (roughly 36x as much). Pick the lowest acceptable resolution + duration first.

For I2V models: imageUrl is required. For first+last-frame models, pass endImageUrl too.

Ask the user only when:

  • Single generation would cost more than 100 credits and they haven't confirmed

  • They asked for "the best" with no other signal; surface 2-3 options with cost ranges
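The cost figures quoted in this guide follow from multiplying a per-second rate by the clip duration. The sketch below reproduces that arithmetic and applies the 100-credit confirmation rule. It assumes purely linear per-second pricing, which does not hold for the fixed-price VEO models; the function names are hypothetical.

```python
CONFIRM_ABOVE_CREDITS = 100  # threshold from the "ask the user" rule


def estimate_credits(rate_per_second: float, duration_s: float) -> float:
    """Linear estimate: credits per second times duration in seconds."""
    return rate_per_second * duration_s


def needs_confirmation(estimated_credits: float) -> bool:
    """True when a single generation exceeds the confirmation threshold."""
    return estimated_credits > CONFIRM_ABOVE_CREDITS


grok_480p = estimate_credits(4, 6)    # grok-imagine-t2v @ 480p, 6 s -> 24
hailuo_512p = estimate_credits(3, 6)  # hailuo-2.0-standard @ 512p, 6 s -> 18
ratio = 858 / grok_480p               # 1080p Seedance figure quoted above vs. Grok 480p -> 35.75

print(grok_480p, hailuo_512p, ratio, needs_confirmation(858))
```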

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| mode | No | Moderation mode for Grok Imagine. Defaults to 'normal'. | |
| model | No | Model ID. Defaults to 'grok-imagine-t2v'. Use list_video_models for the full list. | |
| prompt | Yes | Text description of the video scene. | |
| duration | No | Duration in seconds. Grok Imagine accepts 6-15; other models have their own ranges (see list_video_models). | |
| imageUrl | No | Public URL of starting image. Required for I2V models. | |
| resolution | No | Output resolution. Default depends on model. | |
| aspectRatio | No | Aspect ratio (e.g. '16:9', '9:16', '1:1'). | |
| endImageUrl | No | Public URL of ending image. Supported by some I2V models (first+last frame). | |

Behavior: 5/5

Beyond annotations (readOnlyHint=false, openWorldHint=true), the description details polling behavior with an 8-minute budget, default model fallback (KIE -> fal.ai), cost ranges per model, audio inclusion, and resolution-driven cost differences. No contradictions with annotations.

Conciseness: 4/5

The description is long (multiple sections and bullet points) but well-structured and front-loaded with the core purpose. Each sentence serves a purpose given the tool's complexity. Could be slightly shortened, but clarity is not sacrificed.

Completeness: 5/5

For a tool with 8 parameters, no output schema, and minimally informative annotations, the description is remarkably complete. It covers model selection, cost, duration, audio, resolution, I2V requirements, NSFW handling, and first+last frame interpolation. It leaves almost no ambiguity.

Parameters: 5/5

Schema description coverage is 100%, but the description adds substantial value: for 'model' it explains defaults and fallback; for 'duration' it specifies model-specific ranges; for 'resolution' it links to cost; for 'imageUrl' it clarifies I2V requirement; for 'mode' it mentions moderation. It also provides practical constraints (e.g., max 15s for Grok).

Purpose: 5/5

The description clearly states the tool generates a short-form video from text or text+image, including multiple models. It distinguishes itself from sibling tools like list_video_models by specifying that this tool submits, polls, and returns the final video URL.

Usage Guidelines: 5/5

The description provides an exhaustive model selection guide with explicit signals (e.g., highest quality, cinematic camera moves, cheapest test). It instructs when to ask the user (e.g., single generation >100 credits) and when not to (e.g., default to grok-imagine-t2v). It also covers cost framing and audio pricing.

aetherwave_list_image_models: List available image models (score: A, read-only)

Returns every image-generation model AetherWave supports, with its credit cost, default aspect ratio, supported inputs (T2I vs I2I), and any model-specific options. Call this before generate_image when you don't know the right model ID. The model key (e.g. 'grok-imagine-t2i') is what you pass as model to generate_image.

Parameters

No parameters
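To illustrate the "list first, then generate" flow, the sketch below picks a model key from a listing and would hand it to generate_image as `model`. The response shape is an assumption: this page says the tool returns credit cost and supported inputs per model but publishes no output schema, so the field names `key`, `credits`, and `inputs` are hypothetical. The sample costs are the ones quoted elsewhere on this page.

```python
from typing import Iterable


def cheapest_model(models: Iterable[dict], mode: str = "T2I") -> str:
    """Return the key of the lowest-cost model that supports the given input mode."""
    candidates = [m for m in models if mode in m["inputs"]]
    if not candidates:
        raise ValueError(f"no model supports {mode}")
    return min(candidates, key=lambda m: m["credits"])["key"]


# Hand-written sample in the assumed shape, not real tool output.
sample = [
    {"key": "grok-imagine-t2i", "credits": 5, "inputs": ["T2I"]},
    {"key": "z-image-turbo", "credits": 3, "inputs": ["T2I"]},
    {"key": "imagen-4", "credits": 12, "inputs": ["T2I"]},
    {"key": "grok-imagine-i2i", "credits": 6, "inputs": ["I2I"]},
]

model_id = cheapest_model(sample)  # pass this as `model` to generate_image
print(model_id)
```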

Behavior: 4/5

Annotations already indicate read-only and open-world behavior; the description goes beyond them by listing the returned fields (credit cost, aspect ratio, etc.) and noting the model key format.

Conciseness: 5/5

Three sentences, and every word earns its place. Front-loaded with purpose and immediate usage context.

Completeness: 5/5

Despite no output schema, description lists all relevant fields and cross-references generate_image, fully equipping the agent.

Parameters: 4/5

No parameters exist, so description need not explain them. Baseline 4 for zero-param tools.

Purpose: 5/5

The description clearly states the tool returns every image-generation model with details like credit cost, aspect ratio, supported inputs, and options. It is distinct from siblings like aetherwave_list_video_models.

Usage Guidelines: 5/5

Explicitly advises calling this before generate_image when model ID is unknown, providing clear when-to-use and how-to-follow-up guidance.

aetherwave_list_master_presets: List available audio mastering presets (score: A, read-only)

Returns every AI mastering preset AetherWave supports, with target LUFS, tags, descriptions, and difficulty level. Call this before master_audio when you don't know which preset fits the track. 12 presets total covering streaming, hip hop, EDM, pop, rock, lo-fi, R&B, acoustic, cinematic, podcast, gentle, and loud-and-punchy mastering styles. Each preset has a target LUFS value (e.g. -14 for streaming, -9 for loud) so you can match the user's distribution target.

Parameters

No parameters
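Matching a preset to the user's distribution target could be as simple as choosing the preset whose target LUFS is closest to the requested loudness. The sketch below assumes a response shape with `id` and `targetLufs` fields, which this page does not document; the preset IDs are hypothetical, and only the two LUFS values (-14 for streaming, -9 for loud) come from the description.

```python
from typing import Iterable


def closest_preset(presets: Iterable[dict], target_lufs: float) -> dict:
    """Return the preset whose target LUFS is nearest to the requested value."""
    return min(presets, key=lambda p: abs(p["targetLufs"] - target_lufs))


# Hand-written sample in the assumed shape, not real tool output.
sample_presets = [
    {"id": "streaming", "targetLufs": -14},
    {"id": "loud-and-punchy", "targetLufs": -9},
]

choice = closest_preset(sample_presets, -13)
print(choice["id"])
```

A real selection would probably also weigh the preset's tags and genre, which the listing is said to include.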

Behavior: 4/5

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating that the tool has no destructive side effects and may interact with external systems. The description adds value by specifying that all 12 presets are returned with their attributes (e.g., target LUFS for streaming is -14), which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at four sentences: the first states purpose and output, the second gives usage guidance, and the last two give concrete details (preset coverage and target LUFS values). No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 0 parameters, no output schema, and annotations providing safety context, the description is complete. It covers the tool's purpose, when to use it, and what to expect (12 presets with relevant metadata). No obvious gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There are 0 parameters, and the schema coverage is 100% (empty properties). The description implicitly covers the parameter semantics by indicating no input is needed. According to guidelines, baseline is 4 for 0 params, and the description is adequate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Returns every AI mastering preset AetherWave supports' with specific attributes (LUFS, tags, etc.). It distinguishes from the sibling tool 'aetherwave_master_audio' by explicitly recommending to call this first when uncertain about the preset.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises: 'Call this before master_audio when you don't know which preset fits the track.' This gives clear context for when to use the tool versus alternatives, and implies that if the preset is known, this tool may be skipped.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_list_my_creations
List my AetherWave gallery items
Grade: A
Read-only

Returns items from the authenticated user's gallery — images, videos, audio tracks they've generated on AetherWave. Useful for agent workflows like 'find my last 5 images and reframe them all to 9:16' or 'list my recent songs and master each one'. Supports pagination and type filtering. Each item includes id, type, prompt, model, contentUrl, thumbnailUrl, createdAt, isFavorite, visibility, rating, and type-specific fields (duration for audio/video, width/height for images).

Parameters
Name | Required | Description | Default
type | No | Filter to a single media type. Omit for all types. |
limit | No | Max items to return. Defaults to 100, max 500. |
offset | No | Pagination offset. Defaults to 0. |
favoritesOnly | No | If true, only return items marked as favorite. |
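
The pagination parameters can be combined as in the sketch below, which builds argument objects for successive pages. The helper names are illustrative, the lower bound of 1 on `limit` is an assumption, and `"image"` is an assumed value for `type` (the accepted enum values are not listed here).

```python
from typing import Any, Dict, Iterator, Optional

MAX_LIMIT = 500  # documented maximum for `limit`


def build_list_args(media_type: Optional[str] = None, limit: int = 100,
                    offset: int = 0, favorites_only: bool = False) -> Dict[str, Any]:
    """Build the arguments object for aetherwave_list_my_creations."""
    if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    args: Dict[str, Any] = {"limit": limit, "offset": offset}
    if media_type is not None:
        args["type"] = media_type  # omitted means all types
    if favorites_only:
        args["favoritesOnly"] = True
    return args


def page_args(total_wanted: int, page_size: int = 100,
              **filters: Any) -> Iterator[Dict[str, Any]]:
    """Yield one arguments object per page until total_wanted items are covered."""
    offset = 0
    while offset < total_wanted:
        size = min(page_size, total_wanted - offset)
        yield build_list_args(limit=size, offset=offset, **filters)
        offset += size


for args in page_args(250, page_size=100, media_type="image"):
    print(args)
```

Each yielded dictionary would be sent as the arguments of one tool call; an agent would normally stop early if a page comes back with fewer items than requested.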
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds valuable behavioral details: it lists the exact fields returned (id, type, prompt, model, etc.), mentions pagination and type filtering, and notes type-specific fields. This goes beyond the annotations to inform the agent about the output structure and capabilities.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long, front-loaded with the core purpose, followed by usage examples, a note on pagination and type filtering, and a detailed list of returned fields. Every sentence adds value without redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a listing tool with 4 optional parameters and no output schema, the description adequately covers purpose, typical workflows, pagination, field details, and type-specific attributes. It provides enough context for an agent to understand what data to expect and how to use the parameters effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (all 4 parameters described in schema). The description mentions pagination and type filtering, which aligns with the parameters, but does not add significant semantic depth beyond the schema. The use case examples imply parameter usage but are not explicit about parameter values. Baseline of 3 is appropriate given schema completeness.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Returns') and resource ('items from the authenticated user's gallery'), clearly distinguishing from sibling tools which are about generation, editing, and other operations. It lists concrete subtypes (images, videos, audio tracks), making the tool's scope unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases like 'find my last 5 images and reframe them' and 'list my recent songs and master each one', which illustrate when to use this tool. It mentions support for pagination and type filtering. However, it does not explicitly state when not to use it or direct to alternatives, though the sibling tool names imply the differences.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_list_video_models
List available video models
Grade: A
Read-only

Returns every video-generation model AetherWave supports (Grok Imagine, Wan 2.7, Hailuo 02, Seedance Pro/Lite, Kling 2.6 with audio, VEO 3.1, Happy Horse, etc.) with per-second credit cost, supported durations, resolutions, aspect ratios, and whether the model needs an input image (I2V). Call this before generate_video when you don't know the right model ID.

Parameters

No parameters

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds valuable behavioral context about the returned data (cost, durations, resolutions, I2V flag), enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with crucial details, no filler. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but the description enumerates returned fields comprehensively. For a listing tool with no parameters, this covers the context adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schema coverage is 100%. The description adds no param info, which is appropriate. Baseline 4 for zero parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns every video-generation model with detailed info, naming specific models and attributes. It distinguishes from sibling tools like list_image_models by focusing on video models.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Call this before generate_video when you don't know the right model ID,' providing clear context for use. Lacks explicit when-not-to-use, but the guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_master_audio
Master an audio track (AI mastering)
Grade: A

Submits an audio file for AI mastering and returns the mastered URL synchronously (route polls the Python service internally; expect 30s-5min). Useful as a final polish step after music generation. Cost: 20 credits per track. Producer, Mogul, and Ultimate plans get mastering free. Output is WAV (~50MB per 3-minute track, lossless for redistribution). Pick a preset to steer the mastering style; call aetherwave_list_master_presets for the full live list (12 presets including streaming, loud, gentle, hip_hop, edm, pop, rock, lofi, rnb, acoustic, cinematic, podcast). Each preset has a target LUFS value so you can match the distribution target.

Parameters
Name | Required | Description | Default
preset | Yes | Mastering preset name. Must be one of: 'streaming', 'loud', 'gentle', 'hip_hop', 'edm', 'pop', 'rock', 'lofi', 'rnb', 'acoustic', 'cinematic', 'podcast'. Call aetherwave_list_master_presets for full metadata (target LUFS, description, tags). |
audioUrl | Yes | Public URL to the source audio file (MP3 or WAV). |
trackTitle | No | Optional title for the mastered output (used in gallery row label). |
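
A client could check the arguments before spending credits, roughly as sketched below. The preset names, the 20-credit price, and the free plans come from the description above; the helper names, the http(s) prefix check, and representing the plan as a plain string are assumptions.

```python
from typing import Dict, Optional

PRESETS = frozenset({
    "streaming", "loud", "gentle", "hip_hop", "edm", "pop",
    "rock", "lofi", "rnb", "acoustic", "cinematic", "podcast",
})
FREE_MASTERING_PLANS = frozenset({"Producer", "Mogul", "Ultimate"})
CREDITS_PER_TRACK = 20


def build_master_args(audio_url: str, preset: str,
                      track_title: Optional[str] = None) -> Dict[str, str]:
    """Validate inputs and build the arguments object for aetherwave_master_audio."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {sorted(PRESETS)}")
    # "Public URL" is approximated here as any http(s) URL (an assumption).
    if not audio_url.startswith(("http://", "https://")):
        raise ValueError("audioUrl must be a public http(s) URL")
    args = {"preset": preset, "audioUrl": audio_url}
    if track_title:
        args["trackTitle"] = track_title
    return args


def mastering_cost(plan: Optional[str] = None) -> int:
    """Credits charged per track; zero on plans that include mastering."""
    return 0 if plan in FREE_MASTERING_PLANS else CREDITS_PER_TRACK


print(build_master_args("https://example.com/song.wav", "lofi", "Night Drive"))
print(mastering_cost("Mogul"), mastering_cost())
```

Because the call is synchronous and may take 30 seconds to 5 minutes, the client's request timeout should be set accordingly.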
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses synchronous polling (30s-5min), cost, free plans, output format (WAV, ~50MB), and preset steering. Adds far beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Long but each sentence adds value. First sentence captures core purpose. Could be slightly tighter but not verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, covers timing, cost, plans, output details, and preset metadata. Highly complete for a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. Description adds preset list reference, audioUrl format, and optional trackTitle purpose. Could include more on trackTitle usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it submits audio for AI mastering and returns URL. Distinguishes as final polish after music generation, and references sibling tool for preset list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies when to use (final polish), cost, and plan details. Does not explicitly state when not to use but provides enough context for decision.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_reframe_image
Reframe image to a new aspect ratio (Ideogram V3 Reframe)
Grade: A

Reframes an image to a new aspect ratio by intelligently outpainting the edges. Pass a public imageUrl and the target aspectRatio ('16:9', '9:16', '1:1', '4:3', '3:4', etc.). Three speed tiers: 'turbo' (5 cr, fast), 'balanced' (10 cr, default), 'quality' (14 cr, slowest, best edges). Returns the reframed image URL.

Parameters
Name | Required | Description | Default
speed | No | Rendering speed. 'turbo'=5cr, 'balanced'=10cr (default), 'quality'=14cr. |
imageUrl | Yes | Public URL of the source image. |
aspectRatio | Yes | Target aspect ratio (e.g. '16:9', '9:16', '1:1', '4:3', '3:4', '21:9'). |
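
The speed tiers make batch costs easy to estimate up front. The sketch below uses the three documented prices; the helper names are illustrative and not part of the server.

```python
from typing import Dict

# Credits per image for each documented speed tier.
SPEED_CREDITS = {"turbo": 5, "balanced": 10, "quality": 14}
DEFAULT_SPEED = "balanced"


def build_reframe_args(image_url: str, aspect_ratio: str,
                       speed: str = DEFAULT_SPEED) -> Dict[str, str]:
    """Build the arguments object for aetherwave_reframe_image."""
    if speed not in SPEED_CREDITS:
        raise ValueError(f"speed must be one of {sorted(SPEED_CREDITS)}")
    return {"imageUrl": image_url, "aspectRatio": aspect_ratio, "speed": speed}


def batch_cost(n_images: int, speed: str = DEFAULT_SPEED) -> int:
    """Total credits for reframing n_images at the given speed tier."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    return n_images * SPEED_CREDITS[speed]


# Reframing five gallery images to 9:16 at each tier.
for tier in SPEED_CREDITS:
    print(tier, batch_cost(5, tier))
```

For five images this comes to 25, 50, and 70 credits at 'turbo', 'balanced', and 'quality' respectively.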
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant context beyond annotations: it explains the outpainting approach, speed tiers and their costs, default behavior, and return value. No contradiction with annotations (readOnlyHint=false, openWorldHint=true, etc.).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, front-loaded with purpose, then inputs, speed tiers, and the return value. No wasted words. Efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema, it describes the return value (reframed image URL). It covers all key aspects: inputs, behavior, speed options, and output. Complete for the task.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 100% schema description coverage, the description adds substantial value: explains speed tiers with cost, provides examples of aspect ratios, and mentions default speed. It enriches parameter understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it reframes an image to a new aspect ratio via outpainting. The verb 'Reframes' and the resource 'image' are specific. It distinguishes from siblings like reframe_video, remove_background, and upscale_image.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description tells when to use (reframing an image to a new aspect ratio) and provides speed tiers with costs, aiding selection. It does not explicitly exclude alternatives or state when not to use, but sibling context helps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_reframe_video
Reframe video to a new aspect ratio (Luma Ray 2 Flash)
Grade: A

Reframes a video to a new aspect ratio by intelligently outpainting/cropping the edges. Pass a public videoUrl and target reframeAspectRatio. 17 credits per second. Optional reframePrompt lets you steer the new edge content (e.g. 'extend the sky with sunset clouds'). Returns the reframed video URL (R2-hosted).

Parameters
Name | Required | Description | Default
videoUrl | Yes | Public URL of the source video (MP4). |
reframePrompt | No | Optional prompt to steer the new edge content. |
reframeAspectRatio | Yes | Target aspect ratio. |
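
Since billing is per second, clip length drives the cost. The sketch below estimates credits at the documented 17 credits per second and works out how many whole seconds a budget covers. How the service rounds fractional seconds is not stated, so the estimate is left unrounded; the helper names are illustrative.

```python
from typing import Dict, Optional

CREDITS_PER_SECOND = 17  # documented rate for aetherwave_reframe_video


def build_reframe_video_args(video_url: str, aspect_ratio: str,
                             prompt: Optional[str] = None) -> Dict[str, str]:
    """Build the arguments object; reframePrompt is included only when given."""
    args = {"videoUrl": video_url, "reframeAspectRatio": aspect_ratio}
    if prompt:
        args["reframePrompt"] = prompt
    return args


def estimate_credits(duration_s: float) -> float:
    """Unrounded estimate; the service's rounding rule is not documented here."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return duration_s * CREDITS_PER_SECOND


def max_whole_seconds(budget_credits: int) -> int:
    """Longest whole-second clip that fits within budget_credits."""
    return budget_credits // CREDITS_PER_SECOND


print(estimate_credits(6))       # 6-second clip
print(max_whole_seconds(100))    # seconds affordable with 100 credits
```

A 6-second clip is estimated at 102 credits, and a 100-credit budget covers 5 whole seconds.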
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-destructive (destructiveHint=false) and read-write (readOnlyHint=false). The description adds behavioral context: intelligent outpainting/cropping, credit cost, optional prompt to steer content, and return format (R2-hosted URL). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five short sentences covering the main action, required parameters, cost, the optional prompt, and the return value. It is front-loaded with the core purpose, and every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 parameters, no output schema), the description covers purpose, inputs, cost, and output. It does not specify processing duration or error handling, but it provides enough for an AI agent to correctly invoke the tool. The openWorldHint suggests possible additional effects not described, but overall completeness is high.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for each parameter. The description adds a usage example for reframePrompt ('extend the sky with sunset clouds'), which enhances meaning. For videoUrl and reframeAspectRatio, it merely restates the schema, adding little beyond the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Reframes' and the resource 'video', and specifies the method ('intelligently outpainting/cropping'). It highlights the required inputs (videoUrl, reframeAspectRatio) and distinguishes from sibling tools like aetherwave_reframe_image.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear context: how to use (pass public videoUrl and target aspect ratio), what to expect (outpainting/cropping), and cost information (17 credits per second). However, it does not explicitly exclude alternative tools or provide when-not scenarios, but the sibling names provide implicit differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_remove_background
Remove background from image (Recraft + fal.ai BiRefNet v2 fallback)
Grade: A

Strips the background from an image, returning a PNG with transparent alpha. Pass a public imageUrl. Useful for product shots, character cutouts, logo isolation, or compositing onto a new background. ~5 credits per image. Recraft is the primary provider; on outage the tool auto-falls back to fal.ai BiRefNet v2 so single-image calls never silently fail. Works best on photographic subjects (people, products, animals); transparent-PNG inputs have no foreground to segment.

Parameters
Name | Required | Description | Default
imageUrl | Yes | Public URL of the source image. |
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond annotations by mentioning credit cost per image, auto-fallback on provider outage, and behavior with transparent inputs. Annotations indicate non-readOnly and non-idempotent, but description does not contradict them and adds useful context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at six short sentences: the first covers the core function, and the rest add key details (input, use cases, credits, fallback, limitations). No wasted words, front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (single required param, no output schema), the description is sufficiently complete. It covers purpose, usage, behavior, and limitations. Could potentially mention accepted image formats, but not a critical gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with only imageUrl, and the schema already describes it as a URI. The description adds that it must be a public URL, but adds minimal extra meaning beyond the schema. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it strips background from an image and returns a transparent PNG. It uses specific verbs and resources, and effectively distinguishes from siblings like aetherwave_remove_background_video and aetherwave_edit_image.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for when to use (product shots, character cutouts, etc.) and mentions the fallback behavior and best-use case (photographic subjects). It lacks explicit 'when-not-to-use' guidance but implies limitations with transparent-PNG inputs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_remove_background_video
Remove background from video
Grade: A

Strips the background from a video frame-by-frame using rembg (u2netp) on AetherWave's Python service. Pass a public videoUrl. Choose bgType: "transparent" for an alpha-channel WebM output (compositing) or bgType: "color" with a customColor hex for a solid replacement. 2 credits per second. Slowest tool in the surface (per-frame processing); a 6s clip takes ~4 min, a 30s clip ~15-20 min. Works best on subjects with clear edges (people, products). Returns the processed video URL (R2-hosted).

Parameters
Name | Required | Description | Default
bgType | No | 'transparent' = alpha WebM output (default). 'color' = solid replacement using customColor. |
videoUrl | Yes | Public URL of the source video (MP4). |
customColor | No | Hex color for solid background when bgType='color' (e.g. '#00ff00'). Default green. |
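
The relationship between `bgType` and `customColor` can be enforced on the client, as in the sketch below. Rejecting `customColor` when `bgType` is 'transparent' and requiring a six-digit hex value are assumptions made for illustration; the server may be more lenient. The cost estimate uses the documented 2 credits per second.

```python
import re
from typing import Dict, Optional

HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")  # six-digit hex is an assumption
CREDITS_PER_SECOND = 2  # documented rate


def build_bg_removal_args(video_url: str, bg_type: str = "transparent",
                          custom_color: Optional[str] = None) -> Dict[str, str]:
    """Build the arguments object for aetherwave_remove_background_video."""
    if bg_type not in ("transparent", "color"):
        raise ValueError("bgType must be 'transparent' or 'color'")
    if custom_color is not None:
        if bg_type != "color":
            # Client-side strictness; the server might simply ignore the value.
            raise ValueError("customColor only applies when bgType='color'")
        if not HEX_COLOR.match(custom_color):
            raise ValueError("customColor must look like '#00ff00'")
    args = {"videoUrl": video_url, "bgType": bg_type}
    if custom_color is not None:
        args["customColor"] = custom_color
    return args


def estimate_credits(duration_s: float) -> float:
    """Unrounded estimate at 2 credits per second."""
    return duration_s * CREDITS_PER_SECOND


print(build_bg_removal_args("https://example.com/clip.mp4", "color", "#00ff00"))
print(estimate_credits(6))
```

Given the quoted processing times (about 4 minutes for a 6-second clip), callers should expect this call to block far longer than the other tools.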
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key behavioral aspects: frame-by-frame processing, model used (rembg u2netp), output types (alpha WebM or solid color), credit cost, speed estimates, and output format (R2 URL). No annotation contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact, every sentence adds distinct information, and it is front-loaded with the core function. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and lack of output schema, the description sufficiently covers input, behavior, performance, and output format, making it complete for agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value by explaining the behavior of bgType options (alpha-channel WebM, solid replacement) and providing an example hex for customColor, which aids understanding beyond parameter names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Strips the background from a video frame-by-frame') and resource, differentiating from siblings like aetherwave_remove_background (which handles still images) and other video tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context on performance ('Slowest tool', time estimates) and suitability ('Works best on subjects with clear edges'), helping decide when to use. However, it doesn't explicitly mention alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_upscale_image
Upscale image (Topaz)
Grade: A

Upscales a source image using Topaz's high-fidelity upscaler. Pass a public imageUrl and an upscaleFactor. Credit cost depends on the source resolution × factor; small images cost less than large ones at the same factor. Returns the upscaled image URL.

Parameters
Name | Required | Description | Default
imageUrl | Yes | Public URL of the source image. |
upscaleFactor | No | Upscale multiplier. Defaults to '2x'. '8x' is heavy; use only on small sources. |
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds value beyond annotations by detailing credit cost dependency and noting that 8x is heavy. No contradictions with annotations; readOnlyHint=false appropriately indicates mutation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences: purpose, parameters, cost behavior, return value. Front-loaded with key information, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a simple upscale tool: covers input, cost behavior, and output. No output schema needed. Context signals confirm low complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions. Description adds important context like cost calculation and caution for 8x factor, enhancing meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it upscales a source image using Topaz's high-fidelity upscaler, distinguishing it from sibling tools like aetherwave_upscale_video and other image operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit parameters (imageUrl, upscaleFactor) and cost guidance based on resolution and factor. Also advises against using 8x on large sources. Could be improved by stating when not to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

aetherwave_upscale_video
Upscale video (Atlas Video Upscaler)
Grade: A

Upscales a source video to 1080p or 2K using Atlas. Pass a public videoUrl and the target resolution. Cost is per-second (7 cr/s @ 1080p, 9 cr/s @ 2K). Atlas-side limits: clips up to 53s at 1080p, 23s at 2K, source must be <=30fps. Returns the upscaled video URL (R2-hosted).

Parameters
Name | Required | Description | Default
videoUrl | Yes | Public URL of the source video (MP4). |
targetResolution | No | Target output resolution. Defaults to '1080p'. '2k' is more expensive and limited to ~23s clips. |
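
The Atlas-side limits lend themselves to a pre-flight check before a call is made. The sketch below encodes the documented rates and limits (7 cr/s up to 53s at 1080p, 9 cr/s up to 23s at 2K, source at most 30fps); the function name is illustrative, and the caller is assumed to already know the clip's duration and frame rate.

```python
LIMITS = {
    "1080p": {"credits_per_s": 7, "max_seconds": 53},
    "2k": {"credits_per_s": 9, "max_seconds": 23},
}
MAX_FPS = 30


def upscale_cost(duration_s: float, fps: float, target: str = "1080p") -> float:
    """Return the estimated credits, or raise if a documented limit is exceeded."""
    if target not in LIMITS:
        raise ValueError(f"targetResolution must be one of {sorted(LIMITS)}")
    limit = LIMITS[target]
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if duration_s > limit["max_seconds"]:
        raise ValueError(
            f"{target} accepts clips up to {limit['max_seconds']}s, got {duration_s}s"
        )
    if fps > MAX_FPS:
        raise ValueError(f"source must be <= {MAX_FPS}fps, got {fps}")
    # Unrounded estimate; rounding of fractional seconds is not documented.
    return duration_s * limit["credits_per_s"]


print(upscale_cost(10, 24))          # 10s clip at 1080p
print(upscale_cost(20, 30, "2k"))    # 20s clip at 2K
```

A 10-second clip at 1080p is estimated at 70 credits and a 20-second clip at 2K at 180 credits; a 25-second clip at 2K would be rejected by the check before any credits are spent.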
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description fully discloses behavioral traits: it is a mutating operation (cost per second, limits), returns a URL, and has no idempotency guarantees. Annotations (readOnlyHint=false, destructiveHint=false) are consistent with the description, which adds crucial details like cost and platform limits that annotations do not cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise: five short sentences, all essential. It front-loads the main purpose, then packs in costs, constraints, and the return value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool of this size (2 parameters, no output schema), the description is complete: it covers input requirements, target resolution options with costs and limits, and the return format (R2-hosted URL). It leaves no gaps for an AI agent to fill by guesswork.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%: the schema already states that targetResolution defaults to '1080p', that '2k' is more expensive and limited to shorter clips, and that videoUrl is a public MP4 URL. The description adds meaning beyond the schema by giving the exact per-second cost of each resolution, the clip-length limits (53s at 1080p, 23s at 2K), and the 30fps source cap. This helps the agent understand parameter choices and trade-offs.
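As an illustration of how the two parameters fit together, the sketch below builds an MCP `tools/call` request for this tool. The helper name and the example URL are hypothetical, and the accepted values '1080p' and '2k' are taken from the parameter descriptions rather than from a published enum, so treat them as assumptions.

```python
import json

# Assumed accepted values, inferred from the parameter descriptions.
ALLOWED_RESOLUTIONS = ("1080p", "2k")


def build_upscale_video_call(video_url, target_resolution="1080p", request_id=1):
    """Return a JSON-RPC 2.0 payload for an MCP tools/call request."""
    if target_resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported targetResolution: {target_resolution!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "aetherwave_upscale_video",
            "arguments": {
                "videoUrl": video_url,  # must be publicly reachable
                "targetResolution": target_resolution,
            },
        },
    }


# Placeholder URL for illustration only.
payload = build_upscale_video_call("https://example.com/clip.mp4")
print(json.dumps(payload, indent=2))
```

Sending the resolution explicitly, even when it matches the default, keeps the cost of each call visible in logs.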

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to upscale a source video to 1080p or 2K using Atlas. It specifies the action (upscale), resource (video), and target resolutions, distinguishing it from sibling tools like aetherwave_upscale_image (image upscaling) and other video manipulation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: use the tool when you need to upscale a video, subject to the source FPS cap (≤30fps), the clip-length limits (53s at 1080p, 23s at 2K), and the public URL requirement. These limits implicitly indicate when not to use it (for example, when the source exceeds them). It does not explicitly compare the tool with other upscaling methods in the same family, but the context is sufficient for an AI agent.
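The limits above lend themselves to a pre-flight check before any credits are spent. The following is a minimal sketch, not part of the server: the function name is hypothetical, and it assumes the clip-length limits are inclusive and that clip duration and frame rate are already known to the caller.

```python
# Atlas-side limits as quoted in the tool description.
MAX_CLIP_SECONDS = {"1080p": 53, "2k": 23}
MAX_SOURCE_FPS = 30


def allowed_resolutions(duration_s, fps):
    """List the target resolutions a clip qualifies for.

    Assumption: limits are inclusive ("up to 53s" is read as <= 53).
    """
    if fps > MAX_SOURCE_FPS:
        return []  # source frame rate is too high for any target
    return [res for res, limit in MAX_CLIP_SECONDS.items() if duration_s <= limit]


print(allowed_resolutions(20, 24))  # ['1080p', '2k']
print(allowed_resolutions(40, 30))  # ['1080p']
print(allowed_resolutions(60, 24))  # []
print(allowed_resolutions(10, 60))  # []
```

An empty result means the clip would need to be trimmed or re-encoded at a lower frame rate before upscaling.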

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Resources