generate_audio
Generate audio from a text description using ACE or Stable Audio models. Returns a prompt ID immediately; the audio asset ID arrives when generation completes.
Instructions
Generate audio from a text prompt using the ACE Step 1.5 or Stable Audio 3 model family. The tool builds the appropriate workflow graph, fills unspecified parameters from your configured defaults (set_defaults / COMFYUI_DEFAULT_* / config file), and auto-selects local models when needed. It returns the prompt_id immediately; the resulting audio asset_id arrives in the completion notification. Requires a running ComfyUI instance with the corresponding model files installed.
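Because the call returns a prompt_id first and the asset_id only arrives later, a client usually needs to remember which prompts are still outstanding. The following is a minimal sketch of that bookkeeping, not part of the tool itself. The class name `AudioJobs` and its methods are hypothetical, and the shape of the completion notification is assumed to carry both the prompt_id and the asset_id.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class AudioJobs:
    """Hypothetical client-side tracker mapping prompt_id to asset_id."""

    pending: Set[str] = field(default_factory=set)
    finished: Dict[str, str] = field(default_factory=dict)

    def submitted(self, prompt_id: str) -> None:
        # Record the prompt_id returned immediately by generate_audio.
        self.pending.add(prompt_id)

    def completed(self, prompt_id: str, asset_id: str) -> None:
        # Assumed to be called when the completion notification arrives.
        self.pending.discard(prompt_id)
        self.finished[prompt_id] = asset_id

    def asset_for(self, prompt_id: str) -> Optional[str]:
        # None means the audio is not ready (or the prompt_id is unknown).
        return self.finished.get(prompt_id)
```

A caller would invoke `submitted()` with the returned prompt_id, then `completed()` from whatever handler receives the notification.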
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model_family | Yes | Audio model family; determines which workflow template and model loaders to use | |
| prompt | Yes | Text description of the audio to generate (genre, mood, instruments, etc.) | |
| duration | Yes | Audio duration in seconds | |
| seed | No | Seed (omit to randomize) | |
| steps | No | Sampling steps | |
| cfg | No | CFG scale | |
| sampler | No | Sampler name (e.g. euler, lcm, dpmpp_2m) | |
| scheduler | No | Scheduler (e.g. normal, simple, karras) | |
| filename_prefix | No | Output filename prefix | audio/ace_step or audio/stable_audio_3 |
| unet | No | ACE UNet model filename (in models/diffusion_models/); auto-selected if omitted | |
| vae | No | ACE VAE model filename (in models/vae/); auto-selected if omitted | |
| clip_a | No | Primary text encoder filename (in models/text_encoders/); auto-selected if omitted | |
| clip_b | No | Secondary text encoder filename (in models/text_encoders/); auto-selected if omitted | |
| lyrics | No | Lyrics or song structure description, given as a section-by-section breakdown (ACE only) | |
| language | No | Language code for the prompt (ACE only) | 'en' |
| musical_key | No | Target musical key (ACE only, e.g. 'C major', 'E minor') | 'C major' |
| shift | No | ModelSamplingAuraFlow shift parameter (ACE only) | 3 |
| guidance_scale | No | Text encoder guidance scale (ACE only) | 0.85 |
| checkpoint | No | Stable Audio 3 checkpoint filename (in models/checkpoints/); auto-selected if omitted | |
| clip | No | Stable Audio CLIP encoder filename (in models/text_encoders/); auto-selected if omitted | |
| negative_prompt | No | Negative prompt (Stable Audio 3 only) | empty |
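Several parameters in the table apply to only one model family, so it can help to check an argument set on the client before sending it. The sketch below is one possible way to do that under stated assumptions: the family identifiers `"ace"` and `"stable_audio"` are hypothetical (the accepted values of `model_family` are not listed here), the helper `build_arguments` is illustrative, and whether the tool itself rejects mismatched parameters is not documented. Only parameters explicitly described as ACE or Stable Audio specific are treated as family-bound.

```python
from typing import Any, Dict

# Hypothetical family identifiers; the real accepted values of
# `model_family` are not given in the table.
ACE = "ace"
STABLE_AUDIO = "stable_audio"

REQUIRED = ("model_family", "prompt", "duration")

# Parameters the table describes as belonging to a single family.
FAMILY_ONLY = {
    ACE: {"unet", "vae", "lyrics", "language", "musical_key",
          "shift", "guidance_scale"},
    STABLE_AUDIO: {"checkpoint", "clip", "negative_prompt"},
}


def build_arguments(**params: Any) -> Dict[str, Any]:
    """Drop unset values and reject parameters meant for the other family."""
    # Omitting a parameter lets the configured defaults fill it in.
    args = {k: v for k, v in params.items() if v is not None}

    missing = [name for name in REQUIRED if name not in args]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    family = args["model_family"]
    if family not in FAMILY_ONLY:
        raise ValueError(f"unknown model_family: {family!r}")

    for other, names in FAMILY_ONLY.items():
        if other == family:
            continue
        misplaced = sorted(names.intersection(args))
        if misplaced:
            raise ValueError(f"{misplaced} do not apply to {family!r}")
    return args
```

For example, `build_arguments(model_family="ace", prompt="lo-fi piano", duration=30, seed=None)` returns a dict without `seed`, leaving the seed to be randomized as the table describes.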