generate_audio
Generate audio from text prompts with text-to-speech or sound effects using AI providers like OpenAI, Google, and ElevenLabs.
Instructions
Generate audio from text using AI. Supports text-to-speech and sound effects. Providers: openai, google, elevenlabs. For ElevenLabs sound effects, set providerOptions.mode to "sound-effect". Available providers: none configured.
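The sound-effect mode described above could be requested with an arguments object like the one built below. This is a minimal sketch: `SoundEffectArgs` and `buildSoundEffectArgs` are hypothetical names introduced for illustration and are not part of the tool. Only the `text`, `provider`, and `providerOptions.mode` fields come from the description.

```typescript
// Hypothetical shape for a generate_audio call that asks ElevenLabs
// for a sound effect instead of speech.
interface SoundEffectArgs {
  text: string; // description of the sound effect to generate
  provider: "elevenlabs";
  providerOptions: { mode: "sound-effect" };
}

// Hypothetical helper: wraps a plain description in the arguments object.
function buildSoundEffectArgs(description: string): SoundEffectArgs {
  return {
    text: description,
    provider: "elevenlabs",
    providerOptions: { mode: "sound-effect" },
  };
}

const sfxArgs = buildSoundEffectArgs("Heavy rain on a tin roof");
console.log(JSON.stringify(sfxArgs, null, 2));
```

How the object is sent to the tool depends on the client in use, so that step is left out here.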
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to convert to speech, or a description of the sound effect to generate | |
| provider | No | Provider to use: openai, google, elevenlabs. Auto-selects if omitted. | |
| voice | No | Voice name (provider-specific). OpenAI: alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer. Google: Kore, Charon, Fenrir, Aoede, Puck, etc. ElevenLabs: voice ID. | |
| speed | No | Speech speed multiplier (OpenAI only): 0.25 to 4.0 | |
| format | No | Output format (OpenAI only): mp3, opus, aac, flac, wav, pcm | |
| outputDirectory | No | Directory to save the generated file. Supports absolute or relative paths (resolved from cwd). Defaults to MEDIA_OUTPUT_DIR env var or cwd. | |
| providerOptions | No | Provider-specific parameters passed through directly | |
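The schema above can be mirrored as a type, with a small pre-flight check for the constraints the table states. This is a sketch under assumptions, not the tool's own validation: `GenerateAudioInput` and `checkInput` are hypothetical names, the empty-text rule is an assumption, and the tool's actual behavior when `speed` or `format` is sent to a non-OpenAI provider is not documented here.

```typescript
type Provider = "openai" | "google" | "elevenlabs";
type OpenAIFormat = "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";

// Hypothetical type mirroring the Input Schema table.
interface GenerateAudioInput {
  text: string;
  provider?: Provider;
  voice?: string;
  speed?: number; // OpenAI only: 0.25 to 4.0
  format?: OpenAIFormat; // OpenAI only
  outputDirectory?: string;
  providerOptions?: Record<string, unknown>;
}

// Hypothetical check: returns a list of problems, empty when none are found.
function checkInput(input: GenerateAudioInput): string[] {
  const problems: string[] = [];
  // Assumption: a required text field should also be non-blank.
  if (input.text.trim().length === 0) {
    problems.push("text must not be empty");
  }
  const explicitlyNotOpenAI =
    input.provider !== undefined && input.provider !== "openai";
  if (input.speed !== undefined) {
    if (input.speed < 0.25 || input.speed > 4.0) {
      problems.push("speed must be between 0.25 and 4.0");
    }
    if (explicitlyNotOpenAI) {
      problems.push("speed is documented as OpenAI only");
    }
  }
  if (input.format !== undefined && explicitlyNotOpenAI) {
    problems.push("format is documented as OpenAI only");
  }
  return problems;
}

const example: GenerateAudioInput = {
  text: "Welcome to the show.",
  provider: "openai",
  voice: "nova",
  speed: 1.25,
  format: "mp3",
};
console.log(checkInput(example));
```

When `provider` is omitted the tool auto-selects one, so the sketch only flags `speed` and `format` when a non-OpenAI provider is named explicitly.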