Generate spoken audio from text: narration, a voiceover, a read-aloud script, or a multi-voice dialogue. Pass text (up to 2048 characters): the words to be spoken. To speak in one of YOUR saved voices, pass voice with the voice name (or id). Users speak in plain language and never know ids, so resolve the name yourself (the voice tool, action "list", shows every saved voice) and never ask the user for an id. Reference voices, trained clones, and preset voices are each routed correctly by kind. To match a voice instantly from a clip instead, pass reference_audio_url (one short clip), or pass up to 3 reference_audio_urls and address them as @Audio1, @Audio2, @Audio3 in the text for dialogue. Alternatively, pass image_url to voice a scene from a picture; image_url cannot be combined with reference audio. Optional tuning parameters: speech_rate (-50..100), pitch (-12..12), loudness (-50..100). Returns a playable audio_url, duration_seconds, and generation_id; the result is also saved to your library.
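
The limits above can be checked before a call is made. The following is a minimal sketch, not the tool's own implementation: build_speech_args is a hypothetical helper, the numeric ranges are assumed to be inclusive, and the payload is assumed to be a flat mapping keyed by the parameter names given in the description. Whether voice may be combined with reference audio is not stated above, so the sketch does not enforce anything there.

```python
from typing import Any, Optional, Sequence

# Limits taken from the description above; inclusive bounds are an assumption.
MAX_TEXT_CHARS = 2048
MAX_REFERENCE_CLIPS = 3
RANGES = {"speech_rate": (-50, 100), "pitch": (-12, 12), "loudness": (-50, 100)}


def build_speech_args(
    text: str,
    voice: Optional[str] = None,
    reference_audio_urls: Sequence[str] = (),
    image_url: Optional[str] = None,
    **tuning: float,
) -> dict[str, Any]:
    """Hypothetical helper: validate arguments and return a payload dict."""
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if len(reference_audio_urls) > MAX_REFERENCE_CLIPS:
        raise ValueError(f"at most {MAX_REFERENCE_CLIPS} reference clips")
    if image_url and reference_audio_urls:
        raise ValueError("image_url cannot be combined with reference audio")

    args: dict[str, Any] = {"text": text}
    if voice:
        # A saved voice name (or id), resolved by the caller.
        args["voice"] = voice
    if len(reference_audio_urls) == 1:
        args["reference_audio_url"] = reference_audio_urls[0]
    elif len(reference_audio_urls) > 1:
        # Clips are addressed in the text as @Audio1, @Audio2, @Audio3.
        args["reference_audio_urls"] = list(reference_audio_urls)
    if image_url:
        args["image_url"] = image_url

    for name, value in tuning.items():
        if name not in RANGES:
            raise ValueError(f"unknown tuning parameter: {name}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be within {low}..{high}")
        args[name] = value
    return args


# Two-voice dialogue from two clips (placeholder URLs), pitch raised slightly.
dialogue_args = build_speech_args(
    "@Audio1 Are you ready? @Audio2 Ready when you are.",
    reference_audio_urls=["https://example.com/a.wav", "https://example.com/b.wav"],
    pitch=2,
)
```

Validating locally keeps an out-of-range value or a forbidden combination from reaching the tool, where the failure mode is not described.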