synthesize_with_clone
Clone a voice from a reference 48 kHz mono WAV to synthesize new speech with the same speaker identity, prosody, and style.
Instructions
Synthesize speech by cloning a voice from a reference WAV. The reference WAV sets the speaker identity, prosody, and style. Both the reference and the output are 48 kHz mono WAV.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to synthesize. | |
| reference_wav_path | Yes | Absolute path to reference WAV (the voice to clone). Must be 48 kHz mono. | |
| reference_text | Yes | Transcript of the reference WAV (used for alignment). | |
| output_filename | No | Output WAV filename. | cloned.wav |
| steps | No | Diffusion inference steps. | |
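
Because the reference must be an absolute path to a 48 kHz mono WAV, it can help to check the file before calling the tool. The following is a minimal sketch, not part of the tool itself: `check_reference_wav` and `build_arguments` are hypothetical helper names, and it assumes the reference is a PCM WAV that Python's standard `wave` module can read (other encodings raise `wave.Error`). How the resulting arguments are sent to `synthesize_with_clone` depends on your client and is not shown.

```python
import os
import wave


def check_reference_wav(path, expected_rate=48000, expected_channels=1):
    """Return a list of problems with the WAV at `path`; an empty list means it looks usable."""
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    # wave.open raises wave.Error for files it cannot parse as PCM WAV.
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"channel count is {channels}, expected {expected_channels}")
    return problems


def build_arguments(text, reference_wav_path, reference_text,
                    output_filename=None, steps=None):
    """Assemble the input dict from the schema; optional fields are omitted when unset."""
    args = {
        "text": text,
        "reference_wav_path": reference_wav_path,
        "reference_text": reference_text,
    }
    if output_filename is not None:
        args["output_filename"] = output_filename
    if steps is not None:
        args["steps"] = steps
    return args
```

Leaving `output_filename` and `steps` out of the dict when they are unset lets the tool apply its own defaults rather than having the caller guess them.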