synthesize
Synthesize natural-sounding speech from English text using a diffusion TTS model. Returns the path to a 48 kHz WAV file and its duration.
Instructions
Synthesize speech from text using VoxCPM2 (2B diffusion TTS, 48 kHz). Returns the path to the output WAV file and its duration.
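The tool returns a duration alongside the output path. If a client wants to check that value locally, the duration of a PCM WAV file is its frame count divided by its sample rate. The sketch below does this with Python's standard `wave` module. The function name `wav_duration_seconds` is hypothetical and not part of the tool. It also assumes the returned file is a plain PCM WAV that `wave` can read.

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return a PCM WAV file's duration in seconds (frames / sample rate).

    Hypothetical client-side helper; assumes the file is PCM WAV readable by `wave`.
    """
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()  # expected to be 48000 for this tool's output
    return frames / rate
```

For example, a 48 kHz file containing 24,000 frames is 0.5 seconds long.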
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to synthesize (English). | |
| output_filename | No | Output WAV filename (e.g. `scene_01.wav`). Saved to `VOXCPM_OUTPUT_DIR`. | `output.wav` |
| steps | No | Number of diffusion inference steps (10–50). Higher values give better quality but slower synthesis. | |
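The sketch below shows one way a client could build and sanity-check an arguments object against the schema above. The helper name `build_synthesize_args` is hypothetical. Requiring a `.wav` suffix is an assumption inferred from the `output_filename` description; the schema itself does not state it as a rule. Optional fields left unset are omitted, so the server applies its own defaults. If the tool is exposed over MCP, the resulting dict would go in the `arguments` field of a `tools/call` request for `synthesize`.

```python
from typing import Any, Optional

def build_synthesize_args(text: str,
                          output_filename: Optional[str] = None,
                          steps: Optional[int] = None) -> dict[str, Any]:
    """Build arguments for the `synthesize` tool, checking the documented constraints.

    Hypothetical client-side helper, not part of the tool itself.
    """
    if not text.strip():
        raise ValueError("text must be non-empty")
    args: dict[str, Any] = {"text": text}
    if output_filename is not None:
        # Assumption: the schema describes this as a WAV filename, so require .wav.
        if not output_filename.lower().endswith(".wav"):
            raise ValueError("output_filename should end in .wav")
        args["output_filename"] = output_filename
    if steps is not None:
        # Documented range for diffusion inference steps.
        if not 10 <= steps <= 50:
            raise ValueError("steps must be between 10 and 50")
        args["steps"] = steps
    return args
```

For example, `build_synthesize_args("Hello there.", "scene_01.wav", 30)` yields `{"text": "Hello there.", "output_filename": "scene_01.wav", "steps": 30}`, and `build_synthesize_args("Hello there.")` yields only `{"text": "Hello there."}`.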