Clone a voice with Replicate
replicate_clone_voice
Synthesize speech using a cloned voice from a short reference audio sample. Provide a URL to the voice sample and the text to speak.
Instructions
Synthesize speech in a cloned voice. Provide a short reference audio sample (~5-30 s) and the text to speak; the model reproduces the voice characteristics.
DISPLAY REQUIREMENT — after this tool returns successfully, include the URL printed in the tool's text content as a markdown link [Audio](URL) so the user can play it. URLs expire in ~24h.
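The display requirement can be sketched as a small helper. This is a minimal illustration, not part of the tool: it assumes the tool's text content contains a single http(s) URL, and the helper name `audio_markdown_link` is hypothetical.

```python
import re
from typing import Optional

# Assumption: the audio URL is the first http(s) URL in the tool's text content.
URL_PATTERN = re.compile(r"https?://[^\s)>\]\"']+")


def audio_markdown_link(tool_text: str) -> Optional[str]:
    """Return '[Audio](URL)' for the first URL found in tool_text, else None."""
    match = URL_PATTERN.search(tool_text)
    if match is None:
        return None
    # Drop sentence punctuation that may trail the URL in running text.
    url = match.group(0).rstrip(".,;")
    return f"[Audio]({url})"
```

Because the URL expires after roughly 24 hours, the link is best rendered in the same reply that reports the result rather than stored for later.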
Args:
text (string, 1-5000): Text to synthesize in the cloned voice.
reference_audio_url (URL): URL of the voice sample to clone from. Use replicate_upload_file to upload a local file first.
language (string, optional): ISO-639 code (e.g. "en", "es", "it"). Default "en".
model (string, default "xtts-v2"): Curated key (xtts-v2, openvoice-v2) or "owner/name[:version]".
extra_input (object, optional): Model-specific extras.
download (boolean, default true).
timeout_ms: Max ms to wait for the prediction. Default 300000 (5 min).
Returns: PredictionResult. local_paths contain WAV/MP3 files.
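As a rough sketch, the argument rules above could be checked client-side before calling the tool. The function name `build_clone_voice_args` is hypothetical, the 1-5000 limit is assumed to count characters, and the URL check (a scheme and a host must be present) is an assumption rather than the tool's documented validation.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlparse


def build_clone_voice_args(
    text: str,
    reference_audio_url: str,
    language: str = "en",
    model: str = "xtts-v2",
    extra_input: Optional[Dict[str, Any]] = None,
    download: bool = True,
    timeout_ms: int = 300_000,
) -> Dict[str, Any]:
    """Build an arguments dict using the documented defaults."""
    # Assumption: the documented 1-5000 range is a character count.
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be 1-5000 characters long")
    # Assumption: a usable URL has both a scheme and a host.
    parsed = urlparse(reference_audio_url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError("reference_audio_url must be an absolute URL")
    args: Dict[str, Any] = {
        "text": text,
        "reference_audio_url": reference_audio_url,
        "language": language,
        "model": model,
        "download": download,
        "timeout_ms": timeout_ms,
    }
    if extra_input:
        # Only send model-specific extras when the caller supplied some.
        args["extra_input"] = dict(extra_input)
    return args
```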
Examples:
text="Hello world, this is my cloned voice.", reference_audio_url="<url-to-your-voice-sample.wav>"
text="Buongiorno a tutti!", reference_audio_url="<url-to-your-voice-sample.wav>", language="it"
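If the tool is exposed over the Model Context Protocol (an assumption; this page does not say so explicitly), a call would typically be wrapped in a JSON-RPC `tools/call` request. The sketch below serializes such a request; the helper name `tools_call_request` and the example.com URL are placeholders.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request for replicate_clone_voice."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "replicate_clone_voice",
            "arguments": arguments,
        },
    }
    # ensure_ascii=False keeps non-English text readable in the output.
    return json.dumps(payload, ensure_ascii=False, indent=2)


request_body = tools_call_request(
    1,
    {
        "text": "Hello world, this is my cloned voice.",
        # Placeholder URL; substitute the address of a real voice sample.
        "reference_audio_url": "https://example.com/voice-sample.wav",
    },
)
```

In practice an MCP client library usually builds this envelope, so only the `arguments` object needs to be supplied by hand.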
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to synthesize in the cloned voice. | |
| model | No | Voice cloning model. Curated: xtts-v2, openvoice-v2. Or "owner/name". | xtts-v2 |
| download | No | | |
| language | No | ISO-639 language code (e.g. 'en', 'es', 'it'). Default: 'en'. | |
| timeout_ms | No | Max ms to wait for the prediction. If exceeded, returns the prediction ID so you can poll via replicate_get_prediction. Default: 300000 (5min). | |
| extra_input | No | Additional model-specific inputs. | |
| reference_audio_url | Yes | URL of a short voice sample (~5-30s) to clone. Use replicate_upload_file if you only have a local file. | |
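When `timeout_ms` is exceeded, the returned prediction ID can be polled until the prediction finishes. The loop below is a hedged sketch: `get_prediction` stands in for whatever function invokes replicate_get_prediction, and the `status` field with the values `succeeded`, `failed`, and `canceled` is assumed from Replicate's prediction lifecycle, not confirmed by this page.

```python
import time
from typing import Any, Callable, Mapping

# Assumption: these status values mark a finished prediction.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    prediction_id: str,
    get_prediction: Callable[[str], Mapping[str, Any]],
    interval_s: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Poll get_prediction until a terminal status or max_polls is reached."""
    for attempt in range(max_polls):
        result = get_prediction(prediction_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Skip the sleep after the final attempt; we are about to give up.
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"prediction {prediction_id} did not finish after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.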