Generate Speech (TTS) with Replicate
replicate_generate_speech
Generate natural-sounding speech from text. Choose from multiple models and customize voice, speed, and style.
Instructions
Convert text to natural-sounding speech.
DISPLAY REQUIREMENT — after this tool returns successfully, include the URL printed in the tool's text content as a markdown link [Speech](URL) in your reply so the user can play it. URLs expire in ~24h.
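As a minimal sketch of the display requirement above, the helper below wraps a returned URL in the `[Speech](URL)` markdown link. The function name `speech_link` is hypothetical and not part of the tool. How the URL is extracted from the tool's text content is left out, since the exact output layout is not documented here.

```python
def speech_link(url: str) -> str:
    """Format an audio URL as the markdown link the tool asks callers to show.

    `speech_link` is an illustrative helper, not part of the tool's API.
    """
    url = url.strip()
    if not url:
        raise ValueError("url must be a non-empty string")
    return f"[Speech]({url})"
```

For example, `speech_link("https://example.com/out.wav")` yields `[Speech](https://example.com/out.wav)`.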
Args:
text (string, 1-5000): Text to synthesize.
model (string, default "kokoro"): Curated key (kokoro, minimax-speech, chatterbox, gemini-tts, grok-tts) or "owner/name[:version]".
voice (string, optional): Voice ID. For Kokoro: af_bella, af_sarah, am_adam, am_michael, bf_emma, bf_isabella, etc. The prefix encodes accent and gender: af_ = American female, bf_ = British female, am_ = American male, bm_ = British male.
speed (0.5-2.0, optional): Speech rate.
extra_input (object, optional): Model-specific extras (e.g. {audio_prompt: ""} for voice cloning with Chatterbox).
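download (boolean, default true): Whether to download the generated files locally. When false, only Replicate URLs are returned.
timeout_ms (default 300000): Maximum time in milliseconds to wait for the prediction.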
Returns: PredictionResult. local_paths contains the downloaded WAV/MP3 files.
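The argument list above can be sketched as a small client-side builder that checks the documented ranges before the tool is called. This is an illustration under stated assumptions, not the tool's own validation: `build_speech_args` and `describe_kokoro_voice` are hypothetical helper names, and the 1-5000 limit on `text` is assumed to count characters.

```python
from typing import Any, Dict, Optional

# Kokoro voice-ID prefixes as described in the voice argument.
KOKORO_PREFIXES = {
    "af_": "American female",
    "bf_": "British female",
    "am_": "American male",
    "bm_": "British male",
}


def describe_kokoro_voice(voice_id: str) -> Optional[str]:
    """Return the accent/gender encoded in a Kokoro voice ID, or None if unknown."""
    return KOKORO_PREFIXES.get(voice_id[:3])


def build_speech_args(
    text: str,
    model: str = "kokoro",
    voice: Optional[str] = None,
    speed: Optional[float] = None,
    extra_input: Optional[Dict[str, Any]] = None,
    download: bool = True,
    timeout_ms: int = 300_000,
) -> Dict[str, Any]:
    """Build the argument object for replicate_generate_speech (hypothetical helper)."""
    # Assumption: the documented 1-5000 range is a character count.
    if not 1 <= len(text) <= 5000:
        raise ValueError("text must be 1-5000 characters long")
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")

    args: Dict[str, Any] = {
        "text": text,
        "model": model,
        "download": download,
        "timeout_ms": timeout_ms,
    }
    # Optional fields are omitted entirely when not set.
    if voice is not None:
        args["voice"] = voice
    if speed is not None:
        args["speed"] = speed
    if extra_input is not None:
        args["extra_input"] = extra_input
    return args
```

Calling `build_speech_args("Hello there", voice="af_bella", speed=1.2)` produces a dictionary with the defaults filled in (`model` set to `"kokoro"`, `download` set to `True`, `timeout_ms` set to `300000`) plus the two optional fields that were supplied.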
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | Yes | Text to synthesize. | |
| model | No | Either a curated key (kokoro, minimax-speech, chatterbox, gemini-tts, grok-tts) or a Replicate identifier. | kokoro |
| speed | No | Speech speed multiplier (0.5-2.0). | |
| voice | No | Voice identifier. Kokoro examples: af_bella, am_adam, bf_emma. Check model docs for full list. | |
| download | No | Whether to download the generated files locally. When false, only Replicate URLs are returned (URLs expire after ~24h). | true |
| timeout_ms | No | Maximum time in milliseconds to wait for the prediction. If exceeded, returns the prediction ID so you can poll via replicate_get_prediction. | 300000 (5 min) |
| extra_input | No | Additional model-specific inputs. | |
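When `timeout_ms` is exceeded, the tool returns a prediction ID to poll with replicate_get_prediction. A rough sketch of such a polling loop follows. The `get_prediction` callable is a hypothetical stand-in for however your client invokes replicate_get_prediction. The sketch also assumes the result is a dictionary with a `status` field that uses Replicate's prediction status names (`succeeded`, `failed`, `canceled` as terminal states); the actual result shape of that tool is not documented here.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states, following Replicate's prediction status names.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_prediction(
    get_prediction: Callable[[str], Dict[str, Any]],
    prediction_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_prediction until the prediction reaches a terminal status.

    get_prediction is a hypothetical wrapper around replicate_get_prediction.
    """
    for attempt in range(max_attempts):
        result = get_prediction(prediction_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(
        f"prediction {prediction_id} not finished after {max_attempts} attempts"
    )
```

A fixed interval keeps the sketch short; a real client might prefer a backoff schedule.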