Lipsync / talking avatar with Replicate
replicate_lipsync
Animate a portrait to speak using text-to-speech or a driving audio file. Produces an MP4 video of the lipsynced avatar.
Instructions
Animate a portrait image to speak — either from a text script (model does TTS + lipsync) or from a driving audio file. Produces an MP4 video.
DISPLAY REQUIREMENT — after this tool returns successfully, include the URL(s) so the user can open the video. URLs expire in ~24h.
Args:
image_url (URL): Portrait or face image to animate. Use replicate_upload_file for local files.
text (string, optional): Script for the avatar to speak. Used by video-avatar (maps to voice_script). At least one of text or audio_url is required.
audio_url (URL, optional): Driving audio for lipsync. Required for sadtalker; optional override for video-avatar. At least one of text or audio_url is required.
model (string, default "video-avatar"): Curated key (video-avatar, sadtalker) or "owner/name[:version]".
extra_input (object, optional): Model-specific extras (e.g. {voice_prompt: "speak slowly"} for video-avatar).
download (boolean, default true): Download the MP4 locally.
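timeout_ms (default 300000, i.e. 5 min): Max ms to wait for the prediction. If exceeded, returns the prediction ID so you can poll via replicate_get_prediction.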
Returns: PredictionResult. local_paths contain .mp4 files.
Examples:
image_url="<portrait.jpg>", text="Hello! Welcome to our product demo." → video-avatar (TTS + lipsync)
image_url="<face.jpg>", audio_url="<speech.wav>", model="sadtalker" → audio-driven lipsync
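The argument rules above can be sketched as a small validation helper. This is only an illustration: `build_lipsync_args` and `AUDIO_ONLY_MODELS` are hypothetical names, not part of the tool, and the checks simply restate the constraints described in Args (at least one of text or audio_url; sadtalker needs audio_url).

```python
from typing import Any, Dict, Optional

# Assumption: of the curated keys listed in this document, only sadtalker is audio-only.
AUDIO_ONLY_MODELS = {"sadtalker"}


def build_lipsync_args(
    image_url: str,
    text: Optional[str] = None,
    audio_url: Optional[str] = None,
    model: str = "video-avatar",
    extra_input: Optional[Dict[str, Any]] = None,
    download: bool = True,
    timeout_ms: int = 300_000,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and assemble arguments for replicate_lipsync."""
    if not image_url:
        raise ValueError("image_url is required")
    if not text and not audio_url:
        raise ValueError("at least one of text or audio_url is required")
    if model in AUDIO_ONLY_MODELS and not audio_url:
        raise ValueError(f"{model} requires audio_url")

    args: Dict[str, Any] = {
        "image_url": image_url,
        "model": model,
        "download": download,
        "timeout_ms": timeout_ms,
    }
    # Only include optional fields that were actually supplied.
    if text:
        args["text"] = text
    if audio_url:
        args["audio_url"] = audio_url
    if extra_input:
        args["extra_input"] = extra_input
    return args


# Mirrors the two examples above; the URLs are placeholders.
tts_args = build_lipsync_args(
    "https://example.com/portrait.jpg",
    text="Hello! Welcome to our product demo.",
)
audio_args = build_lipsync_args(
    "https://example.com/face.jpg",
    audio_url="https://example.com/speech.wav",
    model="sadtalker",
)
```

Validating on the client side before calling the tool avoids spending a prediction on a request that is missing its script or driving audio.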
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | No | Text script for the avatar to speak. Needed by TTS+lipsync models (video-avatar) unless audio_url is provided, in which case text is ignored. | |
| model | No | Lipsync model. Curated: video-avatar, sadtalker. Or "owner/name[:version]". | video-avatar |
| download | No | Download the MP4 locally. | true |
| audio_url | No | URL of the driving audio. Required for audio-only lipsync models (sadtalker). Optional override when model can do TTS. | |
| image_url | Yes | URL of the portrait or face image to animate. Use replicate_upload_file for local files. | |
| timeout_ms | No | Max ms to wait for the prediction. If exceeded, returns the prediction ID so you can poll via replicate_get_prediction. | 300000 (5 min) |
| extra_input | No | Additional model-specific inputs. | |
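The timeout behaviour described for timeout_ms (a prediction ID is returned so the caller can poll via replicate_get_prediction) might be handled along these lines. This is a sketch under several assumptions: `call_tool` stands in for whatever client function invokes a tool by name, and the result fields `id` and `status`, the `prediction_id` argument name, and the terminal status values are guesses, not documented here.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# Assumption: these strings mark a finished prediction.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_video(
    call_tool: ToolCaller,
    lipsync_args: Dict[str, Any],
    poll_interval_s: float = 5.0,
    max_polls: int = 60,
) -> Dict[str, Any]:
    """Call replicate_lipsync, then poll until the prediction finishes or polls run out."""
    result = call_tool("replicate_lipsync", lipsync_args)
    polls = 0
    while result.get("status") not in TERMINAL_STATUSES and polls < max_polls:
        time.sleep(poll_interval_s)
        # Assumption: the polling tool accepts the ID under "prediction_id".
        result = call_tool("replicate_get_prediction", {"prediction_id": result["id"]})
        polls += 1
    return result
```

Capping the number of polls keeps a stuck prediction from blocking the caller indefinitely; the last result is returned either way so the caller can inspect its status.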