Transcribe Audio / Video with Whisper
replicate_transcribe_audio
Transcribe audio or video files to text using Whisper models, with automatic language detection and optional English translation.
Instructions
Transcribe an audio or video file to text using Whisper-family models on Replicate.
Args:
audio (URL): URL of the audio (or video) to transcribe.
model (default "incredibly-fast-whisper"): Curated key (whisper, incredibly-fast-whisper, whisperx, scribe) or "owner/name".
language (string, optional): ISO-639 hint (e.g. "en", "it"). Default: auto-detect.
translate_to_english (bool, optional): Translate the transcript to English instead of preserving the source language.
extra_input (object, optional): Model-specific extras (e.g. {batch_size: 24} for incredibly-fast-whisper).
Returns: PredictionResult with text_output containing the transcript.
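As an illustration of the arguments above, the following is a minimal sketch of a client-side helper that assembles the argument object for replicate_transcribe_audio. The helper name build_transcribe_args and its validation rules are assumptions made for this example and are not part of the tool itself; only the argument names, the curated model keys, and the default model come from the description above.

```python
from typing import Any, Dict, Optional

# Curated model keys listed in the tool description.
CURATED_MODELS = {"whisper", "incredibly-fast-whisper", "whisperx", "scribe"}


def build_transcribe_args(
    audio: str,
    model: str = "incredibly-fast-whisper",
    language: Optional[str] = None,
    translate_to_english: Optional[bool] = None,
    extra_input: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the argument object for replicate_transcribe_audio."""
    if not audio:
        raise ValueError("audio must be a non-empty URL")

    # ASSUMPTION: anything that is not a curated key must look like "owner/name".
    if model not in CURATED_MODELS:
        owner, sep, name = model.partition("/")
        if not (owner and sep and name) or "/" in name:
            raise ValueError('model must be a curated key or "owner/name"')

    args: Dict[str, Any] = {"audio": audio, "model": model}

    # Optional arguments are omitted when unset so the tool applies its own defaults
    # (for example, automatic language detection).
    if language is not None:
        args["language"] = language
    if translate_to_english is not None:
        args["translate_to_english"] = translate_to_english
    if extra_input:
        args["extra_input"] = extra_input
    return args


if __name__ == "__main__":
    print(build_transcribe_args(
        "https://example.com/interview.mp3",  # placeholder URL
        language="it",
        extra_input={"batch_size": 24},
    ))
```

Leaving unset options out of the object, rather than sending nulls, keeps the request aligned with the documented defaults.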
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| audio | Yes | URL of the audio (or video) file to transcribe. | |
| model | No | Speech-to-text model. Curated: whisper, incredibly-fast-whisper, whisperx, scribe. Or "owner/name". | incredibly-fast-whisper |
| download | No | Output is text, so this defaults to false. | false |
| language | No | ISO-639 language hint (e.g. 'en', 'it'). | auto-detect |
| timeout_ms | No | Max ms to wait for the prediction. If exceeded, returns the prediction ID so you can poll via replicate_get_prediction. | 300000 (5 min) |
| extra_input | No | Model-specific extras (e.g. {batch_size: 24} for incredibly-fast-whisper). | |
| translate_to_english | No | If true, translate the transcript to English. | |
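The timeout_ms behavior can be handled with a small polling loop. The sketch below is illustrative only and rests on several assumptions that the schema does not confirm: that a caller-supplied call_tool(name, args) function invokes a tool and returns a dict, that an unfinished result exposes the prediction ID under a prediction_id key, and that replicate_get_prediction accepts a prediction_id argument. Only text_output and the two tool names come from this page.

```python
import time
from typing import Any, Callable, Dict

# ASSUMPTION: the caller provides a function that invokes a tool by name.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def transcribe_with_polling(
    call_tool: ToolCaller,
    args: Dict[str, Any],
    poll_interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical helper: run a transcription and poll if the first call times out."""
    result = call_tool("replicate_transcribe_audio", args)
    polls = 0

    # ASSUMPTION: a finished result carries "text_output"; a timed-out one
    # carries only "prediction_id" (both key names are guesses for this sketch).
    while result.get("text_output") is None:
        prediction_id = result.get("prediction_id")
        if prediction_id is None or polls >= max_polls:
            raise TimeoutError("transcription did not complete")
        sleep(poll_interval_s)
        result = call_tool("replicate_get_prediction", {"prediction_id": prediction_id})
        polls += 1

    return result["text_output"]
```

Passing sleep as a parameter keeps the loop easy to test without real delays; in normal use the default time.sleep applies.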