generate_subtitles
Generate subtitle files for audio or video files using local speech recognition. Supports automatic language detection and English translation.
Instructions
Generate subtitle files for an audio or video file using whisper.cpp. Set language='auto' to detect the spoken language automatically. Set translate_to_english=true to also generate an English translation subtitle file. When translation is enabled for a non-English source, two .srt files are saved: one in the original language (e.g. film.ja.srt) and one English translation (film.en.srt). Load either file in VLC via Subtitle → Add Subtitle File. Supports all standard audio and video formats, plus .3gp and .ts.
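
The naming pattern in the example above (film.ja.srt, film.en.srt) can be sketched as a small helper for predicting output file names. This is an illustration only: `expected_subtitle_paths` is a hypothetical function, and it assumes the .srt files are written next to the input file as `<stem>.<language>.srt`, which the description implies but does not state. It takes the resolved language code, since the code chosen under language='auto' is not known until detection has run.

```python
from pathlib import PureWindowsPath


def expected_subtitle_paths(file_path, language="en", translate_to_english=False):
    """Predict the .srt paths for a given input (hypothetical helper).

    Assumes output lands beside the input as <stem>.<language>.srt.
    """
    if language == "auto":
        # The detected code is only known after the tool has run.
        raise ValueError("pass the detected language code, not 'auto'")

    src = PureWindowsPath(file_path)
    paths = [str(src.with_name(f"{src.stem}.{language}.srt"))]

    # The translation file only applies when the source is not English.
    if translate_to_english and language != "en":
        paths.append(str(src.with_name(f"{src.stem}.en.srt")))
    return paths


print(expected_subtitle_paths(r"C:\Videos\film.mkv", "ja", translate_to_english=True))
```

`PureWindowsPath` is used so the helper handles Windows paths regardless of the platform it runs on.
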
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Absolute Windows path to the file. | |
| language | No | Language code (e.g. ja, es, fr, de) or 'auto' to detect automatically. Defaults to en. | en |
| translate_to_english | No | Also generate an English translation .srt alongside the native language .srt. Only applies when language is not 'en'. Not available in background mode. | |
| background | No | Run as a detached background job — recommended for files over 10 minutes. Returns a job ID to use with check_progress. translate_to_english is not available in background mode. | |
| threads | No | CPU threads. Defaults to 2 of 2. | |
| temperature | No | Sampling temperature, 0.0–1.0. Defaults to 0.0. | 0.0 |
| prompt | No | Prior context string for domain-specific vocabulary or speaker names. | |
| beam_size | No | Beam search width. Higher values are more accurate but slower. Defaults to 5. | 5 |
| best_of | No | Number of candidate sequences evaluated. Defaults to 5. | 5 |
| diarize | No | Stereo speaker diarization. Requires stereo audio with speakers on separate channels. | |
| vad_model | No | Path to Silero VAD model .bin. Strips silence before transcription. Download via download_model. | |
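
As a rough sketch of how the constraints in the table fit together, the helper below assembles an arguments object for generate_subtitles and rejects combinations the schema describes as unavailable. `build_arguments` is a hypothetical client-side helper, not part of the tool; the tool may well perform its own validation, and how the resulting dictionary is sent depends on the calling client.

```python
import json

# Optional tuning parameters listed in the input schema.
ALLOWED_TUNING = {
    "threads", "temperature", "prompt", "beam_size",
    "best_of", "diarize", "vad_model",
}


def build_arguments(file_path, language="en", translate_to_english=False,
                    background=False, **tuning):
    """Build a generate_subtitles arguments dict (hypothetical helper)."""
    unknown = set(tuning) - ALLOWED_TUNING
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    # The schema states translation is not available in background mode.
    if translate_to_english and background:
        raise ValueError("translate_to_english is not available in background mode")

    temperature = tuning.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    args = {"file_path": file_path, "language": language}
    # Translation only applies when the language is not 'en'.
    if translate_to_english and language != "en":
        args["translate_to_english"] = True
    if background:
        args["background"] = True
    args.update(tuning)
    return args


# Example: Japanese film with an English translation, run in the foreground.
print(json.dumps(build_arguments(r"C:\Videos\film.mkv", language="ja",
                                 translate_to_english=True, beam_size=5)))
```

For a long file, the same helper could be called with background=True and translate_to_english left off, and the returned job ID then passed to check_progress.
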