transcribe_video
Transcribe a YouTube video, or any other video URL supported by yt-dlp, into a timestamped transcript. Supports multiple Whisper model sizes to trade accuracy against speed.
Instructions
Download and transcribe a YouTube video (or any other video URL that yt-dlp supports) with timestamps.
Downloads the audio, transcribes it locally using Whisper, and returns a full timestamped transcript. The LLM can then answer questions about the video content and point to specific timestamps.
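Whisper returns its output as a list of segments, each with start and end times in seconds. The sketch below shows one way to turn those segments into a readable timestamped transcript. It assumes the segment shape produced by openai-whisper's `transcribe()`, which uses `start`, `end`, and `text` keys. The `[M:SS - M:SS] text` line format and the helper names `format_timestamp` and `format_transcript` are hypothetical, because this page does not document the tool's actual output format.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render seconds as H:MM:SS for long videos, otherwise M:SS."""
    total = int(seconds)  # truncate fractional seconds
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def format_transcript(segments: Iterable[Mapping]) -> str:
    """Turn Whisper-style segments into one '[start - end] text' line each.

    The line format is a hypothetical choice, not the tool's documented output.
    """
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up segments in the shape of openai-whisper's result["segments"].
    demo = [
        {"start": 0.0, "end": 4.2, "text": " Welcome to the video."},
        {"start": 330.5, "end": 335.0, "text": " Now the key concept."},
    ]
    print(format_transcript(demo))
```

With timestamps on every line, a model reading the transcript can cite positions such as `5:30` directly.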
Results are cached to disk. A repeat request for the same video returns the stored transcript without downloading or transcribing it again.
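A disk cache like the one described above can be sketched as a file lookup keyed on the request parameters. The sketch assumes the key combines the URL, model size, and language, although the description only says "same video". The names `cache_key` and `cached_transcribe` and the one-file-per-transcript layout are hypothetical. The `transcribe` callable stands in for the real download-and-Whisper step.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def cache_key(url: str, model_size: str, language: Optional[str]) -> str:
    """Hash the request parameters into a filesystem-safe key (assumed scheme)."""
    raw = json.dumps([url, model_size, language or ""])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def cached_transcribe(
    url: str,
    model_size: str,
    language: Optional[str],
    cache_dir: Path,
    transcribe: Callable[[str, str, Optional[str]], str],
) -> str:
    """Return a cached transcript if present; otherwise compute and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(url, model_size, language)}.txt"
    if path.exists():
        # Cache hit: no download and no Whisper run.
        return path.read_text(encoding="utf-8")
    # Cache miss: do the expensive work once, then persist it.
    transcript = transcribe(url, model_size, language)
    path.write_text(transcript, encoding="utf-8")
    return transcript


if __name__ == "__main__":
    calls = []

    def fake_transcribe(url, model_size, language):
        # Hypothetical stand-in for download + Whisper; records each invocation.
        calls.append(url)
        return "[0:00 - 0:04] Welcome to the video."

    with tempfile.TemporaryDirectory() as tmp:
        for _ in range(2):
            cached_transcribe("https://youtube.com/watch?v=...", "tiny", None, Path(tmp), fake_transcribe)
    print(len(calls))  # 1: the second request was served from the cache
```

Including `model_size` in the key means that switching from `tiny` to `large` produces a fresh transcript and does not reuse the lower-accuracy one. This is a design choice the original description leaves open.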
Supported model sizes: tiny, base, small, medium, large
tiny: fastest, good for most videos (~75MB, default)
base: better accuracy, slower (~150MB)
small: high accuracy, much slower (~500MB)
medium/large: best accuracy, very slow (~1.5GB/~3GB)
Models are downloaded automatically on first use.
Sample prompts that trigger this tool:
- "Transcribe this video: https://youtube.com/watch?v=..."
- "What is discussed in this video? https://youtube.com/watch?v=..."
- "Summarize this YouTube video: https://..."
- "At what timestamp do they talk about X in https://..."
- "Explain the concept from 5:30 in this video: https://..."
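The last two sample prompts depend on mapping between clock-style timestamps and transcript segments. The tool itself only returns the transcript, and the LLM performs this lookup. The idea can still be sketched with the hypothetical helpers `parse_timestamp`, `segment_at`, and `mentions`, again assuming Whisper-style segments with `start`, `end`, and `text` fields in seconds.

```python
from typing import List, Mapping, Optional, Sequence


def parse_timestamp(ts: str) -> float:
    """Convert 'M:SS' or 'H:MM:SS' (e.g. '5:30') into seconds."""
    seconds = 0.0
    for part in ts.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def segment_at(segments: Sequence[Mapping], ts: str) -> Optional[Mapping]:
    """Return the segment whose [start, end) range covers the timestamp, if any."""
    t = parse_timestamp(ts)
    for seg in segments:
        if seg["start"] <= t < seg["end"]:
            return seg
    return None


def mentions(segments: Sequence[Mapping], phrase: str) -> List[float]:
    """Return start times of segments containing the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [seg["start"] for seg in segments if needle in seg["text"].lower()]


if __name__ == "__main__":
    # Made-up segments for illustration only.
    demo = [
        {"start": 0.0, "end": 4.2, "text": " Welcome to the video."},
        {"start": 330.0, "end": 336.0, "text": " Now the key concept: caching."},
    ]
    print(segment_at(demo, "5:31"))     # the segment spanning 330-336 s
    print(mentions(demo, "Caching"))    # start times of matching segments
```

A plain substring match is deliberately simple. A real "where do they talk about X" question usually needs the model's own reading of the transcript.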
Args:
url: YouTube URL or any video URL supported by yt-dlp.
model_size: Whisper model size (tiny/base/small/medium/large). Default: tiny.
language: Language code (e.g. "en", "de", "fr"). Auto-detected if empty.
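The sketch below shows how these arguments might be validated and normalized before transcription. `normalize_args` is a hypothetical helper. The behaviors it draws from this page are the five model sizes, the `tiny` default, and auto-detection when `language` is empty. An empty language maps onto openai-whisper's `language=None`, which tells Whisper to detect the language itself.

```python
from typing import Optional, Tuple

# Model sizes listed in this tool's description.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")
DEFAULT_MODEL_SIZE = "tiny"


def normalize_args(
    url: str,
    model_size: Optional[str] = DEFAULT_MODEL_SIZE,
    language: Optional[str] = "",
) -> Tuple[str, str, Optional[str]]:
    """Validate tool arguments; return (url, model_size, language-or-None)."""
    url = (url or "").strip()
    if not url:
        raise ValueError("url is required")
    size = (model_size or DEFAULT_MODEL_SIZE).strip().lower()
    if size not in MODEL_SIZES:
        raise ValueError(f"model_size must be one of {', '.join(MODEL_SIZES)}; got {model_size!r}")
    # Empty language -> None, which openai-whisper treats as "auto-detect".
    lang = (language or "").strip().lower() or None
    return url, size, lang


if __name__ == "__main__":
    print(normalize_args(" https://youtube.com/watch?v=... "))
    print(normalize_args("https://youtube.com/watch?v=...", "Base", "de"))
```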
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
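| url | Yes | YouTube URL or any video URL supported by yt-dlp. | |
| model_size | No | Whisper model size (tiny/base/small/medium/large). | tiny |
| language | No | Language code (e.g. "en", "de", "fr"). Auto-detected if empty. | empty (auto-detect) |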
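This sketch assumes the tool is exposed through the Model Context Protocol. The Input/Output Schema layout suggests an MCP tool listing, but this page does not say so. Under that assumption, a client would invoke the tool with a JSON-RPC `tools/call` request like the one built below. `build_tool_call` is hypothetical. It omits optional arguments when they are not given, so the server applies its own defaults.

```python
import json
from typing import Optional


def build_tool_call(
    request_id: int,
    url: str,
    model_size: Optional[str] = None,
    language: Optional[str] = None,
) -> str:
    """Serialize a JSON-RPC 'tools/call' request for transcribe_video (assumes MCP)."""
    arguments = {"url": url}  # the only required field per the Input Schema
    if model_size is not None:
        arguments["model_size"] = model_size
    if language is not None:
        arguments["language"] = language
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "transcribe_video", "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_tool_call(1, "https://youtube.com/watch?v=...", model_size="small"))
```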
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Full timestamped transcript of the video. | |