get_video_frame
Capture a still frame from any YouTube video at a specified timestamp, returning an image for analysis of on-screen content like slides, captions, or user interfaces.
Instructions
Capture a single still frame (screenshot) from a YouTube video at a given moment and return it as an image, so a multimodal model can answer "what's on screen here?".
Use this to see the video at a specific time -- e.g. to read a slide, a caption burned into the video, or a UI being demoed. Pair it with get_most_replayed or get_transcript(include_timestamps=True) to pick an interesting moment, then grab the frame there.
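As a rough illustration of that pairing, the sketch below picks a timestamp from a timestamped transcript by keyword. The segment shape (a `start` in seconds plus a `text` string) and the helper name `find_moment` are assumptions made for illustration; the actual output of get_transcript may be structured differently.

```python
from typing import Optional


def find_moment(segments: list[dict], keyword: str) -> Optional[float]:
    """Return the start time (seconds) of the first segment mentioning keyword.

    `segments` is a hypothetical transcript shape: [{"start": float, "text": str}].
    """
    needle = keyword.lower()
    for seg in segments:
        if needle in seg["text"].lower():
            return float(seg["start"])
    return None  # keyword never spoken


# Made-up transcript data, for illustration only.
segments = [
    {"start": 0.0, "text": "Welcome to the talk"},
    {"start": 92.5, "text": "As this slide shows, latency drops"},
]

at = find_moment(segments, "slide")  # candidate value for the `at` argument
```

The resulting `at` value can then be passed to get_video_frame together with the same `video`.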
Requires ffmpeg on the server. The captured frame is the nearest keyframe at or just before the requested moment (it can be off by a second or two) and is downscaled to max_width to keep the response small.
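To make the keyframe and downscaling behavior more concrete, here is a minimal sketch of an ffmpeg invocation that would behave this way. It is not necessarily how the server calls ffmpeg: the function name, the `stream_url` parameter (a direct media URL that the server would have to resolve from the YouTube video by some other means), and the output path are all assumptions. Placing `-ss` before `-i` seeks on the input, `-noaccurate_seek` keeps the result snapped to the seek point instead of decoding forward to the exact timestamp, and the `scale` filter caps the width without upscaling.

```python
def build_ffmpeg_command(
    stream_url: str, seconds: float, max_width: int, out_path: str
) -> list[str]:
    """Build (but do not run) an ffmpeg command that grabs one frame.

    Hypothetical helper: `stream_url` is assumed to be a direct media URL.
    """
    return [
        "ffmpeg",
        "-y",                        # overwrite out_path if it exists
        "-ss", f"{seconds:.3f}",     # input seek: jump close to the timestamp
        "-noaccurate_seek",          # stay on the seek point (a keyframe)
        "-i", stream_url,
        "-frames:v", "1",            # emit exactly one video frame
        "-vf", f"scale='min({max_width},iw)':-2",  # cap width, keep aspect ratio
        out_path,
    ]


cmd = build_ffmpeg_command("https://example.com/video.mp4", 90, 640, "frame.jpg")
```

The list can be handed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed.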
Args:
video: A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID.
at: The moment to capture -- seconds (e.g. 90) or a "mm:ss" / "h:mm:ss" string.
max_width: Max width in pixels of the returned image (clamped to 64..1280; default 640). Smaller is cheaper on a vision model's image-token budget.
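The argument rules above can be sketched as three small normalizers. This is an illustrative guess at the validation, not the tool's actual code: the function names are hypothetical, and the ID character set (letters, digits, `-`, `_`) and the handling of malformed input are assumptions.

```python
import re
from typing import Union
from urllib.parse import parse_qs, urlparse

_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")  # assumed video ID alphabet


def extract_video_id(video: str) -> str:
    """Accept a bare 11-character ID or a watch/youtu.be/shorts/embed/live URL."""
    video = video.strip()
    if _ID_RE.match(video):
        return video
    url = urlparse(video)
    host = url.netloc.lower()
    segments = [s for s in url.path.split("/") if s]
    candidate = None
    if host == "youtu.be" and segments:
        candidate = segments[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if len(segments) > 1 and segments[0] in {"shorts", "embed", "live"}:
            candidate = segments[1]
        else:
            candidate = parse_qs(url.query).get("v", [None])[0]
    if candidate and _ID_RE.match(candidate):
        return candidate
    raise ValueError(f"not a recognized YouTube URL or video ID: {video!r}")


def parse_at(at: Union[int, float, str]) -> float:
    """Convert seconds, "mm:ss", or "h:mm:ss" into seconds."""
    if isinstance(at, (int, float)):
        seconds = float(at)
    else:
        parts = at.strip().split(":")
        if not 1 <= len(parts) <= 3:
            raise ValueError(f"unrecognized timestamp: {at!r}")
        seconds = 0.0
        for part in parts:  # each field is worth 60x the one after it
            seconds = seconds * 60 + float(part)
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    return seconds


def clamp_width(max_width: int = 640) -> int:
    """Clamp the requested width into the documented 64..1280 range."""
    return max(64, min(1280, int(max_width)))
```

For example, `parse_at("1:30")` and `parse_at(90)` both yield 90 seconds, and `clamp_width(4000)` falls back to the 1280 ceiling.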
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
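| video | Yes | A YouTube URL (watch, youtu.be, shorts, embed, live) or an 11-character video ID. | |
| at | Yes | The moment to capture: seconds (e.g. 90) or a "mm:ss" / "h:mm:ss" string. | |
| max_width | No | Max width in pixels of the returned image (clamped to 64..1280). | 640 |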