analyze_video
Analyze video content by extracting key frames and answering questions about actions, objects, and events using a vision-language model.
Instructions
Analyze a video using the Qwen3-VL-8B vision-language model on Blaxel.
The video must be accessible via a public URL. The tool will:
1. Download the video
2. Extract key frames (up to max_frames)
3. Analyze the frames with your question
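The page does not say how key frames are chosen. The sketch below assumes the simplest strategy, uniform sampling: split the video into `max_frames` equal segments and take the middle frame of each. It also clamps to the documented 1-16 range. The function name `sample_frame_indices` is hypothetical. The real tool may pick frames by scene change or some other heuristic.

```python
def sample_frame_indices(total_frames: int, max_frames: int) -> list[int]:
    """Return up to max_frames evenly spaced frame indices.

    Hypothetical helper: assumes uniform sampling, which may differ from
    how the actual analyze_video tool selects key frames.
    """
    if total_frames <= 0:
        return []
    # Clamp to the documented 1-16 range, and never exceed the frame count.
    n = max(1, min(max_frames, 16, total_frames))
    # Split the video into n equal segments and take each segment's midpoint.
    step = total_frames / n
    return [int(step * i + step / 2) for i in range(n)]


if __name__ == "__main__":
    # A 300-frame clip sampled down to 8 frames.
    print(sample_frame_indices(300, 8))
```

Choosing segment midpoints instead of segment starts avoids always including the very first frame, which is often a black or title frame.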
Example questions:
- "What happens in this video?"
- "Summarize the main events"
- "What products are shown?"
- "Describe the people and their actions"
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| video_url | Yes | URL of the video to analyze (must be publicly accessible) | |
| question | No | Question or prompt about the video | Describe what happens in this video in detail. |
| max_frames | No | Maximum number of frames to extract (1-16) | |
| max_tokens | No | Maximum tokens in response | |
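The sketch below assembles an arguments dictionary that matches the input schema above before the tool is called. `build_analyze_video_args` is a hypothetical client-side helper, not part of the tool. Its URL check (absolute `http`/`https` only) is an assumption about what "publicly accessible" implies. Optional fields are left out when unset, so the server applies its own defaults (for `question`, "Describe what happens in this video in detail.").

```python
from urllib.parse import urlparse


def build_analyze_video_args(video_url, question=None, max_frames=None, max_tokens=None):
    """Build an analyze_video arguments dict that follows the documented schema.

    Hypothetical helper: validation rules beyond the documented 1-16
    max_frames range are assumptions.
    """
    # Assumption: a public URL means an absolute http(s) URL with a host.
    parsed = urlparse(video_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("video_url must be an absolute http(s) URL")

    args = {"video_url": video_url}
    # Leave optional fields out so the server-side defaults apply.
    if question is not None:
        args["question"] = question
    if max_frames is not None:
        if not 1 <= max_frames <= 16:
            raise ValueError("max_frames must be between 1 and 16")
        args["max_frames"] = max_frames
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        args["max_tokens"] = max_tokens
    return args


if __name__ == "__main__":
    print(build_analyze_video_args(
        "https://example.com/clip.mp4",
        question="What products are shown?",
        max_frames=8,
    ))
```

The resulting dictionary would be passed as the tool's arguments by whichever MCP client you use. The client call itself is not shown here because it depends on that client's API.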