indexVisualContent
Build a searchable visual index for a video by extracting frames and analyzing them with OCR and computer vision, so that on-screen text can be searched and frames can be retrieved semantically.
Instructions
Build a real visual index for a video using extracted frames, Apple Vision OCR, Apple Vision feature prints, and optional Gemini frame descriptions. Returns frame evidence with local image paths. [~30-120s, downloads + OCR + vision]
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| videoIdOrUrl | Yes | Video ID or URL to index visually | |
| intervalSec | No | Frame sampling interval in seconds | 20 |
| maxFrames | No | Maximum number of frames to analyze | 12 |
| imageFormat | No | | |
| width | No | | |
| autoDownload | No | Automatically download a small local video copy if none exists | true |
| downloadFormat | No | Video format used if auto-download is needed | worst_video |
| forceReindex | No | Re-run OCR/description analysis even if frames are already indexed | |
| includeGeminiDescriptions | No | Use Gemini to describe each frame when a Gemini key is configured | |
| includeGeminiEmbeddings | No | Generate Gemini embeddings over OCR/description text for semantic retrieval | true when a Gemini key is available |
| dryRun | No | | |
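
The parameters above can be assembled into a call payload along the following lines. This is a minimal sketch, not the tool's own client code: it assumes the tool is invoked through a JSON-RPC 2.0 `tools/call` request of the kind used by MCP servers, which this page does not state explicitly. The helper names `build_arguments` and `build_request` are hypothetical; only the parameter names and the listed defaults come from the table.

```python
import json
from typing import Any

# Defaults copied from the Input Schema table. Parameters without a
# documented default (imageFormat, width, forceReindex, ...) are left out
# so the tool can apply its own behavior.
DOCUMENTED_DEFAULTS: dict[str, Any] = {
    "intervalSec": 20,
    "maxFrames": 12,
    "autoDownload": True,
    "downloadFormat": "worst_video",
}


def build_arguments(video_id_or_url: str, **overrides: Any) -> dict[str, Any]:
    """Merge the documented defaults with caller overrides (hypothetical helper)."""
    if not video_id_or_url:
        raise ValueError("videoIdOrUrl is required")
    return {"videoIdOrUrl": video_id_or_url, **DOCUMENTED_DEFAULTS, **overrides}


def build_request(arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Wrap the arguments in a JSON-RPC 2.0 envelope.

    Assumption: the server accepts an MCP-style "tools/call" method.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "indexVisualContent", "arguments": arguments},
    }


if __name__ == "__main__":
    # "VIDEO_ID" is a placeholder, not a real identifier.
    args = build_arguments("VIDEO_ID", intervalSec=10, maxFrames=24, forceReindex=True)
    print(json.dumps(build_request(args), indent=2))
```

Sending the documented defaults explicitly is optional; omitting them should be equivalent if the tool applies the same values. `includeGeminiEmbeddings` is deliberately not set here, because its default depends on whether a Gemini key is configured.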