searchVisualContent
Search visual content in videos to locate specific frames, objects, or text using AI vision and OCR. Returns timestamped image evidence and automatically indexes videos that have not been processed yet.
Instructions
Search the actual visual content of a video or your indexed frame library. Uses Apple Vision OCR, optional Gemini frame descriptions, and optional Gemini semantic embeddings. Always returns frame/image evidence with timestamps. Typical latency: ~1-3 s if the video is already indexed, ~60-120 s if auto-indexing is triggered.
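
Because the two latency ranges differ by more than an order of magnitude, a caller may want to choose its timeout based on whether auto-indexing could be triggered. The sketch below is one way to do that. The helper name `suggested_timeout_sec` and the 1.5x headroom factor are assumptions made for illustration and are not part of the tool.

```python
def suggested_timeout_sec(scoped_to_video: bool,
                          auto_index_if_needed: bool = True,
                          headroom: float = 1.5) -> float:
    """Hypothetical helper: derive a client-side timeout from the documented latency ranges.

    The upper bounds (3 s indexed, 120 s auto-indexing) come from the
    instructions above; the headroom multiplier is an arbitrary assumption.
    """
    indexed_upper_sec = 3.0
    auto_index_upper_sec = 120.0
    # Auto-indexing is only described for searches scoped to a video
    # with autoIndexIfNeeded left enabled.
    if scoped_to_video and auto_index_if_needed:
        return auto_index_upper_sec * headroom
    return indexed_upper_sec * headroom
```

A search across the already-indexed frame library would get the short timeout, while a first search against a specific video would get the long one.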
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Visual search query, e.g. 'whiteboard diagram' or 'slide that says title research checklist' | |
| videoIdOrUrl | No | Optional video scope. If provided, the server can auto-index this video if needed. | |
| maxResults | No | ||
| minScore | No | ||
| autoIndexIfNeeded | No | If scoped to a video and no visual index exists yet, build it automatically. | true |
| intervalSec | No | Frame interval to use if auto-indexing is triggered | |
| maxFrames | No | Frame cap to use if auto-indexing is triggered | |
| imageFormat | No | ||
| width | No | ||
| autoDownload | No | ||
| downloadFormat | No | ||
| includeGeminiDescriptions | No | ||
| includeGeminiEmbeddings | No | ||
| dryRun | No | | |
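
As an illustration of how the schema above maps to a request, the following sketch assembles an arguments object and wraps it in a JSON-RPC `tools/call` envelope. It assumes the tool is exposed through a Model Context Protocol server, which this page does not state. The helper names, the placeholder video ID, and the example values for `maxResults` and `intervalSec` are hypothetical, and the value types are guesses because the schema lists none.

```python
import json
from typing import Any, Optional

# Optional parameter names copied from the input schema table.
OPTIONAL_PARAMS = {
    "maxResults", "minScore", "autoIndexIfNeeded", "intervalSec", "maxFrames",
    "imageFormat", "width", "autoDownload", "downloadFormat",
    "includeGeminiDescriptions", "includeGeminiEmbeddings", "dryRun",
}


def build_search_arguments(query: str,
                           video_id_or_url: Optional[str] = None,
                           **options: Any) -> dict:
    """Hypothetical helper: build the arguments object for searchVisualContent."""
    if not query.strip():
        raise ValueError("query is required")
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    arguments: dict = {"query": query}
    if video_id_or_url is not None:
        arguments["videoIdOrUrl"] = video_id_or_url
    # Omit unset options so the server applies its own defaults.
    arguments.update({k: v for k, v in options.items() if v is not None})
    return arguments


def build_tool_call(arguments: dict, request_id: int = 1) -> dict:
    """Wrap the arguments in a JSON-RPC 2.0 tools/call request (assumes MCP)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "searchVisualContent", "arguments": arguments},
    }


if __name__ == "__main__":
    args = build_search_arguments(
        "whiteboard diagram",
        video_id_or_url="VIDEO_ID",  # placeholder, not a real video
        maxResults=5,                # illustrative value
        intervalSec=10,              # illustrative value
    )
    print(json.dumps(build_tool_call(args), indent=2))
```

Leaving `videoIdOrUrl` unset searches the indexed frame library. Setting it scopes the search to one video and, with `autoIndexIfNeeded` left at its default, lets the server build the index on first use.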