Glama

extract_screenshots

Extract key screenshots from YouTube videos using AI to identify visually significant moments. Get base64 images or save to disk for analysis.

Instructions

Extract key screenshots from a YouTube video at important moments. Uses AI to identify visually significant timestamps, then extracts frames. Returns both base64 images and optionally saves to disk.
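
The instructions say the tool returns base64 images, but this page does not document the shape of the response. As a minimal sketch, assuming the caller has already pulled one base64 string out of the result, the helper below (a hypothetical name, not part of the server) decodes it and writes the bytes to disk. The optional "data:" prefix handling is also an assumption.

```python
import base64
from pathlib import Path


def save_base64_image(data: str, path: Path) -> int:
    """Decode one base64-encoded image and write it to `path`.

    Returns the number of bytes written. Hypothetical helper: the tool's
    real response structure is not documented on this page.
    """
    # Assumption: tolerate a "data:image/...;base64," prefix if one is present.
    if data.startswith("data:") and "," in data:
        data = data.split(",", 1)[1]
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(data, validate=True)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(raw)
    return len(raw)
```

Passing output_dir to the tool makes this step unnecessary, since the server then saves the files itself.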

Input Schema

| Name | Required | Description | Default |
|------|----------|-------------|---------|
| youtube_url | Yes | Full YouTube URL (youtube.com/watch?v=ID, youtu.be/ID, or youtube.com/shorts/ID) | |
| count | No | Number of screenshots to extract (1-20, default: 5) | |
| output_dir | No | Optional directory to save screenshots. If not provided, uses SCREENSHOT_OUTPUT_DIR env var or temp directory. | |
| focus | No | Optional focus for timestamp selection (e.g., 'product demos', 'code examples', 'diagrams'). Default analyzes for general key moments. | |
| resolution | No | Output resolution: thumbnail (160p), small (360p), medium (720p), large (1080p), full (original). Default: large | large |
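
The constraints in the schema can be checked on the client before calling the tool. The sketch below is illustrative only: build_arguments is a hypothetical helper, and the URL pattern is an assumption that covers just the three URL shapes listed above (with an optional scheme and "www."). The server's own validation may differ.

```python
import re
from typing import Optional

# Resolution names taken from the schema description.
RESOLUTIONS = {"thumbnail", "small", "medium", "large", "full"}

# Assumption: only the three documented URL shapes are accepted.
_URL_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?"
    r"(?:youtube\.com/watch\?v=|youtu\.be/|youtube\.com/shorts/)"
    r"[\w-]+"
)


def build_arguments(
    youtube_url: str,
    count: int = 5,
    output_dir: Optional[str] = None,
    focus: Optional[str] = None,
    resolution: str = "large",
) -> dict:
    """Validate inputs and return an arguments dict for extract_screenshots."""
    if not _URL_RE.match(youtube_url):
        raise ValueError(f"unsupported YouTube URL: {youtube_url!r}")
    if not 1 <= count <= 20:
        raise ValueError("count must be between 1 and 20")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")

    args = {"youtube_url": youtube_url, "count": count, "resolution": resolution}
    # Optional parameters are omitted entirely when unset.
    if output_dir is not None:
        args["output_dir"] = output_dir
    if focus is not None:
        args["focus"] = focus
    return args
```

For example, build_arguments("https://youtu.be/ID", count=3, focus="diagrams") yields a dict with youtube_url, count, resolution, and focus, leaving output_dir to the server's fallback.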

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adequately describes the core behavior (AI-driven timestamp selection, frame extraction, base64 return, optional disk saving) but lacks details about error handling, rate limits, authentication requirements, processing time, or what constitutes 'important moments.' The description doesn't contradict any annotations since none exist, but could provide more operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise and well-structured in two sentences. The first sentence establishes the core functionality, the second explains the dual output mechanism. Every word earns its place with no redundancy or unnecessary elaboration, making it easy to parse while being informationally dense.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (5 parameters, AI processing, dual output) and absence of both annotations and output schema, the description is adequate but incomplete. It covers the what and how but lacks information about return format details (structure of base64 response), error conditions, performance characteristics, or dependencies. For a tool with no output schema, more detail about return values would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaningful context about parameter usage beyond the 100% schema coverage. It explains that screenshots are extracted 'at important moments' (relating to the 'focus' parameter's purpose), mentions AI-driven selection (context for 'count' and 'focus'), and notes the dual output (base64 and optional disk saving) which helps understand 'output_dir' usage. While the schema fully documents parameters, the description provides valuable semantic framing.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('extract key screenshots', 'uses AI to identify visually significant timestamps', 'extracts frames') and a specific resource ('from a YouTube video'). It distinguishes the tool from siblings: unlike 'extract_frames', it uses AI-driven selection of important moments rather than manual frame extraction, and unlike 'get_video_timestamps', it also extracts the screenshots.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool (extracting AI-selected key screenshots from YouTube videos) and implicitly distinguishes it from alternatives like 'extract_frames' (manual extraction) and 'get_video_timestamps' (timestamp-only output). However, it doesn't explicitly state when NOT to use this tool, nor does it provide direct comparison statements like 'use X instead for Y scenario'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Legorobotdude/yt-analysis-mcp'
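
The same request can be made from Python with only the standard library. This is a sketch under two assumptions: that the endpoint returns JSON (the page does not show a sample response), and that no authentication is needed, as the curl command suggests. The function names are illustrative.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one MCP server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """GET the server record. Assumes the body is JSON."""
    req = urllib.request.Request(
        server_url(owner, name),
        method="GET",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_server("Legorobotdude", "yt-analysis-mcp") issues the same GET as the curl command above.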

If you have feedback or need assistance with the MCP directory API, please join our Discord server.