scroll_capture

Capture entire webpages or long documents by scrolling and stitching multiple screenshots into a single image for full-length overviews.

Instructions

Purpose: Scroll a window top-to-bottom (or left-to-right) and stitch all frames into one image, for full-length webpages or documents that exceed a single screenshot.

Details: Output is capped at ~700KB raw (MCP base64 encoding inflates to ~933KB, approaching the 1MB message limit). When sizeReduced=true appears in the response, iterative WebP downscale was applied (up to 3 passes at 0.75× each); reduce maxScrolls or add grayscale=true to avoid truncation. Focuses the target window, scrolls to the top with Ctrl+Home, then captures frames via Page Down until identical consecutive frames are detected or maxScrolls is reached. Pixel-overlap detection eliminates seam duplication; check overlapMode in the response: 'mixed-with-failures' means some seams may have duplicate rows.

Prefer: Use only when the goal is whole-page overview of content too long for one screenshot. For partial verification or locating a specific section, prefer scroll + screenshot(detail='text'): you get actionable[] with coords and pay only per-viewport token cost. scroll_capture returns a stitched image (not clickable elements) that stays expensive in tokens regardless of the 1MB guard.

Caveats: When sizeReduced=true, stitched image pixels do NOT match screen coords; use for reading only, not for mouse_click. When overlapMode='mixed-with-failures', expect occasional duplicate content rows near frame boundaries. Increase scrollDelayMs for pages with animations or lazy-loaded images.
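The size figures in the description follow from base64 arithmetic, and the downscale passes compound. A minimal sketch, assuming "~700KB" means 700,000 bytes and that the 0.75× factor applies to each linear dimension (the description confirms neither); the function names are hypothetical and not part of the tool:

```python
import math

RAW_CAP_BYTES = 700_000  # assumption: "~700KB" read as 700,000 bytes


def base64_size(raw_bytes: int) -> int:
    """Length of standard padded base64 output for raw_bytes of input."""
    return 4 * math.ceil(raw_bytes / 3)


def scale_after_passes(passes: int, factor: float = 0.75) -> float:
    """Cumulative linear scale after repeated downscale passes.

    Assumption: each pass multiplies width and height by `factor`.
    """
    return factor ** passes


print(base64_size(RAW_CAP_BYTES))  # 933336 bytes, i.e. ~933KB
print(scale_after_passes(3))       # 0.421875
```

Under these assumptions, three passes would leave the image at about 42% of its original width and height, which is consistent with the caveat that stitched-image pixels stop matching screen coordinates once sizeReduced=true.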

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| windowTitle | Yes | Partial title of the window to capture (case-insensitive match) | |
| direction | No | Scroll direction: 'down' (vertical, uses Page Down key) or 'right' (horizontal, uses mouse scroll). Default 'down'. | down |
| maxScrolls | No | Maximum scroll iterations before stopping (default 10, max 30) | |
| scrollDelayMs | No | Milliseconds to wait after each scroll for rendering to settle (default 400). Increase for slow/animated pages. | |
| maxWidth | No | Max size of the short edge of the final image (default 1280). For 'down': caps the image width; height is unconstrained. For 'right': caps the image height; width is unconstrained. | |
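As an illustration of how these parameters fit together, the sketch below builds an MCP `tools/call` JSON-RPC request for this tool. The helper name `build_scroll_capture_call` is hypothetical, the lower bound of 1 on maxScrolls is an assumption (the schema states only the maximum of 30), and the window title is a placeholder:

```python
import json


def build_scroll_capture_call(window_title: str, direction: str = "down",
                              max_scrolls: int = 10, scroll_delay_ms: int = 400,
                              max_width: int = 1280, request_id: int = 1) -> dict:
    """Build a tools/call request using the defaults listed in the schema."""
    if direction not in ("down", "right"):
        raise ValueError("direction must be 'down' or 'right'")
    # Schema: default 10, max 30. The minimum of 1 is an assumption.
    if not 1 <= max_scrolls <= 30:
        raise ValueError("maxScrolls must be between 1 and 30")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scroll_capture",
            "arguments": {
                "windowTitle": window_title,
                "direction": direction,
                "maxScrolls": max_scrolls,
                "scrollDelayMs": scroll_delay_ms,
                "maxWidth": max_width,
            },
        },
    }


# Placeholder title; a slower delay for a page with lazy-loaded images.
request = build_scroll_capture_call("Release Notes", scroll_delay_ms=800)
print(json.dumps(request, indent=2))
```

Only windowTitle is required; the other arguments could be omitted from the payload to let the server apply its defaults.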
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and excels. It details output size caps (~700KB raw), iterative downscaling with 'sizeReduced=true', focus behavior (Ctrl+Home), scrolling mechanics (Page Down), stopping conditions (identical frames or maxScrolls), pixel-overlap detection, and caveats like non-matching screen coords and duplicate content rows. This provides rich context beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
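The stopping conditions named above (identical consecutive frames, or maxScrolls reached) can be sketched as a simple loop. This is not the tool's implementation: `capture_until_stable`, `grab`, `scroll`, and `FakePage` are hypothetical stand-ins for the real screenshot and key-press calls, and frames are compared as raw bytes for simplicity:

```python
import time
from typing import Callable, List


def capture_until_stable(grab: Callable[[], bytes], scroll: Callable[[], None],
                         max_scrolls: int = 10, delay_s: float = 0.4) -> List[bytes]:
    """Collect frames until two consecutive frames match or max_scrolls is hit."""
    frames = [grab()]
    for _ in range(max_scrolls):
        scroll()
        time.sleep(delay_s)  # analogous to scrollDelayMs: let rendering settle
        frame = grab()
        if frame == frames[-1]:
            break  # nothing moved, so the end of the page was reached
        frames.append(frame)
    return frames


class FakePage:
    """Simulated page with a fixed number of scroll positions."""

    def __init__(self, positions: int) -> None:
        self.positions = positions
        self.pos = 0

    def scroll(self) -> None:
        self.pos = min(self.pos + 1, self.positions - 1)

    def grab(self) -> bytes:
        return f"frame-{self.pos}".encode()


page = FakePage(positions=3)
frames = capture_until_stable(page.grab, page.scroll, delay_s=0)
print(len(frames))  # 3
```

A delay that is too short can make a still-rendering page look identical to the previous frame and end the loop early, which is one reason to raise scrollDelayMs on animated or lazy-loading pages.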

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Purpose, Details, Prefer, Caveats) and front-loaded key information. While detailed, every sentence adds value—no fluff. It could be slightly more concise but efficiently covers complex behavior and guidelines.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step scrolling, image stitching, size management) and lack of annotations or output schema, the description provides comprehensive context. It explains output characteristics (image format, size limits, downscaling), behavioral nuances (overlap detection, focus actions), and practical considerations (token costs, when to avoid). This is complete enough for effective agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
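To make the overlap-detection step concrete, here is a minimal sketch of one way seam duplication could be removed when stitching vertically scrolled frames. It assumes exact row equality and models each frame as a list of row byte strings; the tool's actual matching algorithm is not documented here, and `find_overlap` and `stitch` are hypothetical names:

```python
from typing import List, Sequence

Row = bytes  # one row of pixels, modeled as raw bytes


def find_overlap(prev: Sequence[Row], nxt: Sequence[Row]) -> int:
    """Largest k such that the last k rows of prev equal the first k rows of nxt."""
    for k in range(min(len(prev), len(nxt)), 0, -1):
        if list(prev[-k:]) == list(nxt[:k]):
            return k
    return 0  # no overlap found; the seam may contain duplicate rows


def stitch(frames: Sequence[Sequence[Row]]) -> List[Row]:
    """Concatenate frames, dropping rows already present at the previous frame's end."""
    if not frames:
        return []
    out = list(frames[0])
    for prev, nxt in zip(frames, frames[1:]):
        k = find_overlap(prev, nxt)
        out.extend(nxt[k:])
    return out


a = [b"r0", b"r1", b"r2", b"r3"]
b = [b"r2", b"r3", b"r4", b"r5"]
print(find_overlap(a, b))  # 2
print(len(stitch([a, b])))  # 6
```

When no overlap is found the sketch appends the whole frame, which loosely corresponds to the 'mixed-with-failures' case where duplicate rows can appear near frame boundaries.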

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal param-specific semantics beyond the schema, such as mentioning 'maxScrolls' and 'scrollDelayMs' in behavioral context, but doesn't provide new syntax or format details. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the purpose with specific verbs and resources: 'Scroll a window top-to-bottom (or left-to-right) and stitch all frames into one image — for full-length webpages or documents that exceed a single screenshot.' It clearly distinguishes this from sibling tools like 'screenshot' by emphasizing whole-page overview versus partial verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Use only when the goal is whole-page overview of content too long for one screenshot') and when to prefer alternatives ('For partial verification or locating a specific section, prefer scroll + screenshot(detail='text')'). It names the alternative tool and explains the trade-offs in token cost and functionality.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Harusame64/desktop-touch-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.