desktop-touch-mcp

Overview Schema Related Servers Score Discussions

scroll_capture

Capture entire webpages or long documents by scrolling and stitching multiple screenshots into a single image for full-length overviews.

Instructions

Purpose: Scroll a window top-to-bottom (or left-to-right) and stitch all frames into one image — for full-length webpages or documents that exceed a single screenshot. Details: Output is capped at ~700KB raw (MCP base64 encoding inflates to ~933KB, approaching the 1MB message limit); when sizeReduced=true appears in the response, iterative WebP downscale was applied (up to 3 passes at 0.75× each) — reduce maxScrolls or add grayscale=true to avoid truncation. Focuses the target window, scrolls to Ctrl+Home, then captures frames via Page Down until identical consecutive frames are detected or maxScrolls is reached. Pixel-overlap detection eliminates seam duplication; check response overlapMode — 'mixed-with-failures' means some seams may have duplicate rows. Prefer: Use only when the goal is whole-page overview of content too long for one screenshot. For partial verification or locating a specific section, prefer scroll + screenshot(detail='text') — you get actionable[] with coords and pay only per-viewport token cost. scroll_capture returns a stitched image (not clickable elements) that stays expensive in tokens regardless of the 1MB guard. Caveats: When sizeReduced=true, stitched image pixels do NOT match screen coords — use for reading only, not for mouse_click. When overlapMode='mixed-with-failures', expect occasional duplicate content rows near frame boundaries. Increase scrollDelayMs for pages with animations or lazy-loaded images.

Input Schema

TableJSON Schema

Name	Required	Description	Default
`windowTitle`	Yes	Partial title of the window to capture (case-insensitive match)
`direction`	No	Scroll direction: 'down' (vertical, uses Page Down key) or 'right' (horizontal, uses mouse scroll). Default 'down'.	down
`maxScrolls`	No	Maximum scroll iterations before stopping (default 10, max 30)
`scrollDelayMs`	No	Milliseconds to wait after each scroll for rendering to settle (default 400). Increase for slow/animated pages.
`maxWidth`	No	Max size of the short edge of the final image (default 1280). For 'down': caps the image width; height is unconstrained. For 'right': caps the image height; width is unconstrained.

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and excels. It details output size caps (~700KB raw), iterative downscaling with 'sizeReduced=true', focus behavior (Ctrl+Home), scrolling mechanics (Page Down), stopping conditions (identical frames or maxScrolls), pixel-overlap detection, and caveats like non-matching screen coords and duplicate content rows. This provides rich context beyond basic functionality.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Purpose, Details, Prefer, Caveats) and front-loaded key information. While detailed, every sentence adds value—no fluff. It could be slightly more concise but efficiently covers complex behavior and guidelines.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step scrolling, image stitching, size management) and lack of annotations or output schema, the description provides comprehensive context. It explains output characteristics (image format, size limits, downscaling), behavioral nuances (overlap detection, focus actions), and practical considerations (token costs, when to avoid). This is complete enough for effective agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal param-specific semantics beyond the schema, such as mentioning 'maxScrolls' and 'scrollDelayMs' in behavioral context, but doesn't provide new syntax or format details. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the purpose with specific verbs and resources: 'Scroll a window top-to-bottom (or left-to-right) and stitch all frames into one image — for full-length webpages or documents that exceed a single screenshot.' It clearly distinguishes this from sibling tools like 'screenshot' by emphasizing whole-page overview versus partial verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Use only when the goal is whole-page overview of content too long for one screenshot') and when to prefer alternatives ('For partial verification or locating a specific section, prefer scroll + screenshot(detail='text')'). It names the alternative tool and explains the trade-offs in token cost and functionality.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

Tool Definition Quality Score (TDQS)
By punkpeye on April 3, 2026.
mcp
The Hackers Who Tracked My Sleep Cycle
By punkpeye on March 26, 2026.
security
Open Source Has a Bot Problem
By punkpeye on March 19, 2026.
open source

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Harusame64/desktop-touch-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server