Glama

scroll

Scroll a window or page to bring out-of-view elements into view, capture full-page images, or extract text via OCR with deduplication.

Instructions

Purpose: Scroll a window or page. 5 strategies via action: 'raw' (wheel notches), 'to_element' (UIA name/automationId or CSS selector), 'smart' (auto-detect target with multi-strategy fallback), 'capture' (full-page stitched image), 'read' (scroll+OCR+dedupe → stitched text).

Details:
action='raw': send raw mouse-wheel notches at (x,y) or at the current cursor position, with optional window focus.
action='to_element': scroll a named element into the viewport (UIA or CDP).
action='smart': handles nested scroll layers, virtualised lists, and sticky-header occlusion.
action='capture': stitches full-page images (caps at ~700KB raw); sizeReduced=true means the image was downscaled.
action='read': scrolls page by page, OCRs each viewport, deduplicates overlapping lines, and returns stitched text; the language is auto-detected from the OS locale if omitted.

Prefer: Use action='to_element' or action='smart' for click target out-of-viewport recovery (entity_outside_viewport). Use action='capture' for reading long pages as images. Use action='read' for extracting text from long native-app documents (PDF readers, text editors, terminals) where copy-paste is unavailable. For a simple scroll without a target, use action='raw'.

Caveats:
action='capture' returns a stitched image; its pixels do NOT match screen coords when sizeReduced=true, so use it for reading only, not for mouse_click.
action='smart' CDP path requires browser_open.
action='to_element' native path requires the element to implement UIA ScrollItemPattern.
action='read' uses OCR (imperfect accuracy) and requires the window to be visible; for browser pages prefer browser_eval (e.g. evaluate document.body.innerText) or browser_overview to extract DOM text accurately.
Examples:
scroll({action:'raw', direction:'down', amount:5, windowTitle:'Chrome'})
scroll({action:'to_element', name:'OK', windowTitle:'Dialog'})
scroll({action:'smart', target:'#create-release-btn'})
scroll({action:'capture', windowTitle:'Chrome', maxScrolls:10})
scroll({action:'read', windowTitle:'Acrobat', maxPages:15}) // OCR + dedupe long PDF
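The five actions take different arguments, and a discriminated union makes that split explicit. The sketch below is hypothetical. The schema table on this page lists no arguments, so every field name (direction, amount, windowTitle, name, target, maxScrolls, maxPages, x, y) is inferred from the example calls, and the types, the extra direction values, and the checks in checkScrollArgs are assumptions rather than the server's published schema.

```typescript
// Hypothetical argument shapes for the `scroll` tool, inferred only from the
// example calls. These names, types, and constraints are assumptions.
type Direction = "up" | "down" | "left" | "right"; // only "down" appears in the examples

type ScrollArgs =
  | { action: "raw"; direction: Direction; amount: number; windowTitle?: string; x?: number; y?: number }
  | { action: "to_element"; name: string; windowTitle?: string }
  | { action: "smart"; target: string } // e.g. a CSS selector such as "#create-release-btn"
  | { action: "capture"; windowTitle?: string; maxScrolls?: number }
  | { action: "read"; windowTitle?: string; maxPages?: number };

// Returns human-readable problems; an empty array means the call looks well-formed.
function checkScrollArgs(args: ScrollArgs): string[] {
  const problems: string[] = [];
  switch (args.action) {
    case "raw":
      if (!Number.isInteger(args.amount) || args.amount <= 0) {
        problems.push("amount should be a positive whole number of wheel notches");
      }
      // Without coordinates, the description says the current cursor position is used.
      if ((args.x === undefined) !== (args.y === undefined)) {
        problems.push("pass both x and y, or neither");
      }
      break;
    case "to_element":
      if (args.name.trim() === "") problems.push("name should be non-empty");
      break;
    case "smart":
      if (args.target.trim() === "") problems.push("target should be non-empty");
      break;
    case "capture":
      if (args.maxScrolls !== undefined && args.maxScrolls < 1) problems.push("maxScrolls should be at least 1");
      break;
    case "read":
      if (args.maxPages !== undefined && args.maxPages < 1) problems.push("maxPages should be at least 1");
      break;
  }
  return problems;
}

// The five example calls from the description, expressed against these types.
const examples: ScrollArgs[] = [
  { action: "raw", direction: "down", amount: 5, windowTitle: "Chrome" },
  { action: "to_element", name: "OK", windowTitle: "Dialog" },
  { action: "smart", target: "#create-release-btn" },
  { action: "capture", windowTitle: "Chrome", maxScrolls: 10 },
  { action: "read", windowTitle: "Acrobat", maxPages: 15 },
];
for (const call of examples) {
  console.log(call.action, checkScrollArgs(call));
}
```

Because the discriminant is `action`, TypeScript narrows `args` inside each case. A call such as `{action:'smart', name:'OK'}` is therefore rejected at compile time rather than at the server.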

Input Schema

Name | Required | Description | Default

No arguments

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It details behaviors: raw sends wheel notches at coordinates, to_element requires UIA ScrollItemPattern, smart handles nested scroll layers, capture stitches images (with a caveat on pixel coords when sizeReduced=true), and read uses OCR with imperfect accuracy. It also notes preconditions such as window visibility and the browser_open requirement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
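The read action is described as scrolling page by page, OCRing each viewport, and deduplicating overlapping lines. The description does not say how the deduplication works. One common approach is sketched below and is not necessarily the tool's method. It assumes each OCR'd page is an array of lines and that consecutive pages overlap only at the seam, so the longest suffix of the text so far that matches a prefix of the next page is dropped. The function names are hypothetical.

```typescript
// Sketch of seam-based deduplication for stitched OCR text. Assumes pages
// overlap only where one viewport ends and the next begins.

// OCR is imperfect, so compare lines loosely: trim, collapse whitespace, lowercase.
function normalise(line: string): string {
  return line.trim().replace(/\s+/g, " ").toLowerCase();
}

// Length of the longest suffix of `prev` that equals a prefix of `next`.
function overlapLength(prev: string[], next: string[]): number {
  const max = Math.min(prev.length, next.length);
  for (let k = max; k > 0; k--) {
    let match = true;
    for (let i = 0; i < k; i++) {
      if (normalise(prev[prev.length - k + i]) !== normalise(next[i])) {
        match = false;
        break;
      }
    }
    if (match) return k;
  }
  return 0;
}

// Append each page, skipping the lines already emitted at the seam.
function stitchPages(pages: string[][]): string {
  const out: string[] = [];
  for (const page of pages) {
    const k = overlapLength(out, page);
    out.push(...page.slice(k));
  }
  return out.join("\n");
}

const pages = [
  ["Title", "first line", "second line"],
  ["second line", "third line"],
  ["THIRD  line", "fourth line"], // OCR noise: different case and spacing
];
console.log(stitchPages(pages));
```

Matching only at the seam, rather than removing every repeated line, keeps lines that legitimately appear twice in a document. A repeated line that happens to sit exactly at a page boundary can still be merged by mistake.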

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively long but well structured into Purpose, Details, Prefer, Caveats, and Examples. It is front-loaded with the core purpose, and each sentence adds value given the complexity of 5 actions. It could be slightly more concise, but the structure aids readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (5 actions, many parameters) and 100% schema coverage, the description is largely complete. It covers behaviors, usage, and caveats. It does not state explicit return types for raw, to_element, and smart; only capture (image) and read (text) have described outputs. The 'include' parameter hints at the shape of the response envelope. Overall, this is a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds significant value by explaining parameters in context (e.g., 'action="raw": send raw mouse-wheel notches at (x,y)', 'action="smart": handles nested scroll layers, virtualised lists'). It also clarifies abstract concepts such as 'homing' and 'sizeReduced=true' meaning downscaled. However, some parameter details are already in the schema, so there is slight redundancy.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
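The capture action caps stitched images at ~700KB raw and sets sizeReduced=true when it downscales. The arithmetic below is a hypothetical illustration of how such a cap forces a uniform downscale. It assumes "raw" means uncompressed pixels at 4 bytes each (RGBA) and that 700KB means 700 × 1024 bytes. The description does not confirm either assumption, and fitToCap is an invented name.

```typescript
// Hypothetical: how a raw-byte cap translates into a downscale factor.
// Assumes 4 bytes per pixel and 700 * 1024 bytes; the tool's accounting may differ.
const CAP_BYTES = 700 * 1024;
const BYTES_PER_PIXEL = 4;

interface Fit {
  width: number;
  height: number;
  scale: number;
  sizeReduced: boolean;
}

function fitToCap(width: number, height: number, cap = CAP_BYTES, bpp = BYTES_PER_PIXEL): Fit {
  const rawBytes = width * height * bpp;
  if (rawBytes <= cap) return { width, height, scale: 1, sizeReduced: false };
  // Byte count grows with area, i.e. with scale squared, so take the square root.
  const scale = Math.sqrt(cap / rawBytes);
  // Floor both sides so rounding can never push the result back over the cap.
  return {
    width: Math.max(1, Math.floor(width * scale)),
    height: Math.max(1, Math.floor(height * scale)),
    scale,
    sizeReduced: true,
  };
}

console.log(fitToCap(1280, 4000));
```

Under these assumptions, a 1280 × 4000 stitched page becomes about 239 × 748 (scale ≈ 0.187). Every pixel coordinate in the reduced image is therefore off by a factor of roughly 1/scale relative to the full page. This is consistent with the caveat that sizeReduced captures are for reading only, not mouse_click.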

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Scroll a window or page' and enumerates 5 distinct strategies (raw, to_element, smart, capture, read). It differentiates itself from sibling tools such as browser_click or mouse_click by focusing solely on scrolling, and it gives specific use cases for each strategy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The 'Prefer' section explicitly says when to use each action (e.g., 'Use action="to_element" or action="smart" for click target out-of-viewport recovery'). The Caveats state clearly when not to use an action (e.g., 'action="capture" ... use for reading only, not mouse_click'). Alternatives such as browser_eval and browser_overview are named for browser text extraction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
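The Prefer and Caveats guidance amounts to a small decision table. The sketch below encodes one possible reading of it for an agent harness. Goal, Situation, Choice, and chooseAction are hypothetical names. The rule that reveal_target picks smart only on a browser page where browser_open has been called is an assumption built on the caveat that smart's CDP path requires browser_open.

```typescript
// One possible policy derived from the description's Prefer and Caveats
// sections. All type and function names here are hypothetical.
type Goal = "reveal_target" | "read_as_image" | "extract_text" | "plain_scroll";

interface Situation {
  goal: Goal;
  isBrowserPage: boolean; // target lives in a browser tab, not a native app window
  browserOpened: boolean; // browser_open was called (required by smart's CDP path)
}

type Choice =
  | { tool: "scroll"; action: "raw" | "to_element" | "smart" | "capture" | "read" }
  | { tool: "browser_eval" }; // browser_overview is the other DOM-text alternative

function chooseAction(s: Situation): Choice {
  switch (s.goal) {
    case "reveal_target":
      // Out-of-viewport click recovery (entity_outside_viewport): smart or to_element.
      // Assumption: use smart on browser pages only when its CDP path is available.
      if (s.isBrowserPage && s.browserOpened) return { tool: "scroll", action: "smart" };
      return { tool: "scroll", action: "to_element" };
    case "read_as_image":
      // Stitched image for reading only; do not feed its pixels to mouse_click.
      return { tool: "scroll", action: "capture" };
    case "extract_text":
      // OCR is imperfect, so browser pages use DOM text instead of action='read'.
      return s.isBrowserPage ? { tool: "browser_eval" } : { tool: "scroll", action: "read" };
    case "plain_scroll":
      return { tool: "scroll", action: "raw" };
    default: {
      // Compile-time exhaustiveness check: every Goal must be handled above.
      const unhandled: never = s.goal;
      throw new Error(`unhandled goal: ${String(unhandled)}`);
    }
  }
}

console.log(chooseAction({ goal: "extract_text", isBrowserPage: false, browserOpened: false }));
```

Keeping the policy in one function makes the "use X instead of Y when Z" rules testable. The `never` check in the default branch forces a compile error if a new goal is added without a rule.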


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Harusame64/desktop-touch-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server