Skip to main content
Glama

scroll

Scroll a window or page using raw wheel, element targeting, smart auto-detection, full-page image capture, or OCR text extraction.

Instructions

Purpose: Scroll a window or page. 5 strategies via action: 'raw' (wheel notches), 'to_element' (UIA name/automationId or CSS selector), 'smart' (auto-detect target with multi-strategy fallback), 'capture' (full-page stitched image), 'read' (scroll+OCR+dedupe → stitched text). Details: action='raw': send raw mouse-wheel notches at (x,y) or current cursor, optional window focus. Scroll scale — UIA Tier 1 (ScrollPattern apps): empirically ≈1 text line per notch; amount:3 (default) ≈ 3 lines (small nudge), amount:10 ≈ 10 lines (~½ visible area). Legacy SendInput: each amount unit = 3 wheel ticks; ≈9 text lines per unit at Windows default (app/OS-setting dependent). action='to_element': scroll a named element into viewport (UIA or CDP). action='smart': handles nested scroll layers, virtualised lists, sticky-header occlusion. action='capture': stitches full-page images (caps at ~700KB raw); sizeReduced=true means downscaled. action='read': scrolls page-by-page, OCRs each viewport, deduplicates overlapping lines, returns stitched text; language auto-detected from OS locale if omitted. Prefer: Use action='to_element' or action='smart' for click target out-of-viewport recovery (entity_outside_viewport). Use action='capture' for reading long pages as images. Use action='read' for extracting text from long native-app documents (PDF readers, text editors, terminals) where copy-paste is unavailable. For simple scroll without target, use action='raw'. Caveats: action='capture' returns stitched image — pixels do NOT match screen coords when sizeReduced=true, use for reading only, not mouse_click. action='smart' CDP path requires browser_open. action='to_element' native path requires element to implement UIA ScrollItemPattern. action='read' uses OCR (imperfect accuracy) and requires the window to be visible; for browser pages prefer browser_eval or browser_overview for accurate DOM text. action='raw' typed errors: code:'ScrollNotDelivered' on silent drop (overlay / non-scrollable / UIPI low-IL); already-at-boundary is success via pre/post-percent disambiguation. hints.verifyDelivery.{channel,reason} per ADR-018 §2.6 (Phase 1b: Tier 1 UIA dispatch for HWNDs exposing ScrollPattern; other apps use legacy SendInput). action='smart' typed errors: code:'OverflowHiddenAncestor' (retry with expandHidden:true), code:'VirtualScrollExhausted' (provide virtualIndex). Examples: scroll({action:'raw', direction:'down', amount:5, windowTitle:'Chrome'}) scroll({action:'to_element', name:'OK', windowTitle:'Dialog'}) scroll({action:'smart', target:'#create-release-btn'}) scroll({action:'capture', windowTitle:'Chrome', maxScrolls:10}) scroll({action:'read', windowTitle:'Acrobat', maxPages:15}) // OCR + dedupe long PDF

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
actionYesAction selector — one of: raw, to_element, smart, capture, read. Per-action required fields are enforced at call time (see the tool description); this flat schema lists every action's fields as optional.
directionNoScroll direction
amountNoNumber of scroll notches (default 3). UIA-capable apps (Notepad, Explorer, WPF — Tier 1): empirically ≈1 text line per notch; amount:3 (default) ≈ 3 lines (small nudge), amount:10 ≈ 10 lines (~½ visible area). Legacy apps (SendInput path): each amount unit sends 3 wheel ticks; at Windows default 3 lines/tick that is ≈9 text lines per unit — distance varies by app/OS wheel-speed settings.
xNoX coordinate to scroll at (moves cursor there first)
yNoY coordinate to scroll at
speedNoCursor movement speed in px/sec (0=teleport, omit=default)
homingNoApply window-movement homing correction to (x,y) before scrolling. Default true.
windowTitleNoPartial window title. When provided, the server focuses this window first.
hwndNoDirect window handle ID (takes precedence over windowTitle).
includeNoOptional response-shape opt-in. `['envelope']` returns the self-documenting envelope (`_version` / `data` / `as_of` / `confidence`). `['raw']` forces raw shape (overrides DESKTOP_TOUCH_ENVELOPE=1 server default). Default behaviour is raw shape (compat with existing clients).
nameNoPartial name/label of the element (UIA name match). Use for native app elements. At least one of name or selector must be provided.
selectorNoCSS selector for the element (Chrome/Edge only). At least one of name or selector must be provided.
blockNoVertical alignment after scroll — start/center/end/nearest (Chrome path only, default: center)center
tabIdNoTab ID (Chrome path only). Omit for first page tab.
portNoCDP port for Chrome path (default 9222)
targetNoCSS selector (Chrome/Edge) or partial UIA name (native apps). For CDP path, must be a valid CSS selector (starts with #, ., tag, or [ ). For UIA path, a partial name match against element Name property.
strategyNoauto (default): try CDP → UIA → image in order. cdp: Chrome/Edge only. uia: native Windows UIA. image: image + Win32 binary-search.auto
inlineNoVertical alignment after scroll (CDP path). Default: center.center
maxDepthNoMax number of ancestor scroll containers to walk. Default 3.
retryCountNoMax scroll attempts (image path binary-search). Default 3, cap 4.
verifyWithHashNoVerify scroll effectiveness via perceptual hash comparison. Automatically enabled for image path.
virtualIndexNoTarget row index in a virtualised list (0-based). Enables direct TanStack/data-index seeking.
virtualTotalNoTotal row count in a virtualised list. Required when virtualIndex is set.
expandHiddenNoTemporarily set overflow:hidden ancestors to overflow:auto to unlock scroll. Mutates live CSS.
hintNoScroll direction hint for binary-search (image path). Seeds lo/hi bounds to reduce attempts.
maxScrollsNoMaximum scroll iterations before stopping (default 10, max 30)
scrollDelayMsNoMilliseconds to wait after each scroll for rendering to settle (default 400). Increase for slow/animated pages.
maxWidthNoMax size of the short edge of the final image (default 1280). For 'down': caps the image width; height is unconstrained. For 'right': caps the image height; width is unconstrained.
maxPagesNoMaximum number of scroll steps / OCR pages (default 20, max 50).
scrollKeyNoKey sent to scroll one page. PageDown (default): full-page scroll for most apps. Space: web/PDF readers. ArrowDown: line-by-line slow scroll.PageDown
stopWhenNoChangeNoStop automatically when two consecutive pages yield no new lines after deduplication (page-end detection). Default true.
languageNoOCR language code (e.g. 'ja', 'en', 'zh'). Omit to auto-detect from Windows system locale via Intl.DateTimeFormat().resolvedOptions().locale. Default: auto.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavioral traits: scroll scale differences between UIA and SendInput, typed errors (e.g., ScrollNotDelivered, OverflowHiddenAncestor), and limitations (e.g., capture not for clicking, read OCR imperfect). It also explains the verifyWithHash and stopWhenNoChange mechanisms.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured with clear headings (Purpose, Details, Prefer, Caveats, Examples). It front-loads the purpose and organizes details logically. Some redundancy exists (e.g., repeating default values), but overall it is as concise as feasible given the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 32 parameters, no output schema, and no annotations, the description is thorough: it covers all actions, error conditions, and alternatives like browser_eval. It lacks explicit return value documentation, but for a complex tool, it provides nearly everything an agent needs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant meaning: it explains per-action required fields, provides examples, and clarifies parameter behavior (e.g., amount scaling, scrollKey usage). It goes beyond schema by contextualizing parameters within each action.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Scroll a window or page.' It enumerates five distinct strategies (raw, to_element, smart, capture, read) with specific use cases, distinguishing it from sibling tools like mouse_click or browser_eval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance is provided: 'Prefer: Use action='to_element' or action='smart' for click target out-of-viewport recovery... Use action='capture' for reading long pages as images... Use action='read' for extracting text... For simple scroll without target, use action='raw'.' It also lists caveats for when not to use certain actions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Harusame64/desktop-touch-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server