scroll
Scroll a window or page using raw wheel, element targeting, smart auto-detection, full-page image capture, or OCR text extraction.
Instructions
Purpose: Scroll a window or page. The action parameter selects one of five strategies: 'raw' (wheel notches), 'to_element' (UIA name/automationId or CSS selector), 'smart' (auto-detects the target with multi-strategy fallback), 'capture' (full-page stitched image), and 'read' (scroll, OCR, and dedupe into stitched text).

Details:

action='raw': sends raw mouse-wheel notches at (x,y) or at the current cursor position, optionally focusing a window first. The scroll scale depends on the dispatch path. UIA Tier 1 (apps exposing ScrollPattern): empirically ≈1 text line per notch, so amount:3 (default) ≈ 3 lines (small nudge) and amount:10 ≈ 10 lines (~½ visible area). Legacy SendInput: each amount unit = 3 wheel ticks, which is ≈9 text lines per unit at the Windows default (depends on app and OS settings).

action='to_element': scrolls a named element into the viewport (UIA or CDP).

action='smart': handles nested scroll layers, virtualised lists, and sticky-header occlusion.

action='capture': stitches a full-page image (capped at ~700KB raw); sizeReduced=true means the image was downscaled.

action='read': scrolls page by page, OCRs each viewport, deduplicates overlapping lines, and returns the stitched text; the language is auto-detected from the OS locale if omitted.

Prefer: Use action='to_element' or action='smart' to recover a click target that is outside the viewport (entity_outside_viewport). Use action='capture' to read long pages as images. Use action='read' to extract text from long native-app documents (PDF readers, text editors, terminals) where copy-paste is unavailable. For a simple scroll without a target, use action='raw'.

Caveats: action='capture' returns a stitched image; when sizeReduced=true its pixels do NOT match screen coordinates, so use it for reading only, not for mouse_click. The CDP path of action='smart' requires browser_open. The native path of action='to_element' requires the element to implement UIA ScrollItemPattern. action='read' uses OCR (imperfect accuracy) and requires the window to be visible; for browser pages, prefer browser_eval or browser_overview for accurate DOM text.
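The scroll-scale figures for action='raw' can be turned into a rough distance estimate. The sketch below is illustrative only: `estimateLines` and `amountForLines` are hypothetical helper names that are not part of the tool, and the 3 lines/tick value is the Windows default quoted in this document, which users can change.

```typescript
// Hypothetical helpers for illustration; not part of the scroll tool itself.
type ScrollPath = "uia" | "sendinput";

// Figures taken from the tool description; real distances vary by app and OS settings.
const TICKS_PER_UNIT_LEGACY = 3; // legacy SendInput: each amount unit sends 3 wheel ticks
const DEFAULT_LINES_PER_TICK = 3; // Windows default wheel setting (assumed unchanged)

// Approximate number of text lines scrolled for a given amount.
function estimateLines(
  amount: number,
  path: ScrollPath,
  linesPerTick: number = DEFAULT_LINES_PER_TICK
): number {
  if (path === "uia") {
    return amount; // Tier 1 UIA: empirically about 1 line per notch
  }
  return amount * TICKS_PER_UNIT_LEGACY * linesPerTick;
}

// Inverse: pick an amount that scrolls roughly the requested number of lines.
function amountForLines(
  lines: number,
  path: ScrollPath,
  linesPerTick: number = DEFAULT_LINES_PER_TICK
): number {
  const linesPerUnit = estimateLines(1, path, linesPerTick);
  return Math.max(1, Math.round(lines / linesPerUnit));
}
```

For example, a target of about 30 lines works out to amount 30 on the UIA path but only amount 3 on the legacy path, which is why the same amount value can feel very different between apps.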
Typed errors: action='raw' returns code:'ScrollNotDelivered' when the scroll is silently dropped (overlay, non-scrollable target, or UIPI low-IL); a scroll that is already at the boundary counts as success, disambiguated by comparing the scroll percent before and after. hints.verifyDelivery.{channel,reason} follows ADR-018 §2.6 (Phase 1b: Tier 1 UIA dispatch for HWNDs exposing ScrollPattern; other apps use legacy SendInput). action='smart' returns code:'OverflowHiddenAncestor' (retry with expandHidden:true) or code:'VirtualScrollExhausted' (provide virtualIndex).

Examples:

scroll({action:'raw', direction:'down', amount:5, windowTitle:'Chrome'})

scroll({action:'to_element', name:'OK', windowTitle:'Dialog'})

scroll({action:'smart', target:'#create-release-btn'})

scroll({action:'capture', windowTitle:'Chrome', maxScrolls:10})

scroll({action:'read', windowTitle:'Acrobat', maxPages:15}) // OCR + dedupe long PDF
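The recovery hints for the action='smart' typed errors can be sketched as a small retry planner. This is a minimal sketch under stated assumptions: `SmartArgs`, `VirtualHint`, and `nextSmartAttempt` are hypothetical names, and how the error code is read from the tool response is left out because the response shape is not specified here.

```typescript
// Hypothetical types for illustration; field names mirror the input schema.
interface SmartArgs {
  action: "smart";
  target: string;
  expandHidden?: boolean;
  virtualIndex?: number;
  virtualTotal?: number;
}

// Caller-supplied knowledge about a virtualised list (assumed to be known to the caller).
interface VirtualHint {
  index: number; // 0-based target row
  total: number; // total row count
}

// Returns the arguments for one follow-up attempt, or null when no documented recovery applies.
function nextSmartAttempt(
  args: SmartArgs,
  errorCode: string,
  virtual?: VirtualHint
): SmartArgs | null {
  switch (errorCode) {
    case "OverflowHiddenAncestor":
      // Documented hint: retry with expandHidden:true (this mutates live CSS).
      return args.expandHidden ? null : { ...args, expandHidden: true };
    case "VirtualScrollExhausted":
      // Documented hint: provide virtualIndex; virtualTotal is required alongside it.
      if (!virtual || args.virtualIndex !== undefined) return null;
      return { ...args, virtualIndex: virtual.index, virtualTotal: virtual.total };
    default:
      return null; // e.g. ScrollNotDelivered belongs to action='raw' and is not handled here
  }
}
```

Returning null once a hint has already been applied keeps the caller from looping on the same error.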
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Action selector — one of: raw, to_element, smart, capture, read. Per-action required fields are enforced at call time (see the tool description); this flat schema lists every action's fields as optional. | |
| direction | No | Scroll direction | |
| amount | No | Number of scroll notches (default 3). UIA-capable apps (Notepad, Explorer, WPF; Tier 1): empirically ≈1 text line per notch, so amount:3 (default) ≈ 3 lines (small nudge) and amount:10 ≈ 10 lines (~½ visible area). Legacy apps (SendInput path): each amount unit sends 3 wheel ticks; at the Windows default of 3 lines/tick that is ≈9 text lines per unit, though the distance varies with app and OS wheel-speed settings. | 3 |
| x | No | X coordinate to scroll at (moves cursor there first) | |
| y | No | Y coordinate to scroll at | |
| speed | No | Cursor movement speed in px/sec (0=teleport, omit=default) | |
| homing | No | Apply window-movement homing correction to (x,y) before scrolling. Default true. | true |
| windowTitle | No | Partial window title. When provided, the server focuses this window first. | |
| hwnd | No | Direct window handle ID (takes precedence over windowTitle). | |
| include | No | Optional response-shape opt-in. `['envelope']` returns the self-documenting envelope (`_version` / `data` / `as_of` / `confidence`). `['raw']` forces raw shape (overrides DESKTOP_TOUCH_ENVELOPE=1 server default). Default behaviour is raw shape (compat with existing clients). | |
| name | No | Partial name/label of the element (UIA name match). Use for native app elements. At least one of name or selector must be provided. | |
| selector | No | CSS selector for the element (Chrome/Edge only). At least one of name or selector must be provided. | |
| block | No | Vertical alignment after scroll — start/center/end/nearest (Chrome path only, default: center) | center |
| tabId | No | Tab ID (Chrome path only). Omit for first page tab. | |
| port | No | CDP port for Chrome path (default 9222) | 9222 |
| target | No | CSS selector (Chrome/Edge) or partial UIA name (native apps). For CDP path, must be a valid CSS selector (starts with #, ., tag, or [ ). For UIA path, a partial name match against element Name property. | |
| strategy | No | auto (default): try CDP → UIA → image in order. cdp: Chrome/Edge only. uia: native Windows UIA. image: image + Win32 binary-search. | auto |
| inline | No | Horizontal alignment after scroll (CDP path). Default: center. | center |
| maxDepth | No | Max number of ancestor scroll containers to walk. Default 3. | 3 |
| retryCount | No | Max scroll attempts (image path binary-search). Default 3, cap 4. | 3 |
| verifyWithHash | No | Verify scroll effectiveness via perceptual hash comparison. Automatically enabled for image path. | |
| virtualIndex | No | Target row index in a virtualised list (0-based). Enables direct TanStack/data-index seeking. | |
| virtualTotal | No | Total row count in a virtualised list. Required when virtualIndex is set. | |
| expandHidden | No | Temporarily set overflow:hidden ancestors to overflow:auto to unlock scroll. Mutates live CSS. | |
| hint | No | Scroll direction hint for binary-search (image path). Seeds lo/hi bounds to reduce attempts. | |
| maxScrolls | No | Maximum scroll iterations before stopping (default 10, max 30) | 10 |
| scrollDelayMs | No | Milliseconds to wait after each scroll for rendering to settle (default 400). Increase for slow/animated pages. | 400 |
| maxWidth | No | Max size of the short edge of the final image (default 1280). For 'down': caps the image width; height is unconstrained. For 'right': caps the image height; width is unconstrained. | 1280 |
| maxPages | No | Maximum number of scroll steps / OCR pages (default 20, max 50). | 20 |
| scrollKey | No | Key sent to scroll one page. PageDown (default): full-page scroll for most apps. Space: web/PDF readers. ArrowDown: line-by-line slow scroll. | PageDown |
| stopWhenNoChange | No | Stop automatically when two consecutive pages yield no new lines after deduplication (page-end detection). Default true. | true |
| language | No | OCR language code (e.g. 'ja', 'en', 'zh'). Omit to auto-detect from Windows system locale via Intl.DateTimeFormat().resolvedOptions().locale. Default: auto. | auto |
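Because the flat schema marks every per-action field as optional, a caller may want to check the cross-field rules from the table before sending a request. The following is a hedged sketch, not the server's actual validation: `ScrollArgs` and `validateScrollArgs` are hypothetical names, only a subset of fields is modelled, and the rule that virtualIndex must be below virtualTotal is an assumption inferred from the index being 0-based.

```typescript
// Hypothetical client-side pre-flight check; models only a subset of the schema.
interface ScrollArgs {
  action: "raw" | "to_element" | "smart" | "capture" | "read";
  name?: string;
  selector?: string;
  virtualIndex?: number;
  virtualTotal?: number;
  retryCount?: number;
  maxScrolls?: number;
  maxPages?: number;
}

type CappedField = "retryCount" | "maxScrolls" | "maxPages";

// Documented upper bounds from the parameter table.
const CAPS: Array<[CappedField, number]> = [
  ["retryCount", 4],
  ["maxScrolls", 30],
  ["maxPages", 50],
];

// Returns a list of human-readable problems; an empty list means no rule was violated.
function validateScrollArgs(args: ScrollArgs): string[] {
  const problems: string[] = [];

  // to_element: at least one of name or selector must be provided.
  if (args.action === "to_element" && !args.name && !args.selector) {
    problems.push("to_element: provide at least one of name or selector");
  }

  // virtualTotal is required whenever virtualIndex is set.
  if (args.virtualIndex !== undefined) {
    if (args.virtualTotal === undefined) {
      problems.push("virtualTotal is required when virtualIndex is set");
    } else if (args.virtualIndex < 0 || args.virtualIndex >= args.virtualTotal) {
      // Assumption: a 0-based index must fall inside [0, virtualTotal).
      problems.push("virtualIndex must be 0-based and below virtualTotal");
    }
  }

  // Flag values above the documented maxima (whether the server clamps or rejects is not stated).
  for (const [field, cap] of CAPS) {
    const value = args[field];
    if (typeof value === "number" && value > cap) {
      problems.push(`${field} exceeds the documented maximum of ${cap}`);
    }
  }
  return problems;
}
```

A call such as `validateScrollArgs({ action: "to_element" })` would report the missing name/selector before any request is made.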