scroll
Scroll a window or page to bring out-of-view elements into view, capture full-page images, or extract text via OCR with deduplication.
Instructions
Purpose: Scroll a window or page. 5 strategies via action: 'raw' (wheel notches), 'to_element' (UIA name/automationId or CSS selector), 'smart' (auto-detect target with multi-strategy fallback), 'capture' (full-page stitched image), 'read' (scroll+OCR+dedupe → stitched text). Details: action='raw': send raw mouse-wheel notches at (x,y) or current cursor, optional window focus. action='to_element': scroll a named element into viewport (UIA or CDP). action='smart': handles nested scroll layers, virtualised lists, sticky-header occlusion. action='capture': stitches full-page images (caps at ~700KB raw); sizeReduced=true means downscaled. action='read': scrolls page-by-page, OCRs each viewport, deduplicates overlapping lines, returns stitched text; language auto-detected from OS locale if omitted. Prefer: Use action='to_element' or action='smart' for click target out-of-viewport recovery (entity_outside_viewport). Use action='capture' for reading long pages as images. Use action='read' for extracting text from long native-app documents (PDF readers, text editors, terminals) where copy-paste is unavailable. For simple scroll without target, use action='raw'. Caveats: action='capture' returns stitched image — pixels do NOT match screen coords when sizeReduced=true, use for reading only, not mouse_click. action='smart' CDP path requires browser_open. action='to_element' native path requires element to implement UIA ScrollItemPattern. action='read' uses OCR (imperfect accuracy) and requires the window to be visible; for browser pages prefer browser_eval (e.g. evaluate document.body.innerText) or browser_overview to extract DOM text accurately. Examples: scroll({action:'raw', direction:'down', amount:5, windowTitle:'Chrome'}) scroll({action:'to_element', name:'OK', windowTitle:'Dialog'}) scroll({action:'smart', target:'#create-release-btn'}) scroll({action:'capture', windowTitle:'Chrome', maxScrolls:10}) scroll({action:'read', windowTitle:'Acrobat', maxPages:15}) // OCR + dedupe long PDF
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
No arguments | |||