desktop-touch-mcp
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| DESKTOP_TOUCH_DOCK_PIN | No | Always-on-top toggle: 'true' or 'false' | true |
| DESKTOP_TOUCH_ALLOWLIST | No | Path to an allowlist file for workspace_launch; overrides default locations | |
| DESKTOP_TOUCH_DOCK_TITLE | No | Title of the window to dock on startup; use '@parent' to auto-dock the terminal hosting Claude via process-tree walker | |
| DESKTOP_TOUCH_DOCK_WIDTH | No | Width in pixels (e.g., '480') or as a percentage of work area (e.g., '25%') | 480 |
| DESKTOP_TOUCH_DOCK_CORNER | No | Corner to dock the window: 'top-left', 'top-right', 'bottom-left', or 'bottom-right' | bottom-right |
| DESKTOP_TOUCH_DOCK_HEIGHT | No | Height in pixels (e.g., '360') or as a percentage of work area (e.g., '25%') | 360 |
| DESKTOP_TOUCH_DOCK_MARGIN | No | Screen-edge padding in pixels | 8 |
| DESKTOP_TOUCH_MOUSE_SPEED | No | Default mouse movement speed in pixels per second; set to '0' for instant teleport | 1500 |
| DESKTOP_TOUCH_DOCK_MONITOR | No | Monitor ID from get_screen_info; default is primary monitor | primary |
| DESKTOP_TOUCH_DOCK_SCALE_DPI | No | If 'true', multiply pixel values by dpi/96 for per-monitor scaling | false |
| DESKTOP_TOUCH_DOCK_TIMEOUT_MS | No | Maximum wait time in milliseconds for the target window to appear | 5000 |
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {
"listChanged": false
} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| browser_clickA | Find a DOM element by CSS selector and click it (combines browser_locate + mouse_click in one step). Prefer over mouse_click for Chrome — selector-based clicking is stable across repaints. Pass tabId+port so the server auto-guards (verifies tab readyState and identity) and returns post.perception.status. lensId is optional for advanced pinned-tab workflows. Caveats: Fails if the element is outside the visible viewport — scroll it into view with browser_eval("document.querySelector('sel').scrollIntoView()") first. |
| browser_evalA | Purpose: Inspect or operate on a browser tab via 3 actions: 'js' (evaluate JS), 'dom' (get HTML), 'appState' (extract SSR-injected SPA state). Details: action='js' — Run a JS expression. withPerception:true wraps in {ok, result, post}. action='dom' — Return outerHTML of selector (or document.body), truncated to maxLength. action='appState' — Scan Next/Nuxt/Remix/Apollo/GitHub/Redux SSR injected JSON; pass selectors to override defaults. Prefer: Use action='appState' BEFORE 'dom' or 'js' on SPAs where rendered HTML is sparse — single CDP call. Use 'dom' when 'appState' is empty and you need page structure. Use 'js' as the escape hatch for arbitrary scripting. Caveats: DOM nodes cannot be returned from action='js' directly (circular refs are serialized safely). React/Vue/Svelte controlled inputs cannot be set via element.value — use keyboard(action='type') / browser_fill instead. readyState is strictly checked; guard blocks if page is still loading. Examples: browser_eval({action:'js', expression:'document.title'}) → page title browser_eval({action:'dom', selector:'#main', maxLength:5000}) → outerHTML browser_eval({action:'appState'}) → default SPA state probes |
| browser_fillA | Fill a form input with a value via CDP — works on React/Vue/Svelte controlled inputs that reject browser_eval value assignment. Use browser_overview or browser_locate first to obtain a stable selector. Use this over browser_eval when setting a controlled input's value via JS does not update the framework state. Caveats: Requires browser_open (CDP active). Does not work on contenteditable rich-text editors — use keyboard(action='type') for those. actual in response shows what the element's value property reads after fill; verify it matches the intended value. |
| browser_formA | Inspect all form fields (input, select, textarea, button) within a CSS-selector-specified container and return their name, type, id, current value, hint text, disabled/readOnly state, and associated label text (resolved via for[id], ancestor LABEL, aria-labelledby, aria-label in that order). Use this before browser_fill to discover exact field selectors and avoid accidentally targeting the wrong input (e.g. a global search bar). Caveats: Requires browser_open (CDP active). Hidden inputs (type=hidden) are excluded by default — set includeHidden:true if needed. Value text is truncated at 200 chars. |
| browser_locateA | Find a DOM element by CSS selector and return its physical screen coordinates — compatible directly with mouse_click. Prefer browser_click to find+click in one step. Prefer browser_overview to discover selectors. Caveats: Coordinates are captured at call time; if the page reflows before mouse_click, coords may be stale. |
| browser_navigateA | Navigate a browser tab to a URL via CDP Page.navigate — more reliable than clicking the address bar. Pass tabId+port so the server auto-guards (verifies tab readyState) and returns post.perception.status. lensId is optional for advanced pinned-tab workflows. Caveats: Does not block until page load completes — follow with wait_until(element_matches) or repeated browser_eval polling for slow pages. |
| browser_openA | Connect to Chrome/Edge running with --remote-debugging-port and return open tab IDs — required before all other browser_* tools. Pass launch:{} (or with overrides) to auto-spawn a debug-mode browser when no CDP endpoint is live (idempotent: an already-running endpoint is preferred). Returns tabs[] with id, url, title, active — pass tabId to browser_* tools to target a specific tab. Caveats: CDP connection is per-process; if Chrome restarts, call browser_open again to get fresh tab IDs. A Chrome session started without --remote-debugging-port cannot be taken over — close it first or use a separate userDataDir. |
| browser_overviewA | List all interactive elements (links, buttons, inputs, ARIA controls) on the current page with CSS selectors, visible text or value for inputs, and viewport status — use before browser_click to discover stable selectors, and prefer this over screenshot when verifying button/toggle state after submission (no image tokens, structured output). scope limits to a CSS subsection (e.g. '.sidebar'). Returns state (checked/pressed/selected/expanded) for ARIA custom controls. Caveats: Selectors are CDP-generated snapshots — re-call after page navigates or re-renders. Input text reflects the empty-field hint text when defined (takes priority over typed value) — use browser_eval('document.querySelector(sel).value') to read actual typed content. |
| browser_searchA | Grep-like element search across the current page. by: 'text' (literal substring), 'regex', 'role', 'ariaLabel', 'selector' (CSS). Returns results[] sorted by confidence descending — pass results[0].selector to browser_click. Pagination via offset/maxResults. Caveats: Use browser_overview for broad discovery; use browser_search when you know specific text or role to target. |
| click_elementA | Invoke a UI element by name or automationId via UIA InvokePattern — no screen coordinates needed. The server auto-guards using windowTitle (verifies identity, foreground, modal) and returns post.perception.status. Prefer over mouse_click for buttons, menu items, and links in native Windows apps. Use desktop_discover first to discover automationIds. Pass fixId from a suggestedFix to re-target after window identity drift. lensId is optional for advanced pinned-lens use. Caveats: Requires InvokePattern — some custom controls do not expose it; fall back to mouse_click in that case. |
| clipboardA | Read or write the Windows clipboard. action='read' returns current text content (empty string if non-text). action='write' replaces clipboard with given text. Caveats: Non-text clipboard payloads (images, files) return empty string on read. Overwrites existing clipboard content on write. |
| desktop_stateA | Purpose: Read-only observation of the current desktop state. Returns focused window/element, modal flag, attention signal from Auto Perception. Phase 4 absorbs former get_active_window / get_cursor_position / get_screen_info / get_document_state via include* flags. Details: Always returns: focusedWindow (title, hwnd, processName), focusedElement (name, type, value, automationId), cursorPos {x,y}, cursorOverElement (name, type), cursorOverWindow, hasModal (boolean), pageState ('ready'|'loading'|'dialog'), attention, visibleWindows count. Optional fields (default off): includeCursor:true → cursor {x,y,monitorId} (richer than cursorPos). includeScreen:true → screen {virtualScreen, displays[], displayCount, primaryIndex}. includeDocument:true → document {url, title, readyState, selection, scroll, viewport} via CDP (silently omitted on non-Chromium foreground). Chromium: cursorOverElement is null (UIA sparse); focusedElement may fall back to CDP document.activeElement; hints.focusedElementSource reports which path produced the row ('view' = engine-perception latest_focus, 'uia' = direct UIA query, 'cdp' = document.activeElement). Does NOT enumerate descendants — use desktop_discover for actionable entity list and window list. Prefer: Use after each action to confirm state. Cheapest observation tool — cheaper than any screenshot. attention='ok' means safe to proceed; other values require recovery (see suggest[]). Set include* flags only when you need the extra data (each adds one syscall or CDP round-trip). Caveats: Cannot detect non-UIA elements (custom-drawn UIs, game overlays). hasModal only detects modal dialogs exposed via UIA — browser alert/confirm dialogs may not appear here. includeDocument requires browser_open (CDP active); silently omitted otherwise with hints.documentUnavailable. |
| focus_windowA | Bring a window to the foreground by partial title match (case-insensitive). Use when a tool does not accept a windowTitle param, or when you need to switch focus before a sequence of actions. Use chromeTabUrlContains to activate a specific Chrome/Edge tab by URL substring before focusing — only the active tab's title appears in the windows list. If CDP is unavailable, chromeTabUrlContains is silently skipped — check response.hints.warnings. Returns WindowNotFound if no match exists; call desktop_discover to see available titles. Caveats: On some apps focus may be immediately stolen back (modal dialogs, UAC prompts) — verify with desktop_state after focusing. |
| keyboardA | Purpose: Send keyboard input to a window: 'type' for text, 'press' for key combos. Details: action='type' inserts text (auto-clipboard for non-ASCII / IME-safe). action='press' sends key combos like 'ctrl+c'/'alt+tab'. Pass windowTitle to auto-focus and auto-guard (verifies identity, foreground, modal) before input. Omitting windowTitle acts on the active window (unguarded). Prefer: Use windowTitle to auto-focus before injection. Set lensId to enable perception guards. Use desktop_act({action:'setValue'}) for form fields backed by UIA ValuePattern. Caveats: win+r/win+x/win+s/win+l blocked for security. action='type' does not handle IME composition for CJK — use use_clipboard=true or desktop_act({action:'setValue'}) instead. Non-ASCII punctuation (em-dash etc.) auto-routes via clipboard to prevent Chrome address-bar hijack; pass forceKeystrokes:true to disable. Background mode (PostMessage/WM_CHAR) auto-engages for known terminal windows (Windows Terminal / cmd / PowerShell) so keystrokes survive user-side foreground changes; DTM_BG_AUTO=1 enables it globally. Foreground-path keystrokes for non-terminal apps run with a per-chunk foreground guard (Phase B) — when the user grabs focus mid-stream, the call aborts with FocusLostDuringType and returns context.typed/context.remaining so the caller can re-focus and resume; pass abortOnFocusLoss:false to disable. Examples: keyboard({action:'type', text:'hello', windowTitle:'Notepad'}) → text injected (guarded) keyboard({action:'type', text:'hello'}) → text injected (unguarded) keyboard({action:'press', keys:'ctrl+c'}) → copy keyboard({action:'press', keys:'escape', windowTitle:'Dialog'}) → dismiss dialog |
| mouse_clickA | Click at screen coordinates. Normally pass windowTitle so the server auto-guards the click (verifies target identity, foreground, coordinate is inside the target rect) and returns post.perception without a confirmation screenshot. origin+scale from dotByDot=true screenshots are converted to screen coords before guarding. doubleClick:true for double-click; tripleClick:true for triple-click (selects a full line of text). Prefer click_element (UIA) for native apps, prefer browser_click for Chrome. Examples: mouse_click({windowTitle:'Notepad', x:200, y:150}) // guarded — post.perception.status='ok'. mouse_click({x:100, y:100}) // unguarded — post.perception.status='unguarded'. If a guard failure returns a suggestedFix, pass its fixId to approve the fix: mouse_click({fixId:'fix-...'}) // one-shot, expires in 15s. lensId is optional and only for advanced pinned-target workflows; omit it for normal use. Caveats: origin+scale are meaningful ONLY with dotByDot=true screenshot responses. |
| mouse_dragA | Click and drag from (startX, startY) to (endX, endY) holding the left mouse button — for sliders, drag-and-drop, canvas drawing, and window resizing. Pass windowTitle so the server auto-guards the start coordinate and returns post.perception. Examples: mouse_drag({windowTitle:'Notepad', startX:50, startY:50, endX:200, endY:200}). lensId is optional and only for advanced pinned-target workflows. Caveats: Left button only. Both start and endpoint are guarded. Cross-window and desktop drags are blocked by default — pass allowCrossWindowDrag:true to confirm intent. |
| notification_showA | Show a Windows system tray balloon notification to alert the user. Use at the end of a long-running task so the user knows it finished without watching the screen. Caveats: Focus Assist (Do Not Disturb) mode suppresses balloon tips; the tool still returns ok:true in that case. Uses System.Windows.Forms — no external modules needed. |
| run_macroA | Purpose: Execute multiple tools sequentially in one MCP call — eliminates round-trip latency for predictable multi-step workflows. Details: steps[] is an array of {tool, params} objects. Accepts all desktop-touch tools plus a special sleep pseudo-step: {tool:"sleep", params:{ms:N}} (max 10000ms per step). stop_on_error=true (default) halts on first failure. Max 50 steps. The LLM cannot inspect intermediate results during execution — all steps run to completion (or first error) before any output is returned. Prefer: Use for predictable fixed sequences (focus → sleep → type → screenshot). Do not use for conditional logic — return to the LLM between branches so it can inspect intermediate state. Caveats: If any step may fail conditionally (e.g. a dialog that may or may not appear), split the macro at that point. Each screenshot step within a macro incurs the same token cost as a standalone call. Examples: [{tool:'focus_window',params:{windowTitle:'Notepad'}},{tool:'sleep',params:{ms:300}},{tool:'keyboard',params:{action:'type',text:'Hello'}},{tool:'screenshot',params:{detail:'text',windowTitle:'Notepad'}}] [{tool:'browser_navigate',params:{url:'https://example.com'}},{tool:'wait_until',params:{condition:'element_matches',target:{by:'text',pattern:'Example Domain'}}}] |
| screenshotA | Purpose: Capture desktop, window, or region across detail levels (meta / text / image / som / ocr) and capture modes (normal / background). Details: detail='meta' (default) returns window titles+positions only (~20 tok/window, no image). detail='text' returns UIA actionable elements with clickAt coords, no image (~100-300 tok). detail='som' returns a Set-of-Marks annotated image plus OCR-detected elements with IDs (bypasses UIA entirely). detail='ocr' returns Windows OCR words with screen-pixel clickAt coords (Phase 4: absorbs former screenshot_ocr — use when UIA is sparse and you want to force OCR unconditionally). detail='image' and detail='som' are server-blocked unless confirmImage=true is also passed. mode='background' captures hidden/minimised/occluded windows via PrintWindow (Phase 4: absorbs former screenshot_background) — pair with windowTitle/hwnd. dotByDot=true returns 1:1 pixel WebP; compute screen coords: screen_x = origin_x + image_x (or screen_x = origin_x + image_x / scale when dotByDotMaxDimension is set — scale printed in response). diffMode=true returns only changed windows after the first call (~160 tok). region={x,y,width,height} captures a sub-rectangle (Phase 4: absorbs former scope_element when paired with windowTitle/hwnd — discover element bounds via desktop_discover, then pass region here). Data reduction: grayscale=true (−50%), dotByDotMaxDimension=1280 (caps longest edge), windowTitle+region (sub-crop to exclude browser chrome — e.g. region={x:0, y:120, width:1920, height:900}). Prefer: Use meta to orient, text before clicking, dotByDot only when precise pixel coords are needed. Use detail='som' for native apps or games that do not expose UIA elements (UIA-Blind). Use detail='ocr' for OCR-only (skip UIA entirely). Use mode='background' when the target window must stay hidden or cannot be brought to foreground. Prefer browser_* tools for Chrome. Use diffMode after actions to confirm state changed. Only use image+confirmImage when text returned 0 actionable elements and visual inspection is genuinely required. Caveats: Default mode scales to maxDimension=768 — image pixels ≠ screen pixels; apply the scale formula before passing to mouse_click. Foreground detail='image' is always blocked without confirmImage=true. diffMode requires a prior full-capture baseline (non-diff call or workspace_snapshot) — calling diffMode cold returns a full frame, not a diff. mode='background' requires windowTitle or hwnd, and only composes with detail in {'image','meta'} — detail='text'/'som'/'ocr' run only against foreground capture (the dispatcher rejects the conflicting combination). Passing mode='background' is itself the acknowledgement that image pixels are wanted, so confirmImage is NOT required for it (matches the former screenshot_background contract). fullContent=false enables legacy mode (faster but GPU windows may be black). detail='ocr' requires windowTitle or hwnd; first call may take ~1s (WinRT cold-start) and the matching OCR language pack must be installed. Examples: screenshot() → meta orientation of all windows screenshot({detail:'text', windowTitle:'Notepad'}) → clickable elements with coords screenshot({detail:'ocr', windowTitle:'PDF', ocrLanguage:'ja'}) → OCR words with screen-pixel coords screenshot({mode:'background', windowTitle:'Chrome', dotByDot:true, dotByDotMaxDimension:1280, grayscale:true}) → background-capture pixel-accurate Chrome screenshot({windowTitle:'Notepad', region:{x:0,y:120,width:600,height:400}}) → cropped sub-region (zoom into element after desktop_discover) |
| scrollA | Purpose: Scroll a window or page. 5 strategies via action: 'raw' (wheel notches), 'to_element' (UIA name/automationId or CSS selector), 'smart' (auto-detect target with multi-strategy fallback), 'capture' (full-page stitched image), 'read' (scroll+OCR+dedupe → stitched text). Details: action='raw': send raw mouse-wheel notches at (x,y) or current cursor, optional window focus. action='to_element': scroll a named element into viewport (UIA or CDP). action='smart': handles nested scroll layers, virtualised lists, sticky-header occlusion. action='capture': stitches full-page images (caps at ~700KB raw); sizeReduced=true means downscaled. action='read': scrolls page-by-page, OCRs each viewport, deduplicates overlapping lines, returns stitched text; language auto-detected from OS locale if omitted. Prefer: Use action='to_element' or action='smart' for click target out-of-viewport recovery (entity_outside_viewport). Use action='capture' for reading long pages as images. Use action='read' for extracting text from long native-app documents (PDF readers, text editors, terminals) where copy-paste is unavailable. For simple scroll without target, use action='raw'. Caveats: action='capture' returns stitched image — pixels do NOT match screen coords when sizeReduced=true, use for reading only, not mouse_click. action='smart' CDP path requires browser_open. action='to_element' native path requires element to implement UIA ScrollItemPattern. action='read' uses OCR (imperfect accuracy) and requires the window to be visible; for browser pages prefer browser_eval (e.g. evaluate document.body.innerText) or browser_overview to extract DOM text accurately. Examples: scroll({action:'raw', direction:'down', amount:5, windowTitle:'Chrome'}) scroll({action:'to_element', name:'OK', windowTitle:'Dialog'}) scroll({action:'smart', target:'#create-release-btn'}) scroll({action:'capture', windowTitle:'Chrome', maxScrolls:10}) scroll({action:'read', windowTitle:'Acrobat', maxPages:15}) // OCR + dedupe long PDF |
| server_statusA | Return MCP server status: version / native engine availability / Auto Perception state / v2 activation. uia: 'native' = Rust UIA addon (fast, ~2 ms focus / ~100 ms tree); 'powershell' = PS fallback (~366 ms focus). imageDiff: 'native' = Rust SSE2 SIMD (0.26 ms @ 1080p); 'typescript' = TS fallback (~3.8 ms). Diagnostic metadata — do not surface these values to the user unless they ask about performance or troubleshooting. Call once per session if you need to know which path is active; the result is stable for the lifetime of the server process. |
| terminalA | Purpose: Interact with a terminal window: read output, send input, or run+wait+read in one call. Details: action='run' is the recommended high-level workflow: send command → wait until quiet/pattern/timeout → read output. Returns completion={reason, elapsedMs} first-class. action='read' reads current text via UIA TextPattern (falls back to OCR); use sinceMarker for incremental diff. action='send' sends a command with focus management. Prefer: action='run' for command execution + result. Use action='read'/'send' for fine-grained control or when you need to interleave other actions. Caveats: Do not screenshot the terminal — terminal(action='read') is cheaper and structured. action='run' supports completion reasons: quiet | pattern_matched | timeout | window_closed | window_not_found. preferClipboard=true (send default) overwrites user clipboard. Examples: terminal({action:'run', windowTitle:'PowerShell', input:'npm test', until:{mode:'pattern', pattern:'npm test:'}}) → {output, completion:{reason:'pattern_matched'}} terminal({action:'run', windowTitle:'pwsh', input:'ls'}) → quiet 800ms wait, returns output terminal({action:'read', windowTitle:'PowerShell', sinceMarker:'...'}) → incremental diff terminal({action:'send', windowTitle:'PowerShell', input:'echo hello'}) → sends text + Enter |
| wait_untilA | Purpose: Server-side poll for an observable condition — eliminates screenshot-polling loops when waiting for state changes. Details: condition selects what to watch: window_appears/window_disappears (target.windowTitle required), focus_changes (optional target.fromHwnd), element_appears/value_changes (target.windowTitle + target.elementName required, UIA; min 500ms interval), ready_state (target.windowTitle; visible + not minimized), terminal_output_contains (target.windowTitle + target.pattern required [+target.regex:true], needs terminal tools loaded), element_matches (target.by + target.pattern required, needs browser tools loaded), url_matches (target.pattern required [+target.regex:true]; matches the active tab's location.href via CDP — use for SPA route changes, redirects, OAuth flows). Returns {ok:true, elapsedMs, observed} on success, or WaitTimeout error with suggest hints. timeoutMs default 5000 (max 60000). Prefer: Use instead of run_macro({sleep:N}) + screenshot loops. Use terminal_output_contains to detect CLI command completion. Use element_matches for browser DOM readiness after navigation. Use url_matches when the URL is the most reliable signal (SPA routing / redirect cascades). Caveats: terminal_output_contains, element_matches, and url_matches require a browser CDP connection (open --remote-debugging-port=9222 first). element_appears/value_changes spawn a UIA process per poll — interval clamped to 500ms minimum. On WaitTimeout, read the suggest[] array in the error for recovery steps. Examples: wait_until({condition:'window_appears', target:{windowTitle:'Save As'}, timeoutMs:10000}) wait_until({condition:'terminal_output_contains', target:{windowTitle:'Terminal', pattern:'$ '}, timeoutMs:30000}) wait_until({condition:'element_matches', target:{by:'text', pattern:'Submit', scope:'#checkout-form'}}) wait_until({condition:'url_matches', target:{pattern:'/dashboard'}, timeoutMs:15000}) wait_until({condition:'url_matches', target:{pattern:'^https://app\.example\.com/orders/[0-9]+$', regex:true}}) |
| window_dockA | Purpose: Decorate a window: pin (always-on-top), unpin, or dock (move + resize + optional pin). Details: action='pin' makes window always-on-top until unpin/duration_ms. action='unpin' removes always-on-top. action='dock' positions to corner with width/height (default 480×360 bottom-right) and optionally pins. Minimized windows are automatically restored before docking. Prefer: Use action='dock' for terminal/CLI window auto-positioning at session start. Use action='pin' alone when you only need always-on-top without moving or resizing. Caveats: Pin survives minimize/restore; explicit action='unpin' needed to release. Dock fails on elevated processes. Dock overrides any existing Win+Arrow snap arrangement. Examples: window_dock({action:'dock', title:'PowerShell', corner:'bottom-right', width:480, height:360}) window_dock({action:'pin', title:'Settings', duration_ms:5000}) window_dock({action:'unpin', title:'Settings'}) |
| workspace_launchA | Purpose: Launch an application and wait for its new window to appear, returning title, HWND, and PID. Details: Runs the command via ShellExecute, snapshots the window list before launch, then polls until a new HWND appears (compared by HWND, not title). Returns {windowTitle, hwnd, pid, elapsedMs}. Works for localized window titles (e.g. '電卓' for calc.exe) because detection is HWND-based, not title-based. timeoutMs default 10000. detach=true fires without waiting and returns no window info. Prefer: Use instead of run_macro({exec, sleep, desktop_discover}) combos. Follow with focus_window(windowTitle) to interact with the launched app. Caveats: Single-instance apps that reuse an existing window will not register as a new HWND — call desktop_discover first to check if the window is already open. detach=true returns immediately with no window title or hwnd. Examples: workspace_launch({command:'notepad.exe'}) → {windowTitle:'', hwnd:'...', pid:...} workspace_launch({command:'calc.exe', timeoutMs:15000}) |
| workspace_snapshotA | Purpose: Orient fully in one call — returns display layouts, all window thumbnails (WebP), and per-window actionable element lists with clickAt coords. Details: uiSummary.actionable[] per window includes: action ('click'|'type'|'expand'|'select'), clickAt {x,y} (pass directly to mouse_click), value (current text for editable fields). Runs parallel internally; latency ≈ max(single screenshot), not N×screenshots. Also resets the diffMode buffer so subsequent screenshot(diffMode=true) returns only changes (P-frame). Prefer: Use at session start or after major workspace changes. Use screenshot(detail='meta') for cheap re-orientation within a session. Use screenshot(detail='text', windowTitle=X) for a single-window update. Caveats: Thumbnails are scaled, not 1:1 — use screenshot(dotByDot=true, windowTitle=X) for pixel-accurate coords on a specific window after snapshot. |
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
No resources | |
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/Harusame64/desktop-touch-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server