webview_interact
Perform click, double-click, scroll, swipe, and focus actions on Tauri app webviews using CSS selectors, XPath, or text matching.
Instructions
[Tauri Apps Only] Click, scroll, swipe, focus, or perform gestures in a Tauri app webview. Supported actions: click, double-click, long-press, scroll, swipe, focus. Supports CSS selectors (default), XPath, and text content matching via the strategy parameter. Requires active driver_session. For browser interaction, use Chrome DevTools MCP instead.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| windowId | No | Window label to target (defaults to "main") | |
| appIdentifier | No | App port or bundle ID to target. Defaults to the only connected app or the default app if multiple are connected. | |
| action | Yes | Type of interaction to perform | |
| selector | No | Element selector: CSS selector (default), XPath expression, text content, or ref ID (e.g., "ref=e3") | |
| strategy | No | Selector strategy: "css" (default) for CSS selectors, "xpath" for XPath expressions, "text" to find elements by text content, with fallback to placeholder, aria-label, and title attributes. Ref IDs (e.g., "ref=e3") work with any strategy. | css |
| x | No | X coordinate for direct coordinate interaction | |
| y | No | Y coordinate for direct coordinate interaction | |
| duration | No | Duration in ms for long-press or swipe (default: 500ms for long-press, 300ms for swipe) | |
| scrollX | No | Horizontal scroll amount in pixels (positive = right) | |
| scrollY | No | Vertical scroll amount in pixels (positive = down) | |
| fromX | No | Starting X coordinate for swipe | |
| fromY | No | Starting Y coordinate for swipe | |
| toX | No | Ending X coordinate for swipe | |
| toY | No | Ending Y coordinate for swipe |
Implementation Reference
- packages/mcp-server/src/tools-registry.ts:231-253 (registration)Registration of the 'webview_interact' tool with its name, description, category, schema (InteractSchema), and handler that delegates to the interact() function.
// WebView Interaction Tools { name: 'webview_interact', description: '[Tauri Apps Only] Click, scroll, swipe, focus, or perform gestures in a Tauri app webview. ' + 'Supported actions: click, double-click, long-press, scroll, swipe, focus. ' + 'Supports CSS selectors (default), XPath, and text content matching via the strategy parameter. ' + 'Requires active driver_session. ' + 'For browser interaction, use Chrome DevTools MCP instead.', category: TOOL_CATEGORIES.UI_AUTOMATION, schema: InteractSchema, annotations: { title: 'Interact with Tauri Webview', readOnlyHint: false, destructiveHint: false, openWorldHint: false, }, handler: async (args) => { const parsed = InteractSchema.parse(args); return await interact(parsed); }, }, - InteractSchema: Zod schema for the webview_interact tool, defining input validation for action (click/double-click/long-press/scroll/swipe/focus), selector, strategy, coordinates (x, y), duration, scroll amounts, and swipe coordinates.
export const InteractSchema = WindowTargetSchema.extend({ action: z.enum([ 'click', 'double-click', 'long-press', 'scroll', 'swipe', 'focus' ]) .describe('Type of interaction to perform'), selector: z.string().optional().describe( 'Element selector: CSS selector (default), XPath expression, text content, or ref ID (e.g., "ref=e3")' ), strategy: selectorStrategyField, x: z.number().optional().describe('X coordinate for direct coordinate interaction'), y: z.number().optional().describe('Y coordinate for direct coordinate interaction'), duration: z.number().optional() .describe('Duration in ms for long-press or swipe (default: 500ms for long-press, 300ms for swipe)'), scrollX: z.number().optional().describe('Horizontal scroll amount in pixels (positive = right)'), scrollY: z.number().optional().describe('Vertical scroll amount in pixels (positive = down)'), fromX: z.number().optional().describe('Starting X coordinate for swipe'), fromY: z.number().optional().describe('Starting Y coordinate for swipe'), toX: z.number().optional().describe('Ending X coordinate for swipe'), toY: z.number().optional().describe('Ending Y coordinate for swipe'), }); - The interact() handler function that executes the webview_interact logic. It dispatches swipe to performSwipe(), focus to focusElement(), and all other actions (click, double-click, long-press, scroll) by building the interact script and executing it in the webview via executeInWebview().
export async function interact(options: { action: string; selector?: string; strategy?: string; x?: number; y?: number; duration?: number; scrollX?: number; scrollY?: number; fromX?: number; fromY?: number; toX?: number; toY?: number; windowId?: string; appIdentifier?: string | number; }): Promise<string> { const { action, selector, strategy, x, y, duration, scrollX, scrollY, fromX, fromY, toX, toY, windowId, appIdentifier } = options; // Handle swipe action separately since it has different logic if (action === 'swipe') { return performSwipe({ fromX, fromY, toX, toY, duration, windowId, appIdentifier }); } // Handle focus action if (action === 'focus') { if (!selector) { throw new Error('Focus action requires a selector'); } return focusElement({ selector, strategy, windowId, appIdentifier }); } const script = buildScript(SCRIPTS.interact, { action, selector: selector ?? null, strategy: strategy ?? 'css', x: x ?? null, y: y ?? null, duration: duration ?? 500, scrollX: scrollX ?? 0, scrollY: scrollY ?? 0, }); try { return await executeInWebview(script, windowId, appIdentifier); } catch(error: unknown) { const message = error instanceof Error ? error.message : String(error); throw new Error(`Interaction failed: ${message}`); } } - performSwipe(): Helper for swipe gesture actions, building the swipe script and executing it in the webview.
async function performSwipe(options: SwipeOptions): Promise<string> { const { fromX, fromY, toX, toY, duration = 300, windowId, appIdentifier } = options; if (fromX === undefined || fromY === undefined || toX === undefined || toY === undefined) { throw new Error('Swipe action requires fromX, fromY, toX, and toY coordinates'); } const script = buildScript(SCRIPTS.swipe, { fromX, fromY, toX, toY, duration }); try { return await executeInWebview(script, windowId, appIdentifier); } catch(error: unknown) { const message = error instanceof Error ? error.message : String(error); throw new Error(`Swipe failed: ${message}`); } } - The injected JavaScript IIFE executed inside the webview to perform click, double-click, long-press, scroll interactions. Uses MouseEvent dispatch and elementFromPoint for coordinate-based targeting.
/** * Webview interaction script - handles click, double-click, long-press, and scroll actions * This script is injected into the webview and executed with parameters. * * @param {Object} params * @param {string} params.action - The action to perform * @param {string|null} params.selector - CSS selector, XPath, text, or ref ID (e.g., "ref=e3") for the element * @param {string} params.strategy - Selector strategy: 'css', 'xpath', or 'text' * @param {number|null} params.x - X coordinate * @param {number|null} params.y - Y coordinate * @param {number} params.duration - Duration for long-press * @param {number} params.scrollX - Horizontal scroll amount * @param {number} params.scrollY - Vertical scroll amount */ (function(params) { const { action, selector, strategy, x, y, duration, scrollX, scrollY } = params; function resolveElement(selectorOrRef) { if (!selectorOrRef) return null; var el = window.__MCP__.resolveRef(selectorOrRef, strategy); if (!el) throw new Error('Element not found: ' + selectorOrRef); return el; } function matchHint() { if (!selector) return ''; var count = window.__MCP__.countAll(selector, strategy); if (count > 1) return ' (+' + (count - 1) + ' more match' + (count - 1 === 1 ? '' : 'es') + ')'; return ''; } let element = null; let targetX, targetY; // For scroll action, we don't necessarily need a selector or coordinates if (action === 'scroll') { if (selector) { element = resolveElement(selector); } } else { // For other actions, we need either selector or coordinates if (selector) { element = resolveElement(selector); const rect = element.getBoundingClientRect(); targetX = rect.left + rect.width / 2; targetY = rect.top + rect.height / 2; } else if (x !== null && y !== null) { targetX = x; targetY = y; element = document.elementFromPoint(x, y); } else { throw new Error('Either selector or coordinates (x, y) must be provided'); } } // Perform the interaction const eventOptions = { bubbles: true, cancelable: true, view: window, clientX: targetX, clientY: targetY, }; if (action === 'click') { if (element) { element.dispatchEvent(new MouseEvent('mousedown', eventOptions)); element.dispatchEvent(new MouseEvent('mouseup', eventOptions)); element.dispatchEvent(new MouseEvent('click', eventOptions)); } return `Clicked at (${targetX}, ${targetY})` + matchHint(); } if (action === 'double-click') { if (element) { element.dispatchEvent(new MouseEvent('mousedown', eventOptions)); element.dispatchEvent(new MouseEvent('mouseup', eventOptions)); element.dispatchEvent(new MouseEvent('click', eventOptions)); element.dispatchEvent(new MouseEvent('mousedown', eventOptions)); element.dispatchEvent(new MouseEvent('mouseup', eventOptions)); element.dispatchEvent(new MouseEvent('click', eventOptions)); element.dispatchEvent(new MouseEvent('dblclick', eventOptions)); } return `Double-clicked at (${targetX}, ${targetY})` + matchHint(); } if (action === 'long-press') { if (element) { element.dispatchEvent(new MouseEvent('mousedown', eventOptions)); setTimeout(() => { element.dispatchEvent(new MouseEvent('mouseup', eventOptions)); }, duration); } return `Long-pressed at (${targetX}, ${targetY}) for ${duration}ms` + matchHint(); } if (action === 'scroll') { const scrollTarget = element || window; if (scrollX !== 0 || scrollY !== 0) { if (scrollTarget === window) { window.scrollBy(scrollX, scrollY); } else { scrollTarget.scrollLeft += scrollX; scrollTarget.scrollTop += scrollY; } return `Scrolled by (${scrollX}, ${scrollY}) pixels` + matchHint(); } return 'No scroll performed (scrollX and scrollY are both 0)'; } throw new Error(`Unknown action: ${action}`); })