interact_with_ui
Perform UI interactions on Android and iOS apps by tapping, swiping, or inputting text using element targeting or coordinates.
Instructions
Perform UI interactions like tap, swipe, or text input. Can target elements by ID/text or by coordinates.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| platform | Yes | Target platform | |
| action | Yes | Type of interaction to perform | |
| element | No | Element ID, resource ID, or text to interact with | |
| x | No | X coordinate for coordinate-based interaction | |
| y | No | Y coordinate for coordinate-based interaction | |
| text | No | Text to input (for input_text action) | |
| direction | No | Swipe direction (for swipe action) | |
| durationMs | No | Duration in milliseconds (for long_press and swipe, default: 300) | |
| deviceId | No | Device ID or name (optional) |
Implementation Reference
- src/tools/ui/interact-with-ui.ts:42-131 (handler)Main handler function for the 'interact_with_ui' tool. Validates inputs, locates elements or uses coordinates, performs platform-specific UI interactions (tap, swipe, input, etc.) for Android and iOS, and returns interaction results.export async function interactWithUI(args: InteractWithUIArgs): Promise<InteractionResult> { const { platform, action, element, x, y, text, direction, durationMs = 300, deviceId, } = args; // Validate platform if (!isPlatform(platform)) { throw Errors.invalidArguments(`Invalid platform: ${platform}. Must be 'android' or 'ios'`); } // Validate action if (!INTERACTION_TYPES.includes(action as InteractionType)) { throw Errors.invalidArguments( `Invalid action: ${action}. Must be one of: ${INTERACTION_TYPES.join(', ')}` ); } const interactionType = action as InteractionType; const startTime = Date.now(); // Determine target coordinates let targetCoords: Point; let targetElement: UIElement | undefined; if (element) { // Find element by ID or text const foundElement = await findTargetElement(platform, element, deviceId); if (!foundElement) { throw Errors.elementNotFound(element); } targetElement = foundElement; targetCoords = foundElement.center; } else if (x !== undefined && y !== undefined) { targetCoords = { x, y }; } else if (interactionType !== 'input_text' && interactionType !== 'clear') { throw Errors.invalidArguments('Either element or coordinates (x, y) must be provided'); } else { // For input_text and clear, we operate on the focused element targetCoords = { x: 0, y: 0 }; } // Perform interaction try { if (platform === 'android') { await performAndroidInteraction(interactionType, targetCoords, { text, direction: direction as SwipeDirection, durationMs, deviceId, }); } else { await performIOSInteraction(interactionType, targetCoords, { text, direction: direction as SwipeDirection, durationMs, deviceId, }); } return { success: true, interactionType: action, targetElement: targetElement ? { id: targetElement.id, type: targetElement.type, bounds: targetElement.bounds, } : undefined, coordinates: targetCoords, durationMs: Date.now() - startTime, }; } catch (error) { return { success: false, interactionType: action, coordinates: targetCoords, durationMs: Date.now() - startTime, error: error instanceof Error ? error.message : String(error), }; } }
- TypeScript interface defining the input schema/arguments for the interact_with_ui tool, including platform, action type, element/coordinates, text, direction, etc.export interface InteractWithUIArgs { /** Target platform */ platform: string; /** Interaction type */ action: string; /** Target element ID or text (for element-based interactions) */ element?: string; /** X coordinate (for coordinate-based interactions) */ x?: number; /** Y coordinate (for coordinate-based interactions) */ y?: number; /** Text to input (for input_text action) */ text?: string; /** Swipe direction (for swipe action) */ direction?: string; /** Duration in ms (for long_press and swipe) */ durationMs?: number; /** Target device ID or name */ deviceId?: string; }
- src/tools/ui/interact-with-ui.ts:341-394 (registration)Registration function for the interact_with_ui tool. Registers the tool name, description, JSON input schema, and binds the interactWithUI handler to the global tool registry.export function registerInteractWithUITool(): void { getToolRegistry().register( 'interact_with_ui', { description: 'Perform UI interactions like tap, swipe, or text input. Can target elements by ID/text or by coordinates.', inputSchema: createInputSchema( { platform: { type: 'string', enum: ['android', 'ios'], description: 'Target platform', }, action: { type: 'string', enum: ['tap', 'long_press', 'swipe', 'input_text', 'clear'], description: 'Type of interaction to perform', }, element: { type: 'string', description: 'Element ID, resource ID, or text to interact with', }, x: { type: 'number', description: 'X coordinate for coordinate-based interaction', }, y: { type: 'number', description: 'Y coordinate for coordinate-based interaction', }, text: { type: 'string', description: 'Text to input (for input_text action)', }, direction: { type: 'string', enum: ['up', 'down', 'left', 'right'], description: 'Swipe direction (for swipe action)', }, durationMs: { type: 'number', description: 'Duration in milliseconds (for long_press and swipe, default: 300)', }, deviceId: { type: 'string', description: 'Device ID or name (optional)', }, }, ['platform', 'action'] ), }, (args) => interactWithUI(args as unknown as InteractWithUIArgs) ); }
- src/tools/register.ts:132-135 (registration)Central registration point where registerInteractWithUITool() is imported and invoked as part of registerAllTools(), ensuring the tool is registered during server startup.const { registerInteractWithUITool } = await import('./ui/interact-with-ui.js'); registerGetUIContextTool(); registerInteractWithUITool();
- Helper function implementing Android-specific UI interactions using ADB commands for tap, long_press, swipe, input_text, and clear actions.async function performAndroidInteraction( action: InteractionType, coords: Point, options: { text?: string; direction?: SwipeDirection; durationMs: number; deviceId?: string; } ): Promise<void> { const { text, direction, durationMs, deviceId } = options; switch (action) { case 'tap': await tap(coords.x, coords.y, deviceId); break; case 'long_press': // Implement as swipe with same start and end await swipe(coords.x, coords.y, coords.x, coords.y, durationMs, deviceId); break; case 'swipe': if (!direction) { throw Errors.invalidArguments('direction is required for swipe action'); } const swipeCoords = calculateSwipeCoordinates(coords, direction, 500); await swipe( swipeCoords.startX, swipeCoords.startY, swipeCoords.endX, swipeCoords.endY, durationMs, deviceId ); break; case 'input_text': if (!text) { throw Errors.invalidArguments('text is required for input_text action'); } await inputText(text, deviceId); break; case 'clear': // Select all and delete await executeShell('adb', [ ...(deviceId ? ['-s', deviceId] : []), 'shell', 'input', 'keyevent', 'KEYCODE_CTRL_A', ]); await executeShell('adb', [ ...(deviceId ? ['-s', deviceId] : []), 'shell', 'input', 'keyevent', 'KEYCODE_DEL', ]); break; } }