workflow-tap-element
Find and tap a UI element by label, optionally type text into it, and verify the result with a screenshot, all in a single call. Token usage drops because the intermediate accessibility checks and element search stay internal.
Instructions
High-level semantic UI interaction - find and tap elements by name without coordinate hunting.
Overview
Orchestrates accessibility-first UI automation in a single call:
Check Accessibility - Assess UI richness for automation approach
Find Element - Semantic search by label/identifier
Tap Element - Execute tap at discovered coordinates
Input Text (optional) - Type into tapped field
Verify Result (optional) - Screenshot for confirmation
This workflow keeps intermediate results internal, reducing agent context usage by ~80% compared to calling each tool manually.
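The five steps above can be sketched as a single orchestrating function. This is a minimal illustration, not the tool's actual implementation: `run_tap_workflow`, the `Element` dataclass, and the injected callables are all hypothetical names, and the stubs stand in for the real device calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Element:
    # Hypothetical shape for a discovered element.
    label: str
    x: float
    y: float


def run_tap_workflow(
    element_query: str,
    check_accessibility: Callable[[], str],
    find_element: Callable[[str], Optional[Element]],
    tap: Callable[[float, float], None],
    type_text: Callable[[str], None],
    take_screenshot: Callable[[], None],
    input_text: Optional[str] = None,
    verify_result: bool = False,
) -> dict:
    """Run the steps in order and return only a compact summary."""
    quality = check_accessibility()            # 1. Check Accessibility
    element = find_element(element_query)      # 2. Find Element
    if element is None:
        return {
            "success": False,
            "accessibilityQuality": quality,
            "guidance": f"No element matched '{element_query}'",
        }
    tap(element.x, element.y)                  # 3. Tap Element
    if input_text is not None:
        type_text(input_text)                  # 4. Input Text (optional)
    if verify_result:
        take_screenshot()                      # 5. Verify Result (optional)
    return {
        "success": True,
        "tappedElement": element.label,
        "accessibilityQuality": quality,
    }


# Usage with stub callables that record what was invoked.
calls = []
result = run_tap_workflow(
    "Login",
    check_accessibility=lambda: "rich",
    find_element=lambda query: Element("Login", 100.0, 200.0),
    tap=lambda x, y: calls.append(("tap", x, y)),
    type_text=lambda text: calls.append(("type", text)),
    take_screenshot=lambda: calls.append(("screenshot",)),
)
```

The caller sees only the returned summary; the accessibility report and raw search results never leave the function, which is the source of the context savings.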
Parameters
Required
elementQuery (string): Search term for element (e.g., "Login", "Submit", "Email")
Case-insensitive partial matching ("log" matches "Login")
Optional
inputText (string): Text to type after tapping (for text fields)
verifyResult (boolean): Take screenshot after action (default: false)
udid (string): Target device - auto-detected if omitted
screenContext (string): Screen name for tracking (e.g., "LoginScreen")
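The matching rule for elementQuery can be illustrated with a short sketch. The source says only that matching is partial and case-insensitive; treating "partial" as a substring test (rather than a prefix test) is an assumption here, and `matches` is a hypothetical helper.

```python
def matches(element_query: str, label: str) -> bool:
    """Case-insensitive partial match: the query may appear anywhere in the label.

    Assumption: "partial" means substring containment.
    """
    return element_query.casefold() in label.casefold()


labels = ["Login", "Log Out", "Submit", "Email"]
hits = [label for label in labels if matches("log", label)]
print(hits)  # ['Login', 'Log Out']
```

A short query can match more than one element, as "log" does here, so a more specific query such as "Login" is the safer choice when several labels share a prefix.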
Returns
Consolidated result with:
success: Overall workflow success
tappedElement: Found element details (type, label, coordinates)
inputText: Text entry status (if requested)
verified: Screenshot status (if requested)
accessibilityQuality: UI richness assessment
totalDuration: Total workflow time
guidance: Next steps
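For callers that consume the result programmatically, the fields listed above could be modelled roughly as follows. The field names come from the list; the value types, the nested `label` key, and the `summarize` helper are assumptions made for illustration.

```python
from typing import Any, TypedDict


class TapWorkflowResult(TypedDict, total=False):
    # Field names are from the Returns list; the types are assumed.
    success: bool
    tappedElement: dict[str, Any]
    inputText: dict[str, Any]
    verified: dict[str, Any]
    accessibilityQuality: str
    totalDuration: float
    guidance: list[str]


def summarize(result: TapWorkflowResult) -> str:
    """Reduce a result to one line, reading optional fields defensively."""
    if not result.get("success"):
        return "failed: " + "; ".join(result.get("guidance", []))
    label = result.get("tappedElement", {}).get("label", "?")
    return f"tapped {label!r}"
```

Reading inputText and verified with `.get` reflects that they are present only when the corresponding option was requested.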
Examples
Tap Login Button
{"elementQuery": "Login"}
Finds and taps the Login button.
Tap Email Field and Enter Text
{
"elementQuery": "Email",
"inputText": "user@example.com",
"screenContext": "LoginScreen"
}
Finds the Email field, taps it, and enters the text.
Full Verification Workflow
{
"elementQuery": "Submit",
"verifyResult": true,
"screenContext": "SignupForm"
}
Taps the Submit button and captures a verification screenshot.
Why Use This Workflow?
Token Efficiency
Manual approach: 4-5 tool calls × ~50 tokens each = ~200-250 tokens in responses
Workflow approach: 1 call with consolidated response = ~80 tokens
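The arithmetic behind these two figures can be worked through directly. The sketch below uses only the numbers quoted above (about 50 tokens per manual call, about 80 tokens for the workflow response); `savings` is a hypothetical helper.

```python
def savings(manual_tokens: float, workflow_tokens: float) -> float:
    """Fraction of response tokens saved by the single-call workflow."""
    return 1 - workflow_tokens / manual_tokens


per_call = 50   # approximate tokens per manual tool response
workflow = 80   # approximate tokens in the consolidated response

for calls in (4, 5):
    manual = calls * per_call
    print(f"{calls} calls: {manual} tokens manual, {savings(manual, workflow):.0%} saved")
# 4 calls: 200 tokens manual, 60% saved
# 5 calls: 250 tokens manual, 68% saved
```

On these response-size figures alone the saving is roughly 60-68%, somewhat lower than the ~80% quoted in the Overview, so the two numbers are probably measured on different bases.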
Reduced Context Pollution
Intermediate accessibility data not exposed
Element search results summarized
Only actionable outcome returned
Error Handling
Graceful degradation on partial failures
Helpful guidance when element not found
Clear troubleshooting steps
Related Tools
idb-ui-find-element: Direct element search (used internally)
idb-ui-tap: Direct tap (used internally)
accessibility-quality-check: Direct quality check (used internally)
workflow-fresh-install: Clean app installation workflow
Notes
Falls back gracefully if accessibility is minimal
Non-fatal errors (input, screenshot) don't fail the workflow
Element matching uses partial, case-insensitive search
A small delay is inserted between tap and input so the keyboard has time to appear
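The non-fatal handling and the tap-to-input delay noted above might look like the following sketch. `run_optional_step` and `finish_workflow` are hypothetical names, and the 0.3 second default is an assumed value, since the source does not state the actual delay.

```python
import time
from typing import Callable


def run_optional_step(name: str, step: Callable[[], None], warnings: list) -> bool:
    """Run a non-fatal step; record a warning instead of raising."""
    try:
        step()
        return True
    except Exception as exc:  # broad on purpose: optional steps must not abort
        warnings.append(f"{name} failed: {exc}")
        return False


def finish_workflow(
    type_text: Callable[[], None],
    take_screenshot: Callable[[], None],
    keyboard_delay_s: float = 0.3,  # assumed value, not from the source
) -> dict:
    """Steps that follow a successful tap; neither can fail the workflow."""
    warnings: list = []
    time.sleep(keyboard_delay_s)  # give the keyboard time to appear
    typed = run_optional_step("input", type_text, warnings)
    shot = run_optional_step("screenshot", take_screenshot, warnings)
    # The tap already succeeded, so overall success stays True.
    return {"success": True, "inputText": typed, "verified": shot, "guidance": warnings}
```

Failures in the optional steps surface as entries in guidance rather than as exceptions, so the caller still learns that the tap itself went through.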
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| elementQuery | Yes | Search term for element (e.g., "Login", "Submit") | |
| inputText | No | Text to type after tapping | |
| verifyResult | No | Take screenshot after action | false |
| udid | No | Target device | auto-detected |
| screenContext | No | Screen name for tracking | |