Glama

ocr_screenshot

Captures a screenshot and uses OCR to extract its text along with tap coordinates, so UI elements can be located and interacted with on both iOS and Android.

Instructions

RECOMMENDED: Use this tool FIRST when you need to find and tap UI elements. It takes a screenshot and uses OCR to extract all visible text, together with tap-ready coordinates for each item.

ADVANTAGES over accessibility trees:
(1) It works on ANY visible text, whether or not the element has an accessibility label.
(2) It returns ready-to-use tapX/tapY coordinates, so no conversion is needed.
(3) It is faster than parsing accessibility hierarchies.
(4) It behaves consistently across iOS and Android.

USE THIS FOR: finding buttons, labels, menu items, tab bars, or any other text you need to tap. Find the text in the results, then pass its tapX/tapY values to the tap command.
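The lookup step described above (find the text, then use its tapX/tapY) can be sketched as follows. Only the `tapX`/`tapY` names come from the tool description; the shape of each result (a dict with `text`, `tapX` and `tapY` keys), the helper name `find_tap_target`, and the sample values are assumptions for illustration, so check the actual tool output before relying on this.

```python
from typing import Iterable, Optional, Tuple


def find_tap_target(
    results: Iterable[dict],
    label: str,
    exact: bool = False,
) -> Optional[Tuple[float, float]]:
    """Return (tapX, tapY) for the first OCR result whose text matches label.

    Hypothetical helper: assumes each result is a dict with 'text', 'tapX'
    and 'tapY' keys. Matching is case-insensitive; by default a substring
    match is enough, with exact=True the whole text must match.
    """
    wanted = label.casefold()
    for item in results:
        text = str(item.get("text", "")).casefold()
        matched = (text == wanted) if exact else (wanted in text)
        if matched:
            return item["tapX"], item["tapY"]
    return None  # label not visible on screen


# Made-up sample results in the assumed shape.
ocr_results = [
    {"text": "Settings", "tapX": 320, "tapY": 1840},
    {"text": "Sign in", "tapX": 540, "tapY": 1200},
]
print(find_tap_target(ocr_results, "sign in"))
```

The returned pair is what would be handed to the tap command. A substring match is the default because OCR output often includes neighbouring characters or icons merged into the same text run; pass `exact=True` when several labels share a prefix.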

Input Schema

Name | Required | Description | Default
platform | Yes | Platform to capture the screenshot from (iOS or Android) | none
deviceId | No | Device ID (Android) or UDID (iOS) | first available device
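For orientation, an MCP client invokes this tool with a JSON-RPC `tools/call` request whose `arguments` follow the schema above. The sketch below builds that message. The lowercase platform strings `"ios"` and `"android"` are an assumption (the schema shown here does not list the accepted values), and `build_ocr_screenshot_call` and the device ID are hypothetical placeholders.

```python
import json
from typing import Optional

# Assumed platform identifiers; the schema above does not list the exact strings.
ALLOWED_PLATFORMS = {"ios", "android"}


def build_ocr_screenshot_call(
    request_id: int,
    platform: str,
    device_id: Optional[str] = None,
) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request for ocr_screenshot (sketch)."""
    if platform not in ALLOWED_PLATFORMS:
        raise ValueError(f"unsupported platform: {platform!r}")
    arguments = {"platform": platform}  # required
    if device_id is not None:
        # Optional: Android device ID or iOS UDID.
        # When omitted, the tool uses the first available device.
        arguments["deviceId"] = device_id
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "ocr_screenshot", "arguments": arguments},
    }
    return json.dumps(message)


# Placeholder device ID (an Android emulator serial).
print(build_ocr_screenshot_call(1, "android", "emulator-5554"))
```

Leaving `deviceId` out of the arguments, rather than sending it as `null`, matches the schema's "uses first available device if not specified" behaviour.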


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/igorzheludkov/metro-logs-mcp'
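The same request can be made from Python with only the standard library. This is a sketch: the structure of the returned JSON is not documented here, so `fetch_server_info` (a hypothetical helper) simply returns the decoded object.

```python
import json
import urllib.request

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/igorzheludkov/metro-logs-mcp"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    # Equivalent of: curl -X GET '<url>'
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0):
    """Fetch the directory entry and decode it as JSON (response schema not documented here)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request):
# info = fetch_server_info()
```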

If you have feedback or need assistance with the MCP directory API, please join our Discord server.