ocr
Extract text from a mobile device's screen via OCR. Optionally specify the OCR engine, a screen region, a confidence threshold, and a regex filter; the result contains the recognized text, its position, and a confidence score.
Instructions
Run OCR text recognition on the device screen. Returns the recognized text, its position coordinates, and its confidence.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
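| mode | No | OCR engine: mlkit (default, fast), paddle_v2, paddle_v3 (latest), tess | mlkit |
| rect | No | Recognition region [left, top, right, bottom]; omit to scan the full screen | |
| pattern | No | Regular expression used to filter the results | |
| confidence | No | Confidence threshold, 0.0-1.0 | 0.1 |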
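To make the defaults in the table concrete, here is a minimal sketch of how a caller might prepare the arguments before invoking the tool. `normalize_ocr_args` and `OCR_ENGINES` are hypothetical names, not part of the project. The range and shape checks are assumptions added for illustration: the dispatch handler shown under Implementation Reference only applies the `mode` and `confidence` defaults, and the device-side code falls back to mlkit for an unrecognized `mode` instead of raising.

```python
from typing import Any, Optional

# Engine names taken from the "mode" enum in the input schema.
OCR_ENGINES = ("mlkit", "paddle_v2", "paddle_v3", "tess")


def normalize_ocr_args(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: fill in the documented defaults and check value ranges."""
    mode = args.get("mode", "mlkit")
    if mode not in OCR_ENGINES:
        raise ValueError(f"unknown OCR engine: {mode!r}")

    rect: Optional[list[int]] = args.get("rect")
    if rect is not None:
        if len(rect) != 4 or not all(isinstance(v, int) for v in rect):
            raise ValueError("rect must be [left, top, right, bottom] as integers")
        left, top, right, bottom = rect
        # Assumption: a usable region has positive width and height.
        if right <= left or bottom <= top:
            raise ValueError("rect must have left < right and top < bottom")

    confidence = args.get("confidence", 0.1)
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")

    return {
        "mode": mode,
        "rect": rect,
        "pattern": args.get("pattern"),
        "confidence": confidence,
    }


# No arguments: full screen, mlkit engine, threshold 0.1.
full_screen = normalize_ocr_args({})

# Digits only, inside a banner-sized region, with a stricter threshold.
digits_in_banner = normalize_ocr_args(
    {"mode": "paddle_v3", "rect": [0, 100, 1080, 400], "pattern": r"\d+", "confidence": 0.5}
)
```

Every field is optional, so an empty argument object is a valid call that scans the whole screen with mlkit.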
Implementation Reference
- src/ascript_mcp/local.py:666-697 (registration) Tool registration for 'ocr': defines name, description (OCR on device screen), and input schema with mode (mlkit/paddle_v2/paddle_v3/tess), rect (region), pattern (regex filter), and confidence (threshold).
```python
Tool(
    name="ocr",
    description=(
        "在设备屏幕上执行 OCR 文字识别。"
        "返回识别到的文字、位置坐标和置信度。"
    ),
    inputSchema={
        "type": "object",
        "properties": {
            "mode": {
                "type": "string",
                "description": "OCR 引擎:mlkit(默认,快)、paddle_v2、paddle_v3(最新)、tess",
                "enum": ["mlkit", "paddle_v2", "paddle_v3", "tess"],
                "default": "mlkit",
            },
            "rect": {
                "type": "array",
                "items": {"type": "integer"},
                "description": "识别区域 [left, top, right, bottom],不传则全屏",
            },
            "pattern": {
                "type": "string",
                "description": "正则表达式过滤结果",
            },
            "confidence": {
                "type": "number",
                "description": "置信度阈值 0.0-1.0,默认 0.1",
                "default": 0.1,
            },
        },
    },
),
```

- src/ascript_mcp/local.py:1134-1140 (handler) Dispatch handler for 'ocr' tool: calls dev.ocr() with mode, rect, pattern, and confidence arguments from the caller.
```python
if name == "ocr":
    return dev.ocr(
        mode=args.get("mode", "mlkit"),
        rect=args.get("rect"),
        pattern=args.get("pattern"),
        confidence=args.get("confidence", 0.1),
    )
```

- src/ascript_mcp/device.py:754-789 (handler) Actual OCR implementation: executes OCR on the connected device. Maps the mode string to an int, builds the GP strack parameters, and calls _run_gp() with the appropriate screen.Ocr class (android or ios).
```python
_OCR_MODES = {
    "mlkit": 1,
    "paddle_v2": 2,
    "paddle_v3": 3,
    "tess": 4,
}


def ocr(
    mode: str = "mlkit",
    rect: Optional[list[int]] = None,
    pattern: Optional[str] = None,
    confidence: float = 0.1,
) -> dict[str, Any]:
    """
    Run OCR text recognition on the device screen.

    mode: mlkit / paddle_v2 / paddle_v3 / tess
    rect: [left, top, right, bottom] recognition region; omit for full screen
    pattern: regex filter
    confidence: confidence threshold, 0.0-1.0
    """
    d = require_device()
    mode_int = _OCR_MODES.get(mode, 1)

    # Build the GP strack parameters
    params_parts = [f"mode={mode_int}"]
    if rect:
        params_parts.append(f"rect={rect}")
    if pattern:
        params_parts.append(f"pattern='{pattern}'")
    params_parts.append(f"confidence={confidence}")

    params_str = ", ".join(params_parts)
    return _run_gp(d, "ascript.android.screen.Ocr" if d.platform == "android" else "ascript.ios.screen.Ocr", params_str)
```

- src/ascript_mcp/device.py:867-901 (helper) GP engine helper: _run_gp() handles the device-side screenshot, builds the strack payload, and sends it to the device's GP API endpoint for execution.
```python
def _run_gp(device: Device, class_id: str, params_str: str) -> dict[str, Any]:
    """
    Call the device's GP engine to run an image/color tool.

    Flow: 1) take a screenshot on the device and save it; 2) call the GP strack
    engine with the screenshot path.
    The request is sent via /api/gp/strack (Android) or /api/screen/gp (iOS).
    """
    # Take a screenshot and save it on the device first
    image_path = _ensure_screenshot(device)

    strack = [
        {
            "id": class_id,
            "type": "图色工具",  # literal value sent as-is; roughly "image/color tool"
            "data": {"params": params_str},
        }
    ]

    if device.platform == "android":
        url = f"{device.base_url}/api/gp/strack"
    else:
        url = f"{device.base_url}/api/screen/gp"

    payload = {
        "strack": json.dumps(strack),
        "image": image_path,
        "gp": "as_gp_test_screen_temp",
    }
    body = urllib.parse.urlencode(payload).encode("utf-8")
    headers = dict(device.headers)
    headers["Content-Type"] = "application/x-www-form-urlencoded"

    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

- src/ascript_mcp/device.py:846-864 (helper) Helper function _ensure_screenshot() captures a screenshot on the device and returns the image path, which the GP engine requires before running OCR.
```python
def _ensure_screenshot(device: Device) -> str:
    """
    Take and save a screenshot on the device; return its file path on the device.
    The GP engine needs a device-side image path to work.
    """
    if device.platform == "android":
        # The path parameter sets where the screenshot is saved; the GP engine reads from it
        save_path = "/sdcard/airscript/screen/mcp_temp.png"
        url = f"{device.base_url}/api/tool/screen/capture?path={urllib.parse.quote(save_path)}"
        _fetch_bytes(url, headers=device.headers, timeout=10)
        return save_path
    else:
        # iOS: /api/screen/capture/list?capture=true takes a screenshot, saves it on the device, and returns the path
        url = f"{device.base_url}/api/screen/capture/list?capture=true"
        result = _fetch_json(url, method="GET", headers=device.headers, timeout=10)
        items = result.get("data", [])
        if items:
            return items[-1].get("path", "")
        return ""
```
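To see what actually travels to the device, the parameter string and form body can be rebuilt offline. The sketch below mirrors the string formatting in ocr() and the payload layout in _run_gp(), but sends nothing over the network. `build_params` and `build_form_body` are hypothetical names introduced here, and how the device parses the parameter string is not shown in the source, so the comments describe only what the client produces.

```python
import json
import urllib.parse
from typing import Optional

# Same mode-to-integer mapping as _OCR_MODES in the source.
OCR_MODES = {"mlkit": 1, "paddle_v2": 2, "paddle_v3": 3, "tess": 4}


def build_params(
    mode: str = "mlkit",
    rect: Optional[list[int]] = None,
    pattern: Optional[str] = None,
    confidence: float = 0.1,
) -> str:
    """Hypothetical helper: assemble the params string the same way ocr() does."""
    parts = [f"mode={OCR_MODES.get(mode, 1)}"]  # unknown modes fall back to 1 (mlkit)
    if rect:
        parts.append(f"rect={rect}")
    if pattern:
        # As in the source, the pattern is wrapped in single quotes without escaping.
        parts.append(f"pattern='{pattern}'")
    parts.append(f"confidence={confidence}")
    return ", ".join(parts)


def build_form_body(class_id: str, params_str: str, image_path: str) -> bytes:
    """Hypothetical helper: build the urlencoded body that _run_gp() would POST."""
    strack = [{"id": class_id, "type": "图色工具", "data": {"params": params_str}}]
    payload = {
        "strack": json.dumps(strack),
        "image": image_path,
        "gp": "as_gp_test_screen_temp",
    }
    return urllib.parse.urlencode(payload).encode("utf-8")


params = build_params("paddle_v3", [0, 100, 1080, 400], r"\d+", 0.5)
print(params)  # mode=3, rect=[0, 100, 1080, 400], pattern='\d+', confidence=0.5

body = build_form_body(
    "ascript.android.screen.Ocr", params, "/sdcard/airscript/screen/mcp_temp.png"
)
# Decode the body again to confirm the strack entry round-trips intact.
decoded = urllib.parse.parse_qs(body.decode("utf-8"))
strack_entry = json.loads(decoded["strack"][0])[0]
```

Because the pattern is interpolated between single quotes as-is, a regex that itself contains a single quote may not survive the trip; that case is left unhandled here, as it is in the source.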