find_text_on_screen
Locate text on your screen using OCR and get matching coordinates for automation tasks. Specify the target text and optionally restrict the search to a monitor, window, or region to get precise screen positions.
Instructions
Find text on screen using OCR. Returns matching locations in screen coordinates.
Take a screenshot, run OCR, and find all occurrences of the target text. Coordinates are in absolute screen space — ready to pass to mouse_click.
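To illustrate the coordinate conversion (numbers are hypothetical): if a capture of the active window starts at desktop origin (100, 200) and OCR reports a word box at (50, 30) with size 80x20 inside that capture, the clickable center works out as:

```python
# Hypothetical example of the screen-coordinate math described above.
# origin_x/origin_y: where the captured image sits on the full desktop;
# x, y, w, h: the OCR word box, relative to the captured image.
origin_x, origin_y = 100, 200
x, y, w, h = 50, 30, 80, 20

# Center of the box, translated into absolute screen space.
screen_x = x + origin_x + w // 2   # 50 + 100 + 40 = 190
screen_y = y + origin_y + h // 2   # 30 + 200 + 10 = 240
print((screen_x, screen_y))        # (190, 240)
```

This is the value the tool reports as "at screen (X, Y)", ready to pass to mouse_click.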
Args:
- target: Text to find (case-insensitive, supports multi-word)
- monitor: Limit search to a specific monitor
- window: Limit search to a specific window (e.g. "class:discord")
- region: Limit search to a region "X,Y WxH"
- scope: "auto" (default) captures just the active window for better accuracy; "full" captures the entire desktop
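The region argument uses a compact "X,Y WxH" geometry string. A hedged sketch of how such a string could be parsed (the helper name `parse_region` is illustrative, not part of the tool's API):

```python
def parse_region(region: str) -> tuple[int, int, int, int]:
    """Parse an "X,Y WxH" geometry string into (x, y, w, h).

    Illustrative only -- the tool's actual parser may differ.
    """
    pos, size = region.split()              # "X,Y" and "WxH"
    x, y = (int(v) for v in pos.split(","))
    w, h = (int(v) for v in size.split("x"))
    return x, y, w, h

print(parse_region("100,200 800x600"))  # (100, 200, 800, 600)
```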
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| target | Yes | Text to find (case-insensitive, supports multi-word) | |
| monitor | No | Limit search to a specific monitor | |
| window | No | Limit search to a specific window (e.g. "class:discord") | |
| region | No | Limit search to a region "X,Y WxH" | |
| scope | No | "auto" captures just the active window for better accuracy; "full" captures the entire desktop | auto |
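An MCP client invokes the tool with arguments matching this schema; for example (values illustrative):

```json
{
  "name": "find_text_on_screen",
  "arguments": {
    "target": "Sign in",
    "window": "class:discord",
    "scope": "auto"
  }
}
```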
Implementation Reference
- hyprland_mcp/server.py:445-506 (handler): The implementation of the find_text_on_screen tool, which captures a screenshot and performs OCR to locate text.
```python
async def find_text_on_screen(
    target: str,
    monitor: str | None = None,
    window: str | None = None,
    region: str | None = None,
    scope: str = "auto",
) -> str:
    """Find text on screen using OCR. Returns matching locations in screen coordinates.

    Take a screenshot, run OCR, and find all occurrences of the target text.
    Coordinates are in absolute screen space — ready to pass to mouse_click.

    Args:
        target: Text to find (case-insensitive, supports multi-word)
        monitor: Limit search to a specific monitor
        window: Limit search to a specific window (e.g. "class:discord")
        region: Limit search to a region "X,Y WxH"
        scope: "auto" (default) captures just the active window for better
            accuracy. "full" captures the entire desktop.
    """
    from . import screenshot as ss, ocr

    if scope == "full" and not (monitor or window or region):
        png_bytes, origin_x, origin_y = await ss.capture_raw()
    else:
        png_bytes, origin_x, origin_y = await _auto_scope_capture(
            monitor, window, region,
        )

    boxes = ocr.extract_boxes(png_bytes)
    matches = ocr.find_text(boxes, target)

    if not matches:
        all_text = ocr.extract_text(png_bytes)
        preview = all_text[:500] + "..." if len(all_text) > 500 else all_text
        return f"Text '{target}' not found.\n\nOCR detected text:\n{preview}"

    lines = [f"Found {len(matches)} match(es) for '{target}':"]
    for m in matches:
        screen_x = m["x"] + origin_x + m["w"] // 2
        screen_y = m["y"] + origin_y + m["h"] // 2
        lines.append(
            f"- \"{m['text']}\" at screen ({screen_x}, {screen_y}) "
            f"[box: {m['x']+origin_x},{m['y']+origin_y} {m['w']}x{m['h']}, "
            f"conf: {m['conf']}%]"
        )
    return "\n".join(lines)
```
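The matching step relies on ocr.find_text, whose internals are not shown in this reference. A plausible sketch of case-insensitive, multi-word matching over OCR word boxes — assuming boxes of the form {text, x, y, w, h, conf} as the handler's output format suggests; the exact-word comparison and box-merging logic here are illustrative, not the actual implementation:

```python
def find_text(boxes: list[dict], target: str) -> list[dict]:
    """Illustrative case-insensitive search over OCR word boxes.

    A multi-word target matches a run of consecutive boxes; the merged
    box spans from the first word to the last. Not the real ocr module.
    """
    words = target.lower().split()
    matches = []
    for i in range(len(boxes) - len(words) + 1):
        run = boxes[i:i + len(words)]
        if all(b["text"].lower() == w for b, w in zip(run, words)):
            first, last = run[0], run[-1]
            matches.append({
                "text": " ".join(b["text"] for b in run),
                "x": first["x"],
                "y": first["y"],
                "w": last["x"] + last["w"] - first["x"],
                "h": max(b["h"] for b in run),
                "conf": min(b["conf"] for b in run),
            })
    return matches

boxes = [
    {"text": "Sign", "x": 10, "y": 5, "w": 40, "h": 16, "conf": 96},
    {"text": "in",   "x": 55, "y": 5, "w": 18, "h": 16, "conf": 94},
]
print(find_text(boxes, "sign in"))
```

The merged match would report the lowest per-word confidence, which matches the conservative "conf: N%" figure in the handler's output lines.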
- hyprland_mcp/server.py:444-506 (registration): Registration of the find_text_on_screen tool via the @mcp.tool() decorator.
```python
@mcp.tool()
async def find_text_on_screen(
    target: str,
    monitor: str | None = None,
    window: str | None = None,
    region: str | None = None,
    scope: str = "auto",
) -> str:
    ...  # body identical to the handler listing above
```