click_text
Locate and click text on screen using OCR, screenshot, and mouse automation in Hyprland desktop environments.
Instructions
Find text on screen and click it — screenshot + OCR + click in one call.
By default, searches only the active window for better accuracy and speed.
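The capture step comes down to one branch in the handler. Below is a minimal sketch of that decision. The function name `capture_mode` is hypothetical, and this reference does not show what `_auto_scope_capture` does internally, so "scoped" is only a label here. The sketch also shows that `scope="full"` is ignored whenever a monitor, window, or region filter is passed as well.

```python
from typing import Optional


def capture_mode(
    scope: str = "auto",
    monitor: Optional[str] = None,
    window: Optional[str] = None,
    region: Optional[str] = None,
) -> str:
    """Hypothetical helper: return "full" or "scoped" for a click_text call.

    Mirrors the first branch of the click_text handler: the whole desktop is
    captured only when scope="full" AND no monitor/window/region is given.
    """
    if scope == "full" and not (monitor or window or region):
        return "full"
    # scope="auto", or any explicit filter, goes through the scoped capture
    # (per the tool description, the active window when no filter is set).
    return "scoped"
```

For example, `capture_mode("full", window="class:discord")` returns `"scoped"`, because the window filter wins over the requested full-desktop scope.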
Args:
- target: Text to find and click (case-insensitive).
- button: Mouse button ("left", "right", or "middle").
- double: Whether to double-click.
- monitor: Limit the search to a specific monitor.
- window: Limit the search to a specific window (e.g. "class:discord").
- region: Limit the search to a region given as "X,Y WxH".
- occurrence: Which match to click if several are found (1 = first/best, 2 = second, and so on).
- scope: "auto" (default) searches the active window; "full" searches the entire desktop. An explicit monitor, window, or region takes precedence over "full".
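`region` is passed as a single string, not as four numbers. Below is a minimal parser for the "X,Y WxH" form. `parse_region` and `Region` are hypothetical names, and the server may parse or validate this string differently. Accepting negative X/Y is also an assumption. It is meant for layouts that place a monitor left of or above the origin.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical parsed form of a "X,Y WxH" region string."""
    x: int
    y: int
    w: int
    h: int


# "X,Y WxH", e.g. "100,200 640x480". Negative X/Y are accepted (an assumption).
_REGION_RE = re.compile(r"^\s*(-?\d+),(-?\d+)\s+(\d+)x(\d+)\s*$")


def parse_region(spec: str) -> Region:
    """Parse a region string, raising ValueError on malformed input."""
    m = _REGION_RE.match(spec)
    if m is None:
        raise ValueError(f"expected 'X,Y WxH', got {spec!r}")
    x, y, w, h = (int(g) for g in m.groups())
    if w == 0 or h == 0:
        raise ValueError("region width and height must be positive")
    return Region(x, y, w, h)
```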
Input Schema
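| Name | Required | Description | Default |
|---|---|---|---|
| target | Yes | Text to find and click (case-insensitive) | |
| button | No | Mouse button ("left", "right", "middle") | left |
| double | No | Whether to double-click | false |
| monitor | No | Limit search to a specific monitor | |
| window | No | Limit search to a specific window (e.g. "class:discord") | |
| region | No | Limit search to a region "X,Y WxH" | |
| occurrence | No | Which match to click if multiple are found (1 = first/best, 2 = second, etc.) | 1 |
| scope | No | "auto" searches the active window; "full" searches the entire desktop | auto |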
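The schema above can be mirrored in a typed container that uses the same defaults. This is a hypothetical sketch. `ClickTextArgs` and its `validate` method are not part of the server, and the checks are assumptions. The `occurrence` check deserves attention. The handler only rejects values larger than the number of matches, then indexes `matches[occurrence - 1]`. As a result, `occurrence=0` would quietly pick the last match instead of failing.

```python
from dataclasses import dataclass
from typing import Optional

VALID_BUTTONS = {"left", "right", "middle"}
VALID_SCOPES = {"auto", "full"}


@dataclass
class ClickTextArgs:
    """Hypothetical container mirroring the click_text input schema."""
    target: str
    button: str = "left"
    double: bool = False
    monitor: Optional[str] = None
    window: Optional[str] = None
    region: Optional[str] = None
    occurrence: int = 1
    scope: str = "auto"

    def validate(self) -> None:
        # Assumed checks; the click_text handler does not perform these itself.
        if not self.target.strip():
            raise ValueError("target must be non-empty")
        if self.button not in VALID_BUTTONS:
            raise ValueError(f"unsupported button: {self.button!r}")
        if self.scope not in VALID_SCOPES:
            raise ValueError(f"unsupported scope: {self.scope!r}")
        if self.occurrence < 1:
            # occurrence is 1-based; 0 would index matches[-1] in the handler.
            raise ValueError("occurrence must be >= 1")
```

Validating once at the start keeps the handler's error messages about OCR results and separate from malformed input.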
Implementation Reference
- hyprland_mcp/server.py:494-553 (handler): The `click_text` tool is defined here as an MCP tool. It captures the screen (either the full desktop or a scoped area), runs OCR on the capture, picks the requested match, and then moves the cursor to the centre of that match and clicks.
```python
@mcp.tool()
async def click_text(
    target: str,
    button: str = "left",
    double: bool = False,
    monitor: str | None = None,
    window: str | None = None,
    region: str | None = None,
    occurrence: int = 1,
    scope: str = "auto",
) -> str:
    """Find text on screen and click it — screenshot + OCR + click in one call.

    By default, searches only the active window for better accuracy and speed.

    Args:
        target: Text to find and click (case-insensitive)
        button: Mouse button ("left", "right", "middle")
        double: Whether to double-click
        monitor: Limit search to a specific monitor
        window: Limit search to a specific window (e.g. "class:discord")
        region: Limit search to a region "X,Y WxH"
        occurrence: Which match to click if multiple found (1 = first/best, 2 = second, etc.)
        scope: "auto" (default) searches the active window. "full" searches entire desktop.
    """
    from . import screenshot as ss, ocr, input as inp

    # Whole desktop only when scope="full" and no monitor/window/region filter is set.
    if scope == "full" and not (monitor or window or region):
        png_bytes, origin_x, origin_y = await ss.capture_raw()
    else:
        png_bytes, origin_x, origin_y = await _auto_scope_capture(
            monitor, window, region,
        )

    # OCR the capture and look up the target text.
    boxes = ocr.extract_boxes(png_bytes)
    matches = ocr.find_text(boxes, target)
    if not matches:
        # Show (up to 500 chars of) what OCR did see, to help the caller adjust.
        all_text = ocr.extract_text(png_bytes)
        preview = all_text[:500] + "..." if len(all_text) > 500 else all_text
        return f"Could not find '{target}' on screen.\n\nOCR detected text:\n{preview}"

    if occurrence > len(matches):
        return (
            f"Only found {len(matches)} match(es) for '{target}', "
            f"but occurrence={occurrence} requested."
        )

    # Box coordinates are relative to the capture: add its origin, aim at the centre.
    match = matches[occurrence - 1]
    screen_x = match["x"] + origin_x + match["w"] // 2
    screen_y = match["y"] + origin_y + match["h"] // 2

    await inp.move_cursor(screen_x, screen_y)
    await inp.click(button, double=double)

    kind = "Double-clicked" if double else "Clicked"
    return (
        f"{kind} '{match['text']}' at ({screen_x}, {screen_y}) "
        f"[conf: {match['conf']}%]"
    )
```
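The matching and coordinate steps can be tried without a screen or an OCR engine. In this sketch, `find_text` is a hypothetical stand-in for `ocr.find_text`. The real ranking and handling of multi-word phrases are not shown in this reference. The stand-in does a case-insensitive substring match and sorts hits by confidence, which is one plausible reading of "1 = first/best". `click_point` repeats the handler's centre calculation.

```python
from typing import List, Tuple, TypedDict


class Box(TypedDict):
    """Assumed shape of an OCR box, based on the keys the handler reads."""
    text: str
    x: int
    y: int
    w: int
    h: int
    conf: float


def find_text(boxes: List[Box], target: str) -> List[Box]:
    """Hypothetical matcher: case-insensitive substring, best confidence first."""
    needle = target.casefold()
    hits = [b for b in boxes if needle in b["text"].casefold()]
    # sorted() is stable, so equal-confidence hits keep their OCR order.
    return sorted(hits, key=lambda b: b["conf"], reverse=True)


def click_point(match: Box, origin_x: int, origin_y: int) -> Tuple[int, int]:
    """Centre of a box in capture coordinates, shifted into screen coordinates."""
    return (
        match["x"] + origin_x + match["w"] // 2,
        match["y"] + origin_y + match["h"] // 2,
    )


if __name__ == "__main__":
    boxes: List[Box] = [
        {"text": "Cancel", "x": 10, "y": 20, "w": 60, "h": 20, "conf": 91.0},
        {"text": "Send", "x": 200, "y": 20, "w": 40, "h": 20, "conf": 88.0},
        {"text": "send later", "x": 200, "y": 60, "w": 80, "h": 20, "conf": 95.0},
    ]
    matches = find_text(boxes, "SEND")
    print([m["text"] for m in matches])  # ['send later', 'Send']
    # occurrence=2 -> matches[1]; capture origin at (1920, 0), e.g. a second monitor
    print(click_point(matches[1], 1920, 0))  # (2140, 30)
```

The example also shows why `occurrence` depends on ranking. With confidence ordering, `occurrence=1` selects "send later" instead of the exact word "Send".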