screenshot_with_ocr
Capture screenshots and extract text with OCR in one step, returning both the image and text with word coordinates for Hyprland desktop automation.
Instructions
Take a screenshot AND run OCR, returning both the image and extracted text.
More efficient than calling screenshot + find_text_on_screen separately. The text includes screen coordinates for every detected word.
Args:
- monitor: Capture a specific monitor
- window: Capture a specific window (e.g. "class:discord")
- region: Capture a region as "X,Y WxH"
- max_width: Maximum output width for the image (default 1024)
- quality: JPEG quality for the image (default 60)
- scope: "auto" (default) captures the active window. "full" captures the entire desktop.
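The `region` argument is a plain string in the "X,Y WxH" layout, so it is easy to get a separator wrong. The sketch below shows one way to build and check such a string before passing it to the tool. `format_region` and `parse_region` are hypothetical helpers written for this illustration, not part of hyprland_mcp, and accepting negative offsets is an assumption made here for multi-monitor layouts.

```python
import re

# Hypothetical helpers for illustration; not part of hyprland_mcp.
# Assumption: X and Y may be negative, width and height must be positive.
_REGION_RE = re.compile(r"^(-?\d+),(-?\d+) (\d+)x(\d+)$")


def format_region(x: int, y: int, width: int, height: int) -> str:
    """Build a region string in the "X,Y WxH" layout."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return f"{x},{y} {width}x{height}"


def parse_region(region: str) -> tuple[int, int, int, int]:
    """Split an "X,Y WxH" string back into (x, y, width, height)."""
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError(f"not an 'X,Y WxH' region: {region!r}")
    x, y, width, height = (int(group) for group in match.groups())
    return x, y, width, height
```

For example, `format_region(100, 200, 800, 600)` yields a string that can be passed as `region`, and `parse_region` rejects anything that does not match the layout.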
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
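| monitor | No | Capture a specific monitor | |
| window | No | Capture a specific window (e.g. "class:discord") | |
| region | No | Capture a region as "X,Y WxH" | |
| max_width | No | Maximum output width for the image | 1024 |
| quality | No | JPEG quality for the image | 60 |
| scope | No | "auto" captures the active window; "full" captures the entire desktop | auto |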
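As a rough illustration of how the schema above maps onto a request, the following sketch serializes an MCP `tools/call` message for this tool. `build_tool_call` is a hypothetical helper, and how the message is transported to the server (stdio, HTTP, a client SDK) is left out; only the argument names come from the schema.

```python
import json

# Argument names taken from the input schema above.
ALLOWED_ARGUMENTS = {"monitor", "window", "region", "max_width", "quality", "scope"}


def build_tool_call(request_id: int, **arguments) -> str:
    """Serialize a `tools/call` request for screenshot_with_ocr (hypothetical helper)."""
    unknown = set(arguments) - ALLOWED_ARGUMENTS
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "screenshot_with_ocr", "arguments": arguments},
    }
    return json.dumps(payload)


# Omitted arguments fall back to the server-side defaults (quality=60, scope="auto").
request = build_tool_call(1, window="class:discord", max_width=800)
```

Since every argument is optional, an empty `arguments` object is also valid and captures the active window with the default settings.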
Implementation Reference
- hyprland_mcp/server.py:642-675 (handler) The 'screenshot_with_ocr' tool handler function in 'hyprland_mcp/server.py'. It captures the screen (via 'screenshot.py') and performs OCR (via 'ocr.py').

      async def screenshot_with_ocr(
          monitor: str | None = None,
          window: str | None = None,
          region: str | None = None,
          max_width: int = 1024,
          quality: int = 60,
          scope: str = "auto",
      ) -> list:
          """Take a screenshot AND run OCR, returning both the image and extracted text.

          More efficient than calling screenshot + find_text_on_screen separately.
          The text includes screen coordinates for every detected word.

          Args:
              monitor: Capture a specific monitor
              window: Capture a specific window (e.g. "class:discord")
              region: Capture a region as "X,Y WxH"
              max_width: Maximum output width for the image (default 1024)
              quality: JPEG quality for the image (default 60)
              scope: "auto" (default) captures the active window.
                     "full" captures entire desktop.
          """
          from . import screenshot as ss, ocr

          if scope == "full" and not (monitor or window or region):
              png_bytes, origin_x, origin_y = await ss.capture_raw()
          else:
              png_bytes, origin_x, origin_y = await _auto_scope_capture(
                  monitor, window, region,
              )

          text = ocr.extract_text(png_bytes)
          image, _ = ss.resize_and_compress(png_bytes, max_width=max_width, quality=quality)
          return [image, f"OCR text:\n{text}"]

- hyprland_mcp/server.py:641-641 (registration) The tool registration using '@mcp.tool()' immediately preceding the 'screenshot_with_ocr' function definition.

      @mcp.tool()