Glama

hyprland-mcp

by alderban107

screenshot_with_ocr

Capture screenshots and extract text with OCR in one step, returning both the image and text with word coordinates for Hyprland desktop automation.

Instructions

Take a screenshot AND run OCR, returning both the image and extracted text.

More efficient than calling screenshot and find_text_on_screen separately, because a single capture feeds both the OCR pass and the returned image. The text includes screen coordinates for every detected word.
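A window or region capture covers only part of the desktop, so a word's position inside the captured image becomes a screen position only after the capture's origin is added. The sketch below assumes OCR yields per-word boxes in image pixels and that the capture origin is known (the handler's capture helpers do return origin_x and origin_y). The Word type and the to_screen and center helpers are hypothetical illustrations, not part of hyprland-mcp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical OCR result: a word and its bounding box in image pixels."""
    text: str
    x: int
    y: int
    width: int
    height: int


def to_screen(word: Word, origin_x: int, origin_y: int) -> Word:
    """Shift a word's box from capture-relative to desktop-absolute coordinates."""
    return Word(word.text, word.x + origin_x, word.y + origin_y, word.width, word.height)


def center(word: Word) -> tuple[int, int]:
    """A click target: the integer centre of the word's bounding box."""
    return (word.x + word.width // 2, word.y + word.height // 2)


# A word found at (40, 12) inside a window whose top-left corner sits at (1920, 30).
w = to_screen(Word("Send", 40, 12, 48, 20), origin_x=1920, origin_y=30)
print(w.x, w.y, center(w))  # 1960 42 (1984, 52)
```

With a full-desktop capture the origin is typically (0, 0), in which case image and screen coordinates coincide.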

Args:
  monitor: Capture a specific monitor
  window: Capture a specific window (e.g. "class:discord")
  region: Capture a region as "X,Y WxH"
  max_width: Maximum output width for the image (default 1024)
  quality: JPEG quality for the image (default 60)
  scope: "auto" (default) captures the active window; "full" captures the entire desktop. scope is ignored when monitor, window, or region is given.
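The region argument is a geometry string of the form "X,Y WxH", the same layout that grim's -g option accepts. A minimal sketch of parsing and validating such a string before sending it; parse_region is a hypothetical client-side helper, not the server's own parser, and accepting negative offsets is an assumption of this sketch.

```python
import re

# "X,Y WxH": offsets may be negative in this sketch; width and height must be positive.
_REGION_RE = re.compile(r"^\s*(-?\d+),(-?\d+)\s+(\d+)x(\d+)\s*$")


def parse_region(region: str) -> tuple[int, int, int, int]:
    """Parse a region string such as "100,200 640x480" into (x, y, width, height)."""
    match = _REGION_RE.match(region)
    if match is None:
        raise ValueError(f"expected 'X,Y WxH', got {region!r}")
    x, y, width, height = (int(g) for g in match.groups())
    if width == 0 or height == 0:
        raise ValueError("region width and height must be positive")
    return x, y, width, height


print(parse_region("100,200 640x480"))  # (100, 200, 640, 480)
```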

Input Schema

Name        Required  Default
monitor     No
window      No
region      No
max_width   No
quality     No
scope       No        auto
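MCP clients invoke a tool with a JSON-RPC tools/call request whose params carry the tool name and an arguments object matching the schema above. Since no field is required, omitted fields fall back to the handler's defaults. A minimal sketch that builds such a request; build_tool_call is a hypothetical helper and the argument values are illustrative.

```python
import json

# Argument names taken from the input schema above.
ALLOWED = {"monitor", "window", "region", "max_width", "quality", "scope"}


def build_tool_call(request_id: int, **arguments: object) -> str:
    """Serialize an MCP tools/call request for screenshot_with_ocr."""
    unknown = set(arguments) - ALLOWED
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    # Drop unset options so the server applies its own defaults.
    args = {k: v for k, v in arguments.items() if v is not None}
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "screenshot_with_ocr", "arguments": args},
    }
    return json.dumps(request)


# OCR a specific window and ask for a preview narrower than the default 1024.
print(build_tool_call(1, window="class:discord", max_width=800, monitor=None))
```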

Implementation Reference

  • The 'screenshot_with_ocr' tool handler function in 'hyprland_mcp/server.py'. It captures the screen (via 'screenshot.py') and performs OCR (via 'ocr.py').
    async def screenshot_with_ocr(
        monitor: str | None = None,
        window: str | None = None,
        region: str | None = None,
        max_width: int = 1024,
        quality: int = 60,
        scope: str = "auto",
    ) -> list:
        """Take a screenshot AND run OCR, returning both the image and extracted text.
    
        More efficient than calling screenshot + find_text_on_screen separately.
        The text includes screen coordinates for every detected word.
    
        Args:
            monitor: Capture a specific monitor
            window: Capture a specific window (e.g. "class:discord")
            region: Capture a region as "X,Y WxH"
            max_width: Maximum output width for the image (default 1024)
            quality: JPEG quality for the image (default 60)
            scope: "auto" (default) captures the active window. "full" captures entire desktop.
        """
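        from . import screenshot as ss, ocr
    
        # scope="full" only takes effect when no monitor, window, or region is given;
        # otherwise the explicit target (or, by default, the active window) is captured.
        if scope == "full" and not (monitor or window or region):
            png_bytes, origin_x, origin_y = await ss.capture_raw()
        else:
            png_bytes, origin_x, origin_y = await _auto_scope_capture(
                monitor, window, region,
            )
        # Note: origin_x / origin_y are returned by both capture paths but are not used below.
    
        # OCR runs on the full-resolution PNG; only the returned image is downscaled and compressed.
        text = ocr.extract_text(png_bytes)
        image, _ = ss.resize_and_compress(png_bytes, max_width=max_width, quality=quality)
    
        # Return the compressed image followed by the OCR text as a second content item.
        return [image, f"OCR text:\n{text}"]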
  • The tool is registered with the '@mcp.tool()' decorator, which immediately precedes the 'screenshot_with_ocr' function definition.
    @mcp.tool()
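The handler runs OCR on the full-resolution capture and only afterwards calls ss.resize_and_compress, so a smaller max_width shrinks the returned image without reducing the resolution OCR sees. How that helper scales is not shown here; the sketch below assumes it preserves the aspect ratio and never upscales, and also shows how a point picked on the downscaled image maps back to full-resolution pixels. fit_width and to_full_res are hypothetical helpers.

```python
def fit_width(width: int, height: int, max_width: int) -> tuple[int, int]:
    """Output size when scaling down to max_width while keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # assumption: captures already narrow enough are not upscaled
    return max_width, max(1, round(height * max_width / width))


def to_full_res(x: float, y: float, full_width: int, shown_width: int) -> tuple[int, int]:
    """Map a point on the downscaled image back to full-resolution pixels."""
    factor = full_width / shown_width
    return round(x * factor), round(y * factor)


# A 2560x1440 capture with the default max_width=1024.
print(fit_width(2560, 1440, 1024))  # (1024, 576)
print(to_full_res(512, 288, 2560, 1024))  # (1280, 720)
```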


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/alderban107/hyprland-mcp'
