
hyprland-mcp

by alderban107

screenshot_with_ocr

Capture screenshots and extract text with OCR in one step, returning both the image and text with word coordinates for Hyprland desktop automation.

Instructions

Take a screenshot AND run OCR, returning both the image and extracted text.

More efficient than calling screenshot + find_text_on_screen separately. The text includes screen coordinates for every detected word.

Args:
    monitor: Capture a specific monitor
    window: Capture a specific window (e.g. "class:discord")
    region: Capture a region as "X,Y WxH"
    max_width: Maximum output width for the image (default 1024)
    quality: JPEG quality for the image (default 60)
    scope: "auto" (default) captures the active window. "full" captures entire desktop.
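The region argument is a single string in the form "X,Y WxH". The following is a minimal sketch of how a caller might validate such a string before sending it. The `Region` type and `parse_region` helper are hypothetical names, not part of hyprland-mcp, and accepting negative offsets is an assumption made here for multi-monitor layouts.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical container for a parsed "X,Y WxH" region string."""
    x: int
    y: int
    width: int
    height: int


# Offsets may be negative (assumption); width and height are unsigned digits.
_REGION_RE = re.compile(r"^\s*(-?\d+),(-?\d+)\s+(\d+)x(\d+)\s*$")


def parse_region(spec: str) -> Region:
    """Parse "X,Y WxH" into integers, raising ValueError on bad input."""
    match = _REGION_RE.match(spec)
    if match is None:
        raise ValueError(f'expected "X,Y WxH", got {spec!r}')
    x, y, width, height = (int(group) for group in match.groups())
    if width == 0 or height == 0:
        raise ValueError("region width and height must be positive")
    return Region(x, y, width, height)
```

For example, `parse_region("100,200 640x480")` yields a region whose top-left corner is at (100, 200) and whose size is 640 by 480.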

Input Schema

Name | Required | Description | Default
monitor | No | |
window | No | |
region | No | |
max_width | No | |
quality | No | |
scope | No | | auto
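Since every parameter is optional, a client only needs to send the arguments it wants to override. As a rough sketch, an MCP `tools/call` request for this tool could be assembled as below. The `build_tool_call` helper is a hypothetical name, and a real client library would normally build this envelope for you.

```python
import json


def build_tool_call(request_id: int, **arguments) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for screenshot_with_ocr.

    Arguments left as None are dropped so the server applies its defaults.
    """
    supplied = {key: value for key, value in arguments.items() if value is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "screenshot_with_ocr", "arguments": supplied},
    }


if __name__ == "__main__":
    request = build_tool_call(1, window="class:discord", max_width=800)
    print(json.dumps(request, indent=2))
```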

Implementation Reference

  • The 'screenshot_with_ocr' tool handler function in 'hyprland_mcp/server.py'. It captures the screen (via 'screenshot.py') and performs OCR (via 'ocr.py').
    async def screenshot_with_ocr(
        monitor: str | None = None,
        window: str | None = None,
        region: str | None = None,
        max_width: int = 1024,
        quality: int = 60,
        scope: str = "auto",
    ) -> list:
        """Take a screenshot AND run OCR, returning both the image and extracted text.
    
        More efficient than calling screenshot + find_text_on_screen separately.
        The text includes screen coordinates for every detected word.
    
        Args:
            monitor: Capture a specific monitor
            window: Capture a specific window (e.g. "class:discord")
            region: Capture a region as "X,Y WxH"
            max_width: Maximum output width for the image (default 1024)
            quality: JPEG quality for the image (default 60)
            scope: "auto" (default) captures the active window. "full" captures entire desktop.
        """
        from . import screenshot as ss, ocr
    
        if scope == "full" and not (monitor or window or region):
            png_bytes, origin_x, origin_y = await ss.capture_raw()
        else:
            png_bytes, origin_x, origin_y = await _auto_scope_capture(
                monitor, window, region,
            )
    
        text = ocr.extract_text(png_bytes)
        image, _ = ss.resize_and_compress(png_bytes, max_width=max_width, quality=quality)
    
        return [image, f"OCR text:\n{text}"]
  • The tool registration using '@mcp.tool()' immediately preceding the 'screenshot_with_ocr' function definition.
    @mcp.tool()
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses key behavioral traits: the tool performs both screenshot capture and OCR in one call, returns both image and text, includes screen coordinates for detected words, and has default values for parameters. However, it doesn't mention potential side effects, error conditions, or performance characteristics like execution time.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded: the first sentence states the core purpose, the second explains efficiency benefit, the third adds text detail, then parameters are clearly listed with explanations. Every sentence earns its place with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters with 0% schema coverage and no output schema, the description does an excellent job explaining inputs and basic behavior. However, it doesn't describe the output format (what the returned image and text look like structurally) or error handling. For a tool with no output schema, this leaves some ambiguity about what the agent will receive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
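Going by the handler listing, the return value is a two-element list: the compressed image followed by a string that starts with "OCR text:" and a newline. A minimal sketch of unpacking that shape on the client side follows. The `split_result` helper is hypothetical, the image is treated as an opaque object because its type is not shown on this page, and the layout of the per-word coordinates inside the text is not assumed.

```python
OCR_PREFIX = "OCR text:\n"  # prefix used in the handler's return statement


def split_result(result: list) -> tuple[object, str]:
    """Split the [image, labelled_text] pair and strip the "OCR text:" label."""
    image, labelled_text = result
    return image, labelled_text.removeprefix(OCR_PREFIX)
```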

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It provides detailed semantics for all 6 parameters: explains what 'monitor', 'window', and 'region' capture, defines 'max_width' and 'quality' with defaults, and clarifies 'scope' with 'auto' vs 'full' behavior. This adds substantial meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Take a screenshot AND run OCR, returning both the image and extracted text.' It specifies the verb ('take' and 'run'), resource ('screenshot' and 'OCR'), and distinguishes from sibling tools by noting it's 'more efficient than calling screenshot + find_text_on_screen separately.' This is specific and differentiates from alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'More efficient than calling screenshot + find_text_on_screen separately' directly compares to sibling tools. It also explains when to use specific parameters like 'scope: "auto" (default) captures the active window. "full" captures entire desktop,' giving clear context for parameter selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/alderban107/hyprland-mcp'
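The same request can be made from Python with only the standard library. This is a sketch under the assumption that the endpoint returns a JSON body. The response fields are not documented on this page, so the result is returned unparsed beyond JSON decoding. Both function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server, escaping each path segment."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """GET the server record and decode it as JSON (assumed response format)."""
    with urlopen(server_url(owner, name), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(fetch_server("alderban107", "hyprland-mcp"))
```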

If you have feedback or need assistance with the MCP directory API, please join our Discord server.