Glama

hyprland-mcp

by alderban107

type_into

Automatically locate text input fields using OCR, click to focus, type specified text, and optionally submit with Enter for desktop automation in Hyprland environments.

Instructions

Find a text input field, click it, type text, and optionally submit.

Combines focus + OCR + click + type + Enter into one action. Searches for placeholder text in the active window to find the input field.

Args:

- text: The text to type.
- input_hint: Placeholder or label text near the input field to click on (e.g. "Type a message", "Search", "Message"). If omitted, tries common placeholders.
- submit: Whether to press Enter after typing (default: False).
- window: Target a specific window (e.g. "class:signal"). Default: active window.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| text | Yes | The text to type | (none) |
| input_hint | No | Placeholder or label text near the input field | (tries common placeholders) |
| submit | No | Whether to press Enter after typing | False |
| window | No | Window selector, e.g. "class:signal" | active window |
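As a concrete illustration, a client invoking this tool might send an argument payload shaped like the following. The field names mirror the schema above; the values themselves are hypothetical:

```python
# Hypothetical argument payload for the type_into tool.
# Only "text" is required; the rest fall back to their defaults.
arguments = {
    "text": "Hello from Hyprland!",
    "input_hint": "Type a message",  # placeholder text used to locate the field
    "submit": True,                  # press Enter after typing
    "window": "class:signal",        # focus this window before capturing
}

# Validate against the schema's required set before sending.
required = {"text"}
assert required <= arguments.keys()
print(sorted(arguments))
```

Omitting `input_hint` makes the tool fall back to its built-in list of common placeholders, so supplying it is only necessary when the target field has unusual placeholder text.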

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| result | Yes | Status string describing the match found and action taken | (none) |

Implementation Reference

  • Handler function for the 'type_into' MCP tool. It uses OCR to locate an input field, clicks it, types the provided text, and optionally submits. (`hyprctl` and `_auto_scope_capture` are module-level helpers referenced from the surrounding package.)

```python
async def type_into(
    text: str,
    input_hint: str | None = None,
    submit: bool = False,
    window: str | None = None,
) -> str:
    """Find a text input field, click it, type text, and optionally submit.

    Combines focus + OCR + click + type + Enter into one action. Searches for
    placeholder text in the active window to find the input field.

    Args:
        text: The text to type
        input_hint: Placeholder or label text near the input field to click on
                    (e.g. "Type a message", "Search", "Message"). If omitted,
                    tries common placeholders.
        submit: Whether to press Enter after typing (default False)
        window: Target a specific window (e.g. "class:signal"). Default: active window.
    """
    from . import ocr, input as inp
    import asyncio

    # Focus the window if specified
    if window:
        await hyprctl.dispatch("focuswindow", window)
        await asyncio.sleep(0.1)

    # Capture the relevant region; origin_x/origin_y are the capture's
    # offset in absolute screen coordinates.
    png_bytes, origin_x, origin_y = await _auto_scope_capture(
        None, window, None,
    )

    boxes = ocr.extract_boxes(png_bytes)

    # Try to find the input field by hint text or common placeholders
    hints = [input_hint] if input_hint else [
        "Type a message",
        "Message",
        "Search",
        "Type here",
        "Write a message",
        "Enter message",
        "Say something",
    ]

    match = None
    used_hint = None
    for hint in hints:
        matches = ocr.find_text(boxes, hint)
        if matches:
            match = matches[0]
            used_hint = hint
            break

    if not match:
        # Surface what OCR actually saw so the caller can pick a better hint.
        all_text = ocr.extract_text(png_bytes)
        preview = all_text[:500] + "..." if len(all_text) > 500 else all_text
        tried = ", ".join(f'"{h}"' for h in hints)
        return (
            f"Could not find input field. Tried: {tried}\n\n"
            f"OCR detected text:\n{preview}"
        )

    # Click the center of the matched text
    screen_x = match["x"] + origin_x + match["w"] // 2
    screen_y = match["y"] + origin_y + match["h"] // 2

    await inp.move_cursor(screen_x, screen_y)
    await inp.click("left")
    await asyncio.sleep(0.1)

    # Type the text
    await inp.type_text(text)

    result = f"Found '{used_hint}' at ({screen_x},{screen_y}), typed {len(text)} chars"

    # Submit if requested
    if submit:
        await asyncio.sleep(0.05)
        await hyprctl.dispatch("sendshortcut", ", Return, ")
        result += ", pressed Enter"

    return result
```
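The click-target arithmetic in the handler (offset the window-relative OCR box by the capture origin, then take the box centre) can be isolated as a small pure function and checked directly. The box and origin values below are made up for illustration:

```python
def box_center_on_screen(box: dict, origin_x: int, origin_y: int) -> tuple[int, int]:
    """Translate a window-relative OCR bounding box into the absolute
    screen coordinate at its centre, mirroring the math in type_into."""
    return (
        box["x"] + origin_x + box["w"] // 2,
        box["y"] + origin_y + box["h"] // 2,
    )

# Example: a 120x24 box at (40, 500) inside a capture whose top-left
# corner sits at screen position (960, 0).
print(box_center_on_screen({"x": 40, "y": 500, "w": 120, "h": 24}, 960, 0))
# → (1060, 512)
```

Integer floor division keeps the result on the pixel grid; for odd widths or heights the centre lands half a pixel left or up, which is harmless for a click target.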
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behavioral traits: it performs OCR to find fields, searches the active window by default, tries common placeholders if input_hint is omitted, and includes a submit option. It doesn't mention error handling, performance characteristics, or platform dependencies, but covers core functionality adequately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the core purpose, followed by key behavioral context, then a well-structured parameter section. Every sentence adds value with zero waste, making it easy to scan and understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 parameters, no annotations, but has output schema), the description is mostly complete. It explains what the tool does, when to use it, and all parameters. Since an output schema exists, it doesn't need to explain return values. Minor gaps include lack of error cases or performance notes, but it's sufficient for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must fully compensate. It successfully adds meaning for all 4 parameters: explains 'text' is the text to type, 'input_hint' is placeholder/label text for finding the field with examples, 'submit' controls Enter press with default, and 'window' targets specific windows with an example. This goes well beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs (find, click, type, submit) and resources (text input field). It distinguishes from siblings like 'type_text' by explaining it combines multiple actions (focus + OCR + click + type + Enter) and searches for placeholder text, making the scope and differentiation explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (to type text into an input field with optional submission) and implies alternatives by mentioning it 'combines' multiple actions, suggesting it could replace separate tools like 'click_text' + 'type_text'. However, it lacks explicit when-not-to-use guidance or named alternatives from the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/alderban107/hyprland-mcp'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.