omniparser-autogui-mcp


Click here for the Japanese version

This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Operation has been confirmed on Windows.

License notes

This project is released under the MIT license, excluding submodules and sub packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).

Installation

  1. Please do the following:
     git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
     cd omniparser-autogui-mcp
     uv sync
     set OCR_LANG=en
     uv run download_models.py

(On platforms other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead of uv sync.)

  2. Add this to your claude_desktop_config.json:
     {
       "mcpServers": {
         "omniparser_autogui_mcp": {
           "command": "uv",
           "args": [
             "--directory",
             "D:\\CLONED_PATH\\omniparser-autogui-mcp",
             "run",
             "omniparser-autogui-mcp"
           ],
           "env": {
             "PYTHONIOENCODING": "utf-8",
             "OCR_LANG": "en"
           }
         }
       }
     }

(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
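
If you would rather generate this entry than edit the JSON by hand, the following is a minimal sketch in Python. The helper name build_server_entry and its parameters are hypothetical and not part of this repository; the sketch only reproduces the structure shown above and lets json.dumps handle the backslash escaping of Windows paths.

```python
import json


def build_server_entry(clone_dir, ocr_lang="en", extra_env=None):
    """Build the "mcpServers" fragment shown above (hypothetical helper)."""
    # Base environment taken from the README example.
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    # Optional extra variables, e.g. TARGET_WINDOW_NAME.
    env.update(extra_env or {})
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": ["--directory", clone_dir, "run", "omniparser-autogui-mcp"],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    # A raw string keeps single backslashes; json.dumps doubles them on output.
    entry = build_server_entry(r"D:\CLONED_PATH\omniparser-autogui-mcp")
    print(json.dumps(entry, indent=2))
```

The printed JSON can be merged into an existing claude_desktop_config.json; the sketch does not write the file itself, so other configured servers are left untouched.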

The env block also accepts the following optional settings:

  • OMNI_PARSER_BACKEND_LOAD
    Set this to 1 if the server does not work with other clients (such as LibreChat).
  • TARGET_WINDOW_NAME
    To restrict operation to a single window, specify its window name.
    If not specified, the server operates on the entire screen.
  • OMNI_PARSER_SERVER
    If you want OmniParser processing to be done on another device, specify the server's address and port, such as 127.0.0.1:8000.
    The server can be started with uv run omniparserserver.
  • SSE_HOST, SSE_PORT
    If specified, communication will be done via SSE instead of stdio.
  • SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
    These are for OmniParser configuration.
    Usually, they are not necessary.
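
To make the meaning of these variables concrete, here is a rough sketch of how a server could interpret them. It is not this project's actual code: the Settings class, read_settings, and split_host_port are hypothetical names, and the rule that SSE is chosen only when both SSE_HOST and SSE_PORT are set is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass
class Settings:
    transport: str                            # "sse" or "stdio"
    sse_address: Optional[Tuple[str, int]]    # (host, port) when using SSE
    parser_server: Optional[Tuple[str, int]]  # remote OmniParser, if any
    target_window: Optional[str]              # None means the entire screen
    backend_load: bool                        # OMNI_PARSER_BACKEND_LOAD == "1"


def split_host_port(value: str) -> Tuple[str, int]:
    """Split an address such as 127.0.0.1:8000 into (host, port)."""
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def read_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Interpret the documented variables (hypothetical sketch)."""
    sse_host = env.get("SSE_HOST")
    sse_port = env.get("SSE_PORT")
    # Assumption: SSE is used only when both host and port are given.
    use_sse = bool(sse_host) and bool(sse_port)
    parser = env.get("OMNI_PARSER_SERVER")
    return Settings(
        transport="sse" if use_sse else "stdio",
        sse_address=(sse_host, int(sse_port)) if use_sse else None,
        parser_server=split_host_port(parser) if parser else None,
        target_window=env.get("TARGET_WINDOW_NAME") or None,
        backend_load=env.get("OMNI_PARSER_BACKEND_LOAD") == "1",
    )
```

Calling read_settings({}) yields stdio transport, local OmniParser processing, and whole-screen operation, which matches the defaults described in the list above.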

Usage Examples

  • Search for "MCP server" in the on-screen browser.

etc.
