omniparser-autogui-mcp

An MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Confirmed to work on Windows.

License notes

This repository is MIT licensed, except for its submodules and subpackages.
The OmniParser repository is licensed under CC-BY-4.0.
Each OmniParser model has its own license (reference).

Installation

Follow these steps:

git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py

(On platforms other than Windows, use export instead of set.)
(If you want to run langchain_example.py, run uv sync --extra langchain instead of uv sync.)
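The only platform-dependent part of the steps above is set versus export, plus the optional LangChain extra. The Python sketch below assembles the same command strings for a given platform. It is a minimal illustration, not part of the repository: `install_commands` is a hypothetical helper that only builds strings and does not execute anything.

```python
import sys


def install_commands(platform: str = sys.platform, ocr_lang: str = "en",
                     with_langchain: bool = False) -> list[str]:
    """Return the install steps from this README as shell command strings.

    Hypothetical helper for illustration; the repository does not ship it.
    """
    # Windows cmd uses `set`; POSIX shells (Linux, macOS) use `export`.
    env_cmd = "set" if platform.startswith("win") else "export"
    # The langchain extra is only needed for langchain_example.py.
    sync_cmd = "uv sync --extra langchain" if with_langchain else "uv sync"
    return [
        "git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git",
        "cd omniparser-autogui-mcp",
        sync_cmd,
        f"{env_cmd} OCR_LANG={ocr_lang}",
        "uv run download_models.py",
    ]


if __name__ == "__main__":
    print("\n".join(install_commands()))
```

`sys.platform` is `"win32"` on Windows, which is why the check uses `startswith("win")`.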

Add the following to claude_desktop_config.json:

{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}

(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned the repository into.)
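If you prefer to edit claude_desktop_config.json programmatically, the entry above can be merged into an existing config with a short script. This is a minimal sketch, assuming the `mcpServers` layout shown above; `add_server_entry` and `update_config_file` are hypothetical helpers, not part of this project, and the config file's location on your system is left for you to supply.

```python
import json
from pathlib import Path
from typing import Optional


def add_server_entry(config: dict, cloned_path: str, ocr_lang: str = "en",
                     extra_env: Optional[dict] = None) -> dict:
    """Add the omniparser_autogui_mcp entry to a Claude Desktop config dict.

    Hypothetical helper; it mirrors the JSON entry shown in this README.
    """
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    if extra_env:
        env.update(extra_env)  # e.g. {"TARGET_WINDOW_NAME": "Notepad"}
    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers["omniparser_autogui_mcp"] = {
        "command": "uv",
        "args": ["--directory", cloned_path, "run", "omniparser-autogui-mcp"],
        "env": env,
    }
    return config


def update_config_file(config_path: Path, cloned_path: str) -> None:
    """Read the config (if present), add the entry, and write it back."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    add_server_entry(config, cloned_path)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Pass the path with single backslashes (for example `r"D:\work\omniparser-autogui-mcp"`); `json.dumps` writes the doubled backslashes that JSON requires.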

The following additional settings can be specified in env:

OMNI_PARSER_BACKEND_LOAD
Set this to 1 if the server does not work with other clients (such as LibreChat).
TARGET_WINDOW_NAME
Set this to a window name to operate only on that window.
If it is not set, the server operates on the entire screen.
OMNI_PARSER_SERVER
To run OmniParser processing on a separate device, set this to the server's address and port, for example 127.0.0.1:8000.
The server can be started with uv run omniparserserver.
SSE_HOST, SSE_PORT
If these are set, communication uses SSE instead of stdio.
SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
These configure OmniParser itself.
They are normally not needed.
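The optional variables above all end up in the `env` object of the MCP config. The sketch below collects them into that object. It is a minimal illustration: `build_env` is a hypothetical helper, only the variable names come from this README, and the `host:port` check on OMNI_PARSER_SERVER is an assumption based on the `127.0.0.1:8000` example rather than the server's own validation.

```python
from typing import Optional


def build_env(ocr_lang: str = "en",
              backend_load: bool = False,
              target_window: Optional[str] = None,
              parser_server: Optional[str] = None,
              sse: Optional[tuple[str, int]] = None) -> dict[str, str]:
    """Assemble the optional "env" settings described in this README.

    Hypothetical helper; only the variable names come from the README.
    """
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    if backend_load:
        env["OMNI_PARSER_BACKEND_LOAD"] = "1"  # for clients such as LibreChat
    if target_window:
        env["TARGET_WINDOW_NAME"] = target_window  # omit to use the whole screen
    if parser_server:
        # Assumed format "address:port", e.g. "127.0.0.1:8000".
        host, sep, port = parser_server.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected address:port, got {parser_server!r}")
        env["OMNI_PARSER_SERVER"] = parser_server
    if sse:
        host, port = sse
        # Setting these switches transport from stdio to SSE.
        env["SSE_HOST"], env["SSE_PORT"] = host, str(port)
    return env
```

Environment variable values must be strings, which is why the SSE port is converted with `str`.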

Usage examples

Search for "MCP server" in the browser on the screen.

etc.
