omniparser-autogui-mcp
by NON906
Verified
# omniparser-autogui-mcp
([日本語版はこちら](README_ja.md))
This is an [MCP server](https://modelcontextprotocol.io/introduction) that analyzes the screen with [OmniParser](https://github.com/microsoft/OmniParser) and automatically operates the GUI.
Confirmed on Windows.
## License notes
This is MIT license, but Excluding submodules and sub packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license ([reference](https://github.com/microsoft/OmniParser?tab=readme-ov-file#model-weights-license)).
## Installation
1. Please do the following:
```
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py
```
(Other than Windows, use ``export`` instead of ``set``.)
(If you want ``langchain_example.py`` to work, ``uv sync --extra langchain`` instead.)
2. Add this to your ``claude_desktop_config.json``:
```claude_desktop_config.json
{
"mcpServers": {
"omniparser_autogui_mcp": {
"command": "uv",
"args": [
"--directory",
"D:\\CLONED_PATH\\omniparser-autogui-mcp",
"run",
"omniparser-autogui-mcp"
],
"env": {
"PYTHONIOENCODING": "utf-8",
"OCR_LANG": "en"
}
}
}
}
```
(Replace ``D:\\CLONED_PATH\\omniparser-autogui-mcp`` with the directory you cloned.)
``env`` allows for the following additional configurations:
- ``OMNI_PARSER_BACKEND_LOAD``
If it does not work with other clients (such as [LibreChat](https://github.com/danny-avila/LibreChat)), specify ``1``.
- ``TARGET_WINDOW_NAME``
If you want to specify the window to operate, please specify the window name.
If not specified, operates on the entire screen.
- ``OMNI_PARSER_SERVER``
If you want OmniParser processing to be done on another device, specify the server's address and port, such as ``127.0.0.1:8000``.
The server can be started with ``uv run omniparserserver``.
- ``SSE_HOST``, ``SSE_PORT``
If specified, communication will be done via SSE instead of stdio.
- ``SOM_MODEL_PATH``, ``CAPTION_MODEL_NAME``, ``CAPTION_MODEL_PATH``, ``OMNI_PARSER_DEVICE``, ``BOX_TRESHOLD``
These are for OmniParser configuration.
Usually, they are not necessary.
## Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.