Integrations
The README notes "If you want langchain_example.py to work, uv sync --extra langchain instead", which suggests an optional LangChain integration.
omniparser-autogui-mcp
(Japanese version is here)
This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
Confirmed on Windows.
License notes
This project is released under the MIT license, excluding submodules and sub-packages.
OmniParser's repository is CC-BY-4.0.
Each OmniParser model has a different license (reference).
Installation
- Please do the following:
(On platforms other than Windows, use export instead of set.)
(If you want langchain_example.py to work, run uv sync --extra langchain instead.)
- Add this to your claude_desktop_config.json:
(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the directory you cloned.)
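As a rough illustration of what such an entry might look like, the sketch below builds an mcpServers entry in Python and prints it as JSON. The mcpServers layout with command, args, and env is the usual Claude Desktop convention. The server key omniparser_autogui_mcp, the launch command (uv --directory ... run omniparser-autogui-mcp), the helper build_server_entry, and the sample window name are assumptions made for illustration and should be checked against the project's README.

```python
import json


def build_server_entry(cloned_path, env=None):
    """Return a hypothetical mcpServers entry for this server.

    The launch command below is an assumption, not taken from the README.
    """
    return {
        "command": "uv",
        # Assumed: run the project's entry point from the cloned directory.
        "args": ["--directory", cloned_path, "run", "omniparser-autogui-mcp"],
        # Optional environment variables passed to the server process.
        "env": dict(env or {}),
    }


config = {
    "mcpServers": {
        # The key name is illustrative; any unique name should work.
        "omniparser_autogui_mcp": build_server_entry(
            r"D:\CLONED_PATH\omniparser-autogui-mcp",
            env={"TARGET_WINDOW_NAME": "Untitled - Notepad"},
        )
    }
}

# json.dumps escapes each backslash, which is why Windows paths appear
# with doubled backslashes inside claude_desktop_config.json.
print(json.dumps(config, indent=2))
```

Generating the entry this way avoids hand-escaping backslashes in the Windows path.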
env allows for the following additional configurations:
- OMNI_PARSER_BACKEND_LOAD
If it does not work with other clients (such as LibreChat), specify 1.
- TARGET_WINDOW_NAME
If you want to specify the window to operate, please specify the window name.
If not specified, operates on the entire screen.
- OMNI_PARSER_SERVER
If you want OmniParser processing to be done on another device, specify the server's address and port, such as 127.0.0.1:8000.
The server can be started with uv run omniparserserver.
- SSE_HOST, SSE_PORT
If specified, communication will be done via SSE instead of stdio.
- SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
These are for OmniParser configuration.
Usually, they are not necessary.
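To make the effect of these variables concrete, here is a minimal sketch of how a process could interpret them. It is not the server's actual code: the EnvSettings class and the read_settings and parse_host_port helpers are hypothetical names, and treating SSE as enabled when either SSE_HOST or SSE_PORT is set is an assumption about the rule described above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass
class EnvSettings:
    backend_load: bool                            # OMNI_PARSER_BACKEND_LOAD == "1"
    target_window: Optional[str]                  # None means the entire screen
    omniparser_server: Optional[Tuple[str, int]]  # None means local processing
    transport: str                                # "sse" or "stdio"
    sse_host: Optional[str]
    sse_port: Optional[int]


def parse_host_port(value: str) -> Tuple[str, int]:
    """Split an address such as 127.0.0.1:8000 into (host, port)."""
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected HOST:PORT, got {value!r}")
    return host, int(port)


def read_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Interpret the documented variables (hypothetical helper)."""
    server = env.get("OMNI_PARSER_SERVER")
    sse_host = env.get("SSE_HOST")
    sse_port = env.get("SSE_PORT")
    # Assumption: setting either SSE variable switches from stdio to SSE.
    use_sse = bool(sse_host or sse_port)
    return EnvSettings(
        backend_load=env.get("OMNI_PARSER_BACKEND_LOAD") == "1",
        target_window=env.get("TARGET_WINDOW_NAME") or None,
        omniparser_server=parse_host_port(server) if server else None,
        transport="sse" if use_sse else "stdio",
        sse_host=sse_host or None,
        sse_port=int(sse_port) if sse_port else None,
    )


# Example: remote OmniParser processing, default stdio transport.
print(read_settings({"OMNI_PARSER_SERVER": "127.0.0.1:8000"}))
```

Passing a plain dictionary instead of os.environ makes it easy to try different combinations without changing the real environment.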
Usage Examples
- Search for "MCP server" in the on-screen browser.
etc.
Related Resources
Related MCP Servers
- Enables browser automation for LLMs on Linux display servers, supporting web interaction, screenshots, and JavaScript execution in a real browser. (JavaScript)
- Enables browser automation using Python scripts, offering operations like taking webpage screenshots, retrieving HTML content, and executing JavaScript. (Python)
- Automates interactions with SAP GUI using the Model Context Protocol, allowing precise control of SAP transactions through tools like clicking, typing, scrolling, and transaction management. (Python, MIT License)
- A companion desktop app enabling bi-directional interaction between Claude Desktop and visual UI elements, allowing Claude to display, read from, and write to interactive interfaces while processing user events and feedback. (TypeScript)