OmniMCP


OmniMCP provides rich UI context and interaction capabilities to AI models through the Model Context Protocol (MCP) and microsoft/OmniParser. It focuses on enabling deep understanding of user interfaces through visual analysis, structured planning, and precise interaction execution.

Core Features

Visual Perception: Understands UI elements using OmniParser.
LLM Planning: Plans next actions based on goal, history, and visual state.
Agent Executor: Orchestrates the perceive-plan-act loop (omnimcp/agent_executor.py).
Action Execution: Controls mouse/keyboard via pynput (omnimcp/input.py).
CLI Interface: Simple entry point (cli.py) for running tasks.
Auto-Deployment: Optional OmniParser server deployment to AWS EC2 with auto-shutdown.
Debugging: Generates timestamped visual logs per step.

Overview

cli.py uses AgentExecutor to run a perceive-plan-act loop. It captures the screen (VisualState), plans using an LLM (core.plan_action_for_ui), and executes actions (InputController).
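The shape of that loop can be pictured with a minimal sketch. Everything in it is a hypothetical stand-in: `Plan`, `FakeVisualState`, `fake_plan`, `FakeInput`, and `run_loop` are illustrative names only and do not reflect the real signatures of AgentExecutor, VisualState, core.plan_action_for_ui, or InputController.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plan:
    """Hypothetical planner output: what to do and whether the goal is met."""
    action: str
    target_id: Optional[int]
    goal_complete: bool


class FakeVisualState:
    """Stand-in for perception: returns a fixed list of UI elements."""

    def update(self) -> list:
        labels = ["5", "*", "9", "="]
        return [{"id": i, "label": label} for i, label in enumerate(labels)]


def fake_plan(elements: list, goal: str, history: list) -> Plan:
    """Stand-in for the LLM planner: click each element once, in order."""
    step = len(history)
    if step >= len(elements):
        return Plan("none", None, True)
    return Plan("click", elements[step]["id"], False)


@dataclass
class FakeInput:
    """Stand-in for the input controller: records actions, sends no real input."""
    executed: list = field(default_factory=list)

    def execute(self, plan: Plan) -> None:
        self.executed.append((plan.action, plan.target_id))


def run_loop(goal: str, max_steps: int = 10) -> list:
    """Perceive, plan, act; stop when the planner reports completion."""
    state, controller, history = FakeVisualState(), FakeInput(), []
    for _ in range(max_steps):
        elements = state.update()                  # perceive
        plan = fake_plan(elements, goal, history)  # plan
        if plan.goal_complete:
            break
        controller.execute(plan)                   # act
        history.append(f"{plan.action} {plan.target_id}")
    return controller.executed
```

Calling `run_loop("compute 5*9")` walks the four fake elements and then stops. The step cap is there so that a planner which never reports completion cannot loop forever.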

Demos

Real Action (Calculator): python cli.py opens Calculator and computes 5*9.
Synthetic UI (Login): python demo_synthetic.py uses generated images (no real I/O). (Note: Pending refactor to use AgentExecutor).

Prerequisites

Python >=3.10, <3.13
uv installed (pip install uv)
Linux Runtime Requirement: Requires an active graphical session (X11/Wayland) for pynput. May need system libraries (libx11-dev, etc.) - see pynput docs.

(macOS display scaling dependencies are handled automatically during installation).

For AWS Deployment Features

Requires AWS credentials in .env (see .env.example). Warning: Creates AWS resources (EC2, Lambda, etc.) incurring costs. Use python -m omnimcp.omniparser.server stop to clean up.

AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_KEY
# OMNIPARSER_URL=http://... # Optional: Skip auto-deploy
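A quick pre-flight check on these keys might look like the sketch below. The helper `missing_keys` is hypothetical and not part of OmniMCP. It also assumes that ANTHROPIC_API_KEY is always needed and that the AWS keys matter only when OMNIPARSER_URL is unset; the project itself may apply different rules.

```python
import os
from typing import Mapping


def missing_keys(env: Mapping[str, str]) -> list:
    """Return the names of expected keys that are absent or empty.

    Assumption: ANTHROPIC_API_KEY is always required, and the AWS keys are
    required only when OMNIPARSER_URL is not set (the auto-deploy path).
    """
    required = ["ANTHROPIC_API_KEY"]
    if not env.get("OMNIPARSER_URL"):
        required += ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Inspects the current process environment only; loading .env is left
    # to the project or to your shell.
    print(missing_keys(os.environ))
```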

Installation

git clone https://github.com/OpenAdaptAI/OmniMCP.git
cd OmniMCP
./install.sh # Creates .venv, installs deps incl. test extras
cp .env.example .env # Edit .env with your keys
# Activate: source .venv/bin/activate (Linux/macOS) or relevant Windows command

Quick Start

Ensure environment is activated and .env is configured.

# Run default goal (Calculator task)
python cli.py
# Run custom goal
python cli.py --goal "Your goal here"
# See options
python cli.py --help

Debug outputs are saved in runs/<timestamp>/.

Note on MCP Server: An experimental MCP server (OmniMCP class in omnimcp/mcp_server.py) exists but is separate from the primary cli.py/AgentExecutor workflow.

Architecture

CLI (cli.py) - Entry point, setup, starts Executor.
Agent Executor (omnimcp/agent_executor.py) - Orchestrates loop, manages state/artifacts.
Visual State Manager (omnimcp/visual_state.py) - Perception (screenshot, calls parser).
OmniParser Client & Deploy (omnimcp/omniparser/) - Manages OmniParser server communication/deployment.
LLM Planner (omnimcp/core.py) - Generates action plan.
Input Controller (omnimcp/input.py) - Executes actions (mouse/keyboard).
(Optional) MCP Server (omnimcp/mcp_server.py) - Experimental MCP interface.

Development

Environment Setup & Checks

# Setup (if not done): ./install.sh
# Activate env: source .venv/bin/activate (or similar)
# Format/Lint: uv run ruff format . && uv run ruff check . --fix
# Run tests: uv run pytest tests/

Debug Support

Running python cli.py saves timestamped runs in runs/, including:

step_N_state_raw.png
step_N_state_parsed.png (with element boxes)
step_N_action_highlight.png (with action highlight)
final_state.png
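To review a run step by step, the per-step images can be grouped by step number. The sketch below is a hypothetical helper, not part of OmniMCP. Its filename pattern is inferred from the list above and is an assumption about the exact naming.

```python
import re
from pathlib import Path

# Matches names such as step_3_state_raw.png. The pattern is inferred from
# the artifact list in this README and may not cover every file a run writes.
STEP_FILE = re.compile(r"^step_(\d+)_(state_raw|state_parsed|action_highlight)\.png$")


def group_by_step(run_dir: Path) -> dict:
    """Map each step number to the sorted artifact file names found for it."""
    steps = {}
    for path in run_dir.iterdir():
        match = STEP_FILE.match(path.name)
        if match:
            steps.setdefault(int(match.group(1)), []).append(path.name)
    # Order by step number; files that do not match (e.g. final_state.png) are skipped.
    return {n: sorted(names) for n, names in sorted(steps.items())}
```

Pass it the path of one run directory under runs/ to get an ordered mapping of steps to image names.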

Detailed logs are in logs/run_YYYY-MM-DD_HH-mm-ss.log (LOG_LEVEL=DEBUG in .env recommended).

# --- Initialization & Auto-Deploy ---
2025-MM-DD HH:MM:SS | INFO | omnimcp.omniparser.client:... - No server_url provided, attempting discovery/deployment...
2025-MM-DD HH:MM:SS | INFO | omnimcp.omniparser.server:... - Creating new EC2 instance...
2025-MM-DD HH:MM:SS | SUCCESS | omnimcp.omniparser.server:... - Instance i-... is running. Public IP: ...
2025-MM-DD HH:MM:SS | INFO | omnimcp.omniparser.server:... - Setting up auto-shutdown infrastructure...
2025-MM-DD HH:MM:SS | SUCCESS | omnimcp.omniparser.server:... - Auto-shutdown infrastructure setup completed...
... (SSH connection, Docker setup) ...
2025-MM-DD HH:MM:SS | SUCCESS | omnimcp.omniparser.client:... - Auto-deployment successful. Server URL: http://...
... (Agent Executor Init) ...

# --- Agent Execution Loop Example Step ---
2025-MM-DD HH:MM:SS | INFO | omnimcp.agent_executor:run:... - --- Step N/10 ---
2025-MM-DD HH:MM:SS | DEBUG | omnimcp.agent_executor:run:... - Perceiving current screen state...
2025-MM-DD HH:MM:SS | INFO | omnimcp.visual_state:update:... - VisualState update complete. Found X elements. Took Y.YYs.
2025-MM-DD HH:MM:SS | INFO | omnimcp.agent_executor:run:... - Perceived state with X elements.
... (Save artifacts) ...
2025-MM-DD HH:MM:SS | DEBUG | omnimcp.agent_executor:run:... - Planning next action...
... (LLM Call) ...
2025-MM-DD HH:MM:SS | INFO | omnimcp.agent_executor:run:... - LLM Plan: Action=..., TargetID=..., GoalComplete=False
2025-MM-DD HH:MM:SS | DEBUG | omnimcp.agent_executor:run:... - Added to history: Step N: Planned action ...
2025-MM-DD HH:MM:SS | INFO | omnimcp.agent_executor:run:... - Executing action: ...
2025-MM-DD HH:MM:SS | SUCCESS | omnimcp.agent_executor:run:... - Action executed successfully.
2025-MM-DD HH:MM:SS | DEBUG | omnimcp.agent_executor:run:... - Step N duration: Z.ZZs
... (Loop continues or finishes) ...

(Note: Details like timings, counts, IPs, instance IDs, and specific plans will vary)
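When scanning a long log file, it can help to split each line into its fields. The sketch below assumes the `timestamp | LEVEL | location - message` shape seen in the sample above. `LogRecord` and `parse_log_line` are hypothetical names, not part of OmniMCP, and real log lines may differ in padding or layout.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str
    level: str
    location: str
    message: str


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Split a 'timestamp | LEVEL | location - message' line into fields.

    Returns None for lines that do not follow that shape, such as
    continuation lines or blank lines.
    """
    parts = line.rstrip("\n").split(" | ", 2)
    if len(parts) != 3:
        return None
    timestamp, level, rest = parts
    # The first ' - ' separates the code location from the message text.
    location, sep, message = rest.partition(" - ")
    if not sep:
        return None
    return LogRecord(timestamp.strip(), level.strip(), location.strip(), message)
```

Filtering the parsed records on `level == "SUCCESS"`, for example, gives a short summary of which actions completed.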

Roadmap & Limitations

Key limitations & future work areas:

Performance: Reduce OmniParser latency (explore local models, caching, etc.) and optimize state management (avoid full re-parse).
Robustness: Improve LLM planning reliability (prompts, techniques like ReAct), add action verification/error recovery, enhance element targeting.
Target API/Architecture: Evolve towards a higher-level declarative API (e.g., @omni.publish style) and potentially integrate loop logic with the experimental MCP Server (OmniMCP class).
Consistency: Refactor demo_synthetic.py to use AgentExecutor.
Features: Expand action space (drag/drop, hover).
Testing: Add E2E tests, broaden cross-platform validation, define evaluation metrics.
Research: Explore fine-tuning, process graphs (RAG), framework integration.

Project Status

Core loop via cli.py/AgentExecutor is functional for basic tasks. Performance and robustness need significant improvement. MCP integration is experimental.

Contributing

Fork repository
Create feature branch
Implement changes & add tests
Ensure checks pass (uv run ruff format ., uv run ruff check . --fix, uv run pytest tests/)
Submit pull request

License

MIT License

Contact

Issues: GitHub Issues
Questions: Discussions
Security: security@openadapt.ai
