OmniMCP

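OmniMCP provides rich UI context and interaction capabilities to AI models through the Model Context Protocol (MCP) and microsoft/OmniParser. It focuses on deep understanding of user interfaces through visual analysis, structured planning, and precise interaction execution.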

Core Features

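**Visual Perception:** Understands UI elements using OmniParser.
**LLM Planning:** Plans the next action based on the goal, history, and visual state.
**Agent Executor:** Orchestrates the perceive-plan-act loop (omnimcp/agent_executor.py).
**Action Execution:** Controls the mouse and keyboard via pynput (omnimcp/input.py).
**CLI Interface:** Simple entry point for running tasks (cli.py).
**Auto-Deployment:** Optionally deploys the OmniParser server to AWS EC2 with automatic shutdown.
**Debugging:** Generates timestamped visual logs at each step.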

Overview

cli.py uses AgentExecutor to run a perceive-plan-act loop. It captures the screen (VisualState), plans with an LLM (core.plan_action_for_ui), and executes actions (InputController).
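The loop described above can be sketched roughly as follows. This is a minimal illustration rather than the project's code: `run_loop`, `PlannedAction`, `LoopResult`, and the three callables are hypothetical stand-ins for AgentExecutor, VisualState, core.plan_action_for_ui, and InputController, whose real signatures are not shown in this README. The default cap of 10 steps is taken from the "Step N/10" line in the sample log.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlannedAction:
    """Hypothetical plan record; fields mirror the 'LLM Plan' log line."""
    action: str
    target_id: Optional[int]
    goal_complete: bool


@dataclass
class LoopResult:
    steps_taken: int  # loop iterations, including the one that ended the run
    goal_complete: bool
    history: List[str] = field(default_factory=list)


def run_loop(
    goal: str,
    perceive: Callable[[], List[str]],
    plan: Callable[[str, List[str], List[str]], PlannedAction],
    act: Callable[[PlannedAction], bool],
    max_steps: int = 10,
) -> LoopResult:
    """Perceive, plan, and act until the planner reports completion."""
    history: List[str] = []
    for step in range(1, max_steps + 1):
        elements = perceive()                    # perceive: current UI elements
        planned = plan(goal, history, elements)  # plan: next action from goal/history/state
        if planned.goal_complete:
            return LoopResult(step, True, history)
        history.append(f"Step {step}: Planned action {planned.action} on {planned.target_id}")
        if not act(planned):                     # act: stop on the first failed action
            return LoopResult(step, False, history)
    return LoopResult(max_steps, False, history)


def demo() -> LoopResult:
    """Drive the loop with stubs: click each fake element once, then finish."""
    screen = ["5", "*", "9", "="]

    def plan(goal: str, history: List[str], elements: List[str]) -> PlannedAction:
        if len(history) >= len(elements):
            return PlannedAction("none", None, True)
        return PlannedAction("click", len(history), False)

    return run_loop("compute 5*9", lambda: screen, plan, lambda planned: True)
```

Passing the three stages in as callables keeps the sketch runnable without a display, an LLM key, or an OmniParser server; the real executor also saves artifacts at each step, which is omitted here.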

Demos

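Real actions (Calculator): running python cli.py opens the calculator and computes 5*9.
Synthetic UI (Login): python demo_synthetic.py uses generated images (no real I/O). (Note: pending refactor to use AgentExecutor.)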

Prerequisites

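Python >=3.10, <3.13
uv installed (pip install uv)
Linux runtime requirement: pynput requires an active graphical session (X11/Wayland). System libraries (libx11-dev, etc.) may be needed; see the pynput documentation.

(macOS display-scaling dependencies are handled automatically during installation.)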
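The supported interpreter range can be checked before installing. The helper below is a hypothetical convenience, not something shipped with OmniMCP; it only encodes the ">=3.10, <3.13" constraint stated above.

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 10)  # inclusive lower bound
MAX_VERSION = (3, 13)  # exclusive upper bound


def python_version_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    return MIN_VERSION <= version < MAX_VERSION


if __name__ == "__main__":
    status = "supported" if python_version_supported() else "unsupported"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is {status}")
```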

For AWS Deployment Features

Requires AWS credentials in .env (see .env.example). **Warning:** Creating AWS resources (EC2, Lambda, etc.) incurs costs. Use python -m omnimcp.omniparser.server stop to clean up.

AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_KEY
# OMNIPARSER_URL=http://... # Optional: Skip auto-deploy
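A pre-flight check over these variables might look like the sketch below. It rests on two assumptions that the README does not confirm: that ANTHROPIC_API_KEY is always needed, and that the AWS credentials are only needed when OMNIPARSER_URL is unset (because setting it skips auto-deploy). `missing_keys` and `should_auto_deploy` are hypothetical names, and the project's actual configuration loading is not shown here.

```python
import os
from typing import List, Mapping


def should_auto_deploy(env: Mapping[str, str]) -> bool:
    """Assumed rule: deploy to AWS only when no OmniParser URL is configured."""
    return not env.get("OMNIPARSER_URL")


def missing_keys(env: Mapping[str, str]) -> List[str]:
    """List the variables that are unset or empty for the chosen mode."""
    required = ["ANTHROPIC_API_KEY"]  # assumed to be required in every mode
    if should_auto_deploy(env):
        required += ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"]
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    # Checks the process environment; a .env file would need to be loaded first.
    problems = missing_keys(os.environ)
    print("Missing: " + ", ".join(problems) if problems else "Environment looks complete")
```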

Installation

git clone https://github.com/OpenAdaptAI/OmniMCP.git
cd OmniMCP
./install.sh # Creates .venv, installs deps incl. test extras
cp .env.example .env
# Edit .env with your keys
# Activate: source .venv/bin/activate (Linux/macOS) or relevant Windows command

Quick Start

Make sure the environment is activated and .env is configured.

# Run default goal (Calculator task)
python cli.py

# Run custom goal
python cli.py --goal "Your goal here"

# See options
python cli.py --help
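For readers who want to see how a --goal option with a default task could be wired up, here is a minimal sketch using argparse. It is an assumption that the interface works this way: the real cli.py may use a different argument library and expose more options, and the `DEFAULT_GOAL` wording is invented for illustration.

```python
import argparse
from typing import List, Optional

# Hypothetical default; the README only says the default goal is a Calculator task.
DEFAULT_GOAL = "Open the calculator and compute 5 * 9"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the one documented option, --goal."""
    parser = argparse.ArgumentParser(prog="cli.py", description="Run a UI automation goal.")
    parser.add_argument("--goal", default=DEFAULT_GOAL, help="Natural-language goal to accomplish.")
    return parser


def parse_goal(argv: Optional[List[str]] = None) -> str:
    """Return the goal from argv (or from sys.argv when argv is None)."""
    return build_parser().parse_args(argv).goal


if __name__ == "__main__":
    print(f"Goal: {parse_goal()}")
```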

Debug output is saved in runs/<timestamp>/.

**Note on the MCP server:** An experimental MCP server exists (the OmniMCP class in omnimcp/mcp_server.py), but it is separate from the main cli.py / AgentExecutor workflow.

Architecture

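CLI (cli.py) - Entry point; handles setup and starts the executor.
Agent Executor (omnimcp/agent_executor.py) - Orchestrates the loop and manages state/artifacts.
Visual State Manager (omnimcp/visual_state.py) - Perception (takes screenshots, calls the parser).
OmniParser Client and Deployment (omnimcp/omniparser/) - Manages communication with, and deployment of, the OmniParser server.
LLM Planner (omnimcp/core.py) - Generates the action plan.
Input Controller (omnimcp/input.py) - Executes actions (mouse/keyboard).
(Optional) MCP Server (omnimcp/mcp_server.py) - Experimental MCP interface.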

Development

Environment Setup and Checks

# Setup (if not done): ./install.sh
# Activate env: source .venv/bin/activate (or similar)
# Format/Lint: uv run ruff format . && uv run ruff check . --fix
# Run tests: uv run pytest tests/

Debugging Support

Running python cli.py saves timestamped runs in runs/, including:

step_N_state_raw.png
step_N_state_parsed.png (with element boxes)
step_N_action_highlight.png (with the action highlighted)
final_state.png
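When reviewing a run, it can help to group these artifacts by step. The sketch below assumes that N in the file names is an unpadded decimal integer and that a run directory holds only the file kinds listed above. `group_by_step` is a hypothetical helper, not part of OmniMCP.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the per-step names listed above; final_state.png is deliberately excluded.
STEP_FILE = re.compile(r"^step_(\d+)_(state_raw|state_parsed|action_highlight)\.png$")


def group_by_step(filenames: Iterable[str]) -> Dict[int, List[str]]:
    """Map each step number to the sorted artifact kinds found for it."""
    steps: Dict[int, List[str]] = defaultdict(list)
    for name in filenames:
        match = STEP_FILE.match(name)
        if match:  # unrelated files are ignored
            steps[int(match.group(1))].append(match.group(2))
    return {step: sorted(kinds) for step, kinds in sorted(steps.items())}
```

For a real run, the file names could be collected with something like `(p.name for p in pathlib.Path(run_dir).iterdir())`, where `run_dir` points at one timestamped directory under runs/.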

Detailed logs are written to logs/run_YYYY-MM-DD_HH-mm-ss.log (setting LOG_LEVEL=DEBUG in .env is recommended).

# --- Initialization & Auto-Deploy ---
2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.client:... - No server_url provided, attempting discovery/deployment...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.server:... - Creating new EC2 instance...
2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.server:... - Instance i-... is running. Public IP: ...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.server:... - Setting up auto-shutdown infrastructure...
2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.server:... - Auto-shutdown infrastructure setup completed...
... (SSH connection, Docker setup) ...
2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.client:... - Auto-deployment successful. Server URL: http://...
... (Agent Executor Init) ...

# --- Agent Execution Loop Example Step ---
2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - --- Step N/10 ---
2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Perceiving current screen state...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.visual_state:update:... - VisualState update complete. Found X elements. Took Y.YYs.
2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - Perceived state with X elements.
... (Save artifacts) ...
2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Planning next action...
... (LLM Call) ...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - LLM Plan: Action=..., TargetID=..., GoalComplete=False
2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Added to history: Step N: Planned action ...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - Executing action: ...
2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.agent_executor:run:... - Action executed successfully.
2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Step N duration: Z.ZZs
... (Loop continues or finishes) ...

(Note: details such as timings, counts, IPs, instance IDs, and specific plans will vary.)
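Log lines in the shape shown above ("timestamp | LEVEL | source - message") can be split with a small regular expression, for example to filter a run log by level. This is a sketch that assumes every real line follows that layout exactly; `LogRecord` and `parse_line` are hypothetical names, and lines that do not match (such as the elided "... (LLM Call) ..." placeholders) simply yield None.

```python
import re
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    timestamp: str
    level: str
    source: str   # e.g. "omnimcp.agent_executor:run:42"
    message: str


# Date and time, padded level, module:function:line source, then the message.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+ \S+) \| (?P<level>[A-Z]+)\s*\| (?P<source>\S+) - (?P<message>.*)$"
)


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one log line, or return None if it does not match the layout."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    return LogRecord(**match.groupdict())
```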

Roadmap and Limitations

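Key limitations and areas for future work:

**Performance:** Reduce OmniParser latency (explore local models, caching, etc.) and optimize state management (avoid full re-parsing).
**Robustness:** Improve LLM planning reliability (prompting, techniques such as ReAct), add action verification and error recovery, and enhance element targeting.
**Target API/Architecture:** Evolve toward a higher-level declarative API (e.g., @omni.publish style) and possibly integrate the loop logic with the experimental MCP server (the OmniMCP class).
**Consistency:** Refactor demo_synthetic.py to use AgentExecutor.
**Features:** Expand the action space (drag and drop, hover).
**Testing:** Add E2E tests, broaden cross-platform validation, and define evaluation metrics.
**Research:** Explore fine-tuning, process graphs (RAG), and framework integration.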

Project Status

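Basic tasks can be completed through the core cli.py / AgentExecutor loop. Performance and robustness need significant improvement. MCP integration is still experimental.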

Contributing

Fork the repository
Create a feature branch
Make your changes and add tests
Ensure checks pass (uv run ruff format ., uv run ruff check . --fix, uv run pytest tests/)
Submit a pull request

License

MIT License

Contact

Issues: GitHub Issues
Questions: Discussions
Security: security@openadapt.ai
