OmniMCP

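OmniMCP provides rich UI context and interaction capabilities to AI models through the Model Context Protocol (MCP) and microsoft/OmniParser. It focuses on enabling a deep understanding of user interfaces through visual analysis, structured planning, and precise interaction execution.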

Core Features

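Visual Perception: Understands UI elements using OmniParser.
LLM Planning: Plans the next action based on the goal, history, and visual state.
Agent Executor: Orchestrates the perceive-plan-act loop (omnimcp/agent_executor.py).
Action Execution: Controls the mouse and keyboard via pynput (omnimcp/input.py).
CLI Interface: Simple entry point for running tasks (cli.py).
Auto-Deployment: Optional OmniParser server deployment to AWS EC2 with auto-shutdown.
Debugging: Generates timestamped visual logs for each step.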
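Clicking a perceived element ultimately means turning a parsed bounding box into a screen point. The sketch below is illustrative only: it assumes the parser reports boxes as normalized (x1, y1, x2, y2) fractions of the screenshot, and that a display scale factor (for example 2.0 on a macOS Retina screen) separates screenshot pixels from the logical coordinates the input controller uses. The helper name bbox_center_to_screen is hypothetical.

```python
from typing import Sequence, Tuple


def bbox_center_to_screen(
    bbox: Sequence[float],
    image_size: Tuple[int, int],
    scale_factor: float = 1.0,
) -> Tuple[int, int]:
    """Return the logical screen point at the center of a normalized bbox.

    bbox: (x1, y1, x2, y2) as fractions of the screenshot (assumed format).
    image_size: (width, height) of the screenshot in physical pixels.
    scale_factor: physical pixels per logical point (assumed, e.g. 2.0 on Retina).
    """
    x1, y1, x2, y2 = bbox
    if not (0.0 <= x1 <= x2 <= 1.0 and 0.0 <= y1 <= y2 <= 1.0):
        raise ValueError(f"bbox must be normalized and ordered: {bbox}")
    width, height = image_size
    # Center of the box in physical screenshot pixels.
    cx = (x1 + x2) / 2 * width
    cy = (y1 + y2) / 2 * height
    # Convert to logical coordinates for the mouse controller.
    return round(cx / scale_factor), round(cy / scale_factor)


if __name__ == "__main__":
    point = bbox_center_to_screen((0.25, 0.25, 0.75, 0.75), (2880, 1800), scale_factor=2.0)
    print(point)
    # With pynput installed and a graphical session, the point could be clicked with:
    #   from pynput.mouse import Button, Controller
    #   mouse = Controller()
    #   mouse.position = point
    #   mouse.click(Button.left, 1)
```

Keeping the coordinate math separate from the pynput call makes it testable without a display.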

Overview

cli.py uses AgentExecutor to run the perceive-plan-act loop: it captures the screen (VisualState), plans the next action with an LLM (core.plan_action_for_ui), and executes that action (InputController).
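The loop can be sketched in a few lines. This is a minimal illustration, not the AgentExecutor API: run_loop, Plan, LoopResult, and the perceive/plan/act callables are hypothetical stand-ins for VisualState, core.plan_action_for_ui, and InputController. The default of 10 steps only mirrors the "Step N/10" counter in the sample debug log.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Plan:
    """Hypothetical planner output, loosely mirroring the fields in the debug log."""

    action: str
    target_id: Optional[int]
    goal_complete: bool


@dataclass
class LoopResult:
    completed: bool
    steps: int
    history: List[str] = field(default_factory=list)


def run_loop(
    goal: str,
    perceive: Callable[[], List[str]],
    plan: Callable[[str, List[str], List[str]], Plan],
    act: Callable[[Plan], bool],
    max_steps: int = 10,
) -> LoopResult:
    """Run perceive -> plan -> act until the goal is complete, an action
    fails, or max_steps is reached."""
    history: List[str] = []
    for step in range(1, max_steps + 1):
        elements = perceive()                     # e.g. screenshot + OmniParser
        decision = plan(goal, elements, history)  # e.g. LLM call
        if decision.goal_complete:
            return LoopResult(True, step, history)
        history.append(f"Step {step}: Planned action {decision.action} on {decision.target_id}")
        if not act(decision):                     # e.g. mouse/keyboard via pynput
            return LoopResult(False, step, history)
    return LoopResult(False, max_steps, history)


if __name__ == "__main__":
    # Toy planner: two actions, then report completion.
    def fake_plan(goal: str, elements: List[str], history: List[str]) -> Plan:
        return Plan("click", 0, goal_complete=len(history) >= 2)

    result = run_loop("compute 5*9", lambda: ["button"], fake_plan, lambda p: True)
    print(result.completed, result.steps, result.history)
```

Passing the three stages in as callables keeps the loop logic testable with fakes, much like the synthetic demo avoids real I/O.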

Demos

Real Action (Calculator): python cli.py opens Calculator and computes 5*9.
Synthetic UI (Login): python demo_synthetic.py uses generated images (no real I/O). (Note: refactoring it to use AgentExecutor is in progress.)

Prerequisites

Python >=3.10, <3.13
uv installed (pip install uv)
Linux runtime requirements: pynput needs an active graphical session (X11 or Wayland) and may need system libraries such as libx11-dev. See the pynput documentation.

(macOS display scaling dependencies are handled automatically during installation.)
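A quick pre-flight check for these prerequisites can save a confusing failure later. This is a sketch, not part of OmniMCP: the version bounds come from the list above, while the DISPLAY/WAYLAND_DISPLAY test is a common heuristic for detecting an X11 or Wayland session, not a guarantee that pynput will work.

```python
import os
import sys
from typing import Mapping, Tuple

# Supported range from the prerequisites: >= 3.10 and < 3.13.
MIN_VERSION = (3, 10)
MAX_VERSION_EXCLUSIVE = (3, 13)


def python_supported(version: Tuple[int, int]) -> bool:
    """Return True if (major, minor) falls within the supported range."""
    return MIN_VERSION <= version < MAX_VERSION_EXCLUSIVE


def has_graphical_session(platform: str, env: Mapping[str, str]) -> bool:
    """Heuristic check for Linux: X11 sets DISPLAY, Wayland sets WAYLAND_DISPLAY.

    Other platforms are not checked by this sketch and return True.
    """
    if not platform.startswith("linux"):
        return True
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


if __name__ == "__main__":
    version = (sys.version_info.major, sys.version_info.minor)
    print("Python version OK:", python_supported(version))
    print("Graphical session detected:", has_graphical_session(sys.platform, os.environ))
```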

For AWS Deployment Features

Requires AWS credentials in .env (see .env.example). Warning: this creates AWS resources (EC2, Lambda, etc.) that incur costs. Use python -m omnimcp.omniparser.server stop to clean them up.
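Because auto-deployment creates billable resources, it can help to confirm the credentials are present before starting. A minimal sketch, assuming the standard AWS variable names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; the actual keys OmniMCP reads are listed in .env.example and may differ. The parser handles only plain KEY=VALUE lines (no inline comments or multi-line values).

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed key names (standard AWS variables); check .env.example for the real list.
REQUIRED_AWS_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str], required: Iterable[str] = REQUIRED_AWS_KEYS) -> List[str]:
    """Return required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found; copy .env.example first")
    else:
        missing = missing_keys(parse_env(env_path.read_text(encoding="utf-8")))
        print("Missing:", ", ".join(missing) if missing else "none")
```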


Installation

git clone https://github.com/OpenAdaptAI/OmniMCP.git
cd OmniMCP
./install.sh # Creates .venv, installs deps incl. test extras
cp .env.example .env
# Edit .env with your keys
# Activate: source .venv/bin/activate (Linux/macOS) or relevant Windows command

Quick Start

Make sure the environment is activated and .env is configured.

# Run default goal (Calculator task)
python cli.py

# Run custom goal
python cli.py --goal "Your goal here"

# See options
python cli.py --help

Debug outputs are saved in runs/<timestamp>/.
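To jump to the most recent run, a small helper can pick the newest directory under runs/. This sketch (latest_run_dir is a hypothetical name) sorts by modification time so it does not depend on the exact timestamp format used for directory names.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(runs_root: Path = Path("runs")) -> Optional[Path]:
    """Return the most recently modified subdirectory of runs_root, or None."""
    if not runs_root.is_dir():
        return None
    run_dirs = [p for p in runs_root.iterdir() if p.is_dir()]
    if not run_dirs:
        return None
    # Modification time avoids assuming a particular timestamp naming format.
    return max(run_dirs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    latest = latest_run_dir()
    if latest is None:
        print("No runs yet; run `python cli.py` first")
    else:
        for artifact in sorted(latest.glob("*.png")):
            print(artifact.name)
```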

Note on the MCP Server: an experimental MCP server (the OmniMCP class in omnimcp/mcp_server.py) exists, but it is separate from the primary cli.py/AgentExecutor workflow.

Architecture

CLI (cli.py) - Entry point, setup, starts the Executor.
Agent Executor (omnimcp/agent_executor.py) - Orchestrates the loop and manages state/artifacts.
Visual State Manager (omnimcp/visual_state.py) - Perception (screenshot, parser calls).
OmniParser Client & Deploy (omnimcp/omniparser/) - Manages OmniParser server communication/deployment.
LLM Planner (omnimcp/core.py) - Generates the action plan.
Input Controller (omnimcp/input.py) - Executes actions (mouse/keyboard).
(Optional) MCP Server (omnimcp/mcp_server.py) - Experimental MCP interface.
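Between the LLM Planner and the Input Controller sits a natural validation point: the planner's reply should name a known action and, for clicks, an element that was actually perceived. The sketch below is hypothetical throughout: the JSON keys (action, element_id, text_to_type, is_goal_complete), the ActionPlan class, and the action vocabulary are assumptions for illustration, not the schema used by omnimcp/core.py.

```python
import json
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical action vocabulary; the real planner's set may differ.
ALLOWED_ACTIONS = {"click", "type", "press_key"}


@dataclass(frozen=True)
class ActionPlan:
    action: Optional[str]
    target_id: Optional[int]
    text: Optional[str]
    is_goal_complete: bool


def parse_plan(reply: str, known_ids: Set[int]) -> ActionPlan:
    """Validate a JSON planner reply against the elements that were perceived."""
    data = json.loads(reply)
    complete = bool(data.get("is_goal_complete", False))
    action = data.get("action")
    target_id = data.get("element_id")
    if not complete:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        if action == "click" and target_id not in known_ids:
            raise ValueError(f"click targets unknown element id: {target_id!r}")
    return ActionPlan(action, target_id, data.get("text_to_type"), complete)


if __name__ == "__main__":
    reply = '{"action": "click", "element_id": 2, "is_goal_complete": false}'
    print(parse_plan(reply, known_ids={1, 2, 3}))
```

Rejecting a plan before execution gives the executor a chance to re-plan instead of clicking an unrelated screen location.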

Development

Environment Setup & Checks

# Setup (if not done): ./install.sh
# Activate env: source .venv/bin/activate (or similar)
# Format/Lint: uv run ruff format . && uv run ruff check . --fix
# Run tests: uv run pytest tests/

Debug Support

Each python cli.py run saves a timestamped directory under runs/ containing:

step_N_state_raw.png
step_N_state_parsed.png (with element bounding boxes)
step_N_action_highlight.png (with the planned action highlighted)
final_state.png
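When a run has many steps, grouping these files by step number makes it easier to see which artifacts exist for each step. A small sketch based only on the file-name pattern listed above (group_artifacts is a hypothetical helper); final_state.png and other files are ignored.

```python
import re
import sys
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Matches names such as step_3_state_parsed.png (pattern from the list above).
STEP_ARTIFACT = re.compile(r"^step_(\d+)_(\w+)\.png$")


def group_artifacts(names: Iterable[str]) -> Dict[int, List[str]]:
    """Map step number -> sorted artifact kinds; non-step files are ignored."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for name in names:
        match = STEP_ARTIFACT.match(name)
        if match:
            grouped[int(match.group(1))].append(match.group(2))
    return {step: sorted(kinds) for step, kinds in sorted(grouped.items())}


if __name__ == "__main__":
    run_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for step, kinds in group_artifacts(p.name for p in run_dir.glob("*.png")).items():
        print(f"step {step}: {', '.join(kinds)}")
```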

Detailed logs are written to logs/run_YYYY-MM-DD_HH-mm-ss.log (setting LOG_LEVEL=DEBUG in .env is recommended).

# --- Initialization & Auto-Deploy ---
2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.client:... - No server_url provided, attempting discovery/deployment...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.server:... - Creating new EC2 instance...
2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.server:... - Instance i-... is running. Public IP: ...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.omniparser.server:... - Setting up auto-shutdown infrastructure...
2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.server:... - Auto-shutdown infrastructure setup completed...
... (SSH connection, Docker setup) ...
2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.omniparser.client:... - Auto-deployment successful. Server URL: http://...
... (Agent Executor Init) ...

# --- Agent Execution Loop Example Step ---
2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - --- Step N/10 ---
2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Perceiving current screen state...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.visual_state:update:... - VisualState update complete. Found X elements. Took Y.YYs.
2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - Perceived state with X elements.
... (Save artifacts) ...
2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Planning next action...
... (LLM Call) ...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - LLM Plan: Action=..., TargetID=..., GoalComplete=False
2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Added to history: Step N: Planned action ...
2025-MM-DD HH:MM:SS | INFO     | omnimcp.agent_executor:run:... - Executing action: ...
2025-MM-DD HH:MM:SS | SUCCESS  | omnimcp.agent_executor:run:... - Action executed successfully.
2025-MM-DD HH:MM:SS | DEBUG    | omnimcp.agent_executor:run:... - Step N duration: Z.ZZs
... (Loop continues or finishes) ...

(Note: details such as timings, element counts, IPs, instance IDs, and the specific plans will vary.)
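Since each step ends with a "Step N duration" message, per-step timings can be pulled out of a log file to see where time goes (for example, OmniParser latency). This sketch relies only on the message format shown in the sample log above; step_times.py is a hypothetical script name.

```python
import re
import sys
from pathlib import Path
from typing import Dict, Iterable

# Matches the per-step timing message in the sample log, e.g.
# "... | DEBUG    | omnimcp.agent_executor:run:... - Step 3 duration: 2.41s"
STEP_DURATION = re.compile(r" - Step (\d+) duration: (\d+(?:\.\d+)?)s")


def step_durations(lines: Iterable[str]) -> Dict[int, float]:
    """Return {step number: seconds} for every timing line found."""
    durations: Dict[int, float] = {}
    for line in lines:
        match = STEP_DURATION.search(line)
        if match:
            durations[int(match.group(1))] = float(match.group(2))
    return durations


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python step_times.py logs/run_YYYY-MM-DD_HH-mm-ss.log")
    else:
        text = Path(sys.argv[1]).read_text(encoding="utf-8")
        durations = step_durations(text.splitlines())
        for step, seconds in durations.items():
            print(f"step {step}: {seconds:.2f}s")
        print(f"total: {sum(durations.values()):.2f}s over {len(durations)} steps")
```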

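Roadmap & Limitations

Key limitations and future work areas:

Performance: Reduce OmniParser latency (explore local models, caching, etc.) and optimize state management (avoid full re-parsing).
Robustness: Improve LLM planning reliability (prompts, techniques such as ReAct), add action verification and error recovery, and strengthen element targeting.
Target API/Architecture: Evolve toward a higher-level declarative API (e.g. @omni.publish style) and potentially integrate the loop logic with the experimental MCP Server (OmniMCP class).
Consistency: Refactor demo_synthetic.py to use AgentExecutor.
Features: Expand the action space (drag/drop, hover).
Testing: Add E2E tests, broaden cross-platform validation, and define evaluation metrics.
Research: Explore fine-tuning, process graphs (RAG), and framework integration.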

Project Status

The core loop via cli.py/AgentExecutor works for basic tasks. Performance and robustness need significant improvement. MCP integration is experimental.

Contributing

Fork the repository
Create a feature branch
Implement changes and add tests
Ensure checks pass (uv run ruff format ., uv run ruff check . --fix, uv run pytest tests/)
Submit a pull request

License

MIT 라이센스

Contact
