hermes-computer-use

by Noah3521


English · 日本語 · 中文 · 한국어

PyPI License: MIT Python 3.11+ Platform: WSL2 Ubuntu

Scope: Windows 11 + WSL2 Ubuntu 22.04 / 24.04 only. See docs/WSL_SETUP.md.

Pixel-level browser automation MCP server. Gives any MCP-speaking agent (hermes-agent, Claude Code, Codex, …) 30 tools to drive a real Chrome browser in an Xvfb display: screenshots as vision input, OS-level mouse/keyboard as output. By default there is no CDP, no navigator.webdriver, and no DOM shortcuts.

What the GIF shows — an agent opens Chrome, focuses the Google search bar, types snp500, presses Enter, and Google returns a full SERP with the live S&P 500 index card. The same flow routinely trips "unusual traffic" or a captcha for Playwright-driven automation. This stack doesn't get flagged because the browser is stock Chrome driven by stock X11 input — there is no automation fingerprint to detect.

Why this exists

Aspect	Playwright / CDP	hermes-computer-use
`navigator.webdriver`	`true` (detectable)	`undefined`
CDP endpoint open	yes	no
DOM access	direct (fast, brittle to markup changes)	screenshot only (slower, resilient to UI rewrites)
Anti-bot footprint	large, constantly patched	near-zero: stock Chrome + stock X input
Best for	flows on sites you own	agents operating unfamiliar sites like a human

If your automation has to walk a signup funnel on a site guarded by Cloudflare, Kasada, reCAPTCHA, or DataDome, this stack usually passes where Playwright gets stopped.

Evidence: docs/assets/demo-sannysoft.png — bot.sannysoft.com fingerprint panel with WebDriver, Chrome runtime, Permissions, Plugins, Languages, and PHANTOM all passed.

Architecture

agent ── stdio MCP ──▶ hermes_computer_use.server ── subprocess ──▶ xdotool / scrot
                                                                        │
                                                                        ▼
                                                                    Xvfb :99
                                                                        │
                                                     ┌──────────────────┼──────────────────┐
                                                     ▼                                     ▼
                                               x11vnc :5900                    websockify + noVNC :6080
                                          (native VNC clients)                 (browser viewer)

Longer version: docs/ARCHITECTURE.md.
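
The diagram above shows the server shelling out to xdotool and scrot against the Xvfb display. A minimal sketch of what that subprocess layer might look like follows. The helper names (`x11_env`, `click_argv`, `type_argv`, `screenshot_argv`) are hypothetical and are not taken from the project's source; only the `xdotool` subcommands (`mousemove`, `click`, `type --delay`) and the bare `scrot <file>` invocation are real CLI usage.

```python
import os

DISPLAY_NUMBER = "99"  # matches the documented CU_DISPLAY default


def x11_env(display=DISPLAY_NUMBER, base=None):
    """Return a copy of the environment with DISPLAY pointed at the Xvfb screen."""
    env = dict(os.environ if base is None else base)
    env["DISPLAY"] = f":{display}"
    return env


def click_argv(x, y, button=1):
    """Build an xdotool command that moves the pointer, then clicks."""
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]


def type_argv(text, delay_ms=25):
    """Build an xdotool command that types text with an inter-key delay."""
    return ["xdotool", "type", "--delay", str(delay_ms), text]


def screenshot_argv(path):
    """Build a scrot command that captures the whole display to a file."""
    return ["scrot", path]


# To actually run one of these (requires xdotool and a running Xvfb):
#   import subprocess
#   subprocess.run(click_argv(720, 450), env=x11_env(), check=True)
```

Building the argument list separately from running it keeps the command construction testable on machines without an X display.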

Install

Prerequisites: Windows 11, WSL2 with Ubuntu 22.04/24.04, systemd enabled. Full walkthrough: docs/WSL_SETUP.md.

Everything below runs inside the WSL shell.

From PyPI

pip install "hermes-computer-use[novnc]"

You still need the system packages (Xvfb, Chrome, xdotool…) and the systemd units: see steps 1 and 4 under "From source" below.

From source

git clone https://github.com/Noah3521/hermes-computer-use.git ~/hermes-computer-use
cd ~/hermes-computer-use

bash scripts/setup.sh                            # 1. apt + Chrome + uinput (sudo)
python3 -m venv .venv && . .venv/bin/activate
pip install -e ".[novnc]"                        # 2. Python package
bash scripts/install-novnc.sh                    # 3. (optional) web viewer

mkdir -p ~/.config/systemd/user                  # 4. persistent services
cp systemd/*.example ~/.config/systemd/user/
# edit the paths inside to match your clone, then:
sudo loginctl enable-linger "$USER"
systemctl --user daemon-reload
systemctl --user enable --now computer-use.service novnc.service

Smoke test: python examples/smoke_test.py.

Wire to an MCP client

Copy the relevant snippet from config/hermes.yaml.example into your agent's MCP server config. Works with hermes-agent, Claude Code, Codex, mcp-inspector, or any stdio MCP client.
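
For clients that take a JSON config, an entry could be generated along these lines. This is only a sketch: the `mcpServers` layout is the shape several stdio clients use, the module path `hermes_computer_use.server` is taken from the architecture diagram, and launching it with `python -m` is an assumption. The snippet in config/hermes.yaml.example remains the authoritative reference.

```python
import json


def mcp_server_entry(python_path, display="99"):
    """Build a stdio MCP server entry (hypothetical launch command)."""
    return {
        "mcpServers": {
            "hermes-computer-use": {
                # Assumption: the server starts via `python -m <module>`.
                "command": python_path,
                "args": ["-m", "hermes_computer_use.server"],
                # CU_DISPLAY is a documented env var; 99 is its default.
                "env": {"CU_DISPLAY": display},
            }
        }
    }


if __name__ == "__main__":
    # The venv path below is illustrative; point it at your own clone.
    entry = mcp_server_entry("/home/me/hermes-computer-use/.venv/bin/python")
    print(json.dumps(entry, indent=2))
```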

Hand the install to an LLM

If your agent has shell + filesystem tools, you can skip the manual install entirely: paste the prompt in docs/LLM_SETUP_PROMPT.md and it will clone, install, wire up systemd, run the smoke test, and report back. Available in English, 日本語, 中文, 한국어.

Tools (30)

Category	Tools
Status	`screen_info`, `cursor_position`
Capture	`screenshot`
Pointer	`move`, `left_click`, `right_click`, `double_click`, `middle_click`, `drag`, `scroll`
Keyboard	`type_text`, `press_key`, `hold_key`, `clear_field`, `select_all`, `copy`, `paste`, `cut`, `undo`, `redo`, `clipboard_set`, `clipboard_get`
Timing	`wait`
Browser	`open_url`, `new_tab`, `close_tab`, `back`, `forward`, `reload`
Escape hatch	`run_shell`
Optional DOM fast-path (`CU_ENABLE_CDP=1`)	`dom_click`, `dom_type`, `dom_query`, `dom_exists`, `dom_wait`, `dom_eval`, `network_capture`, `console_messages`

press_key accepts case-insensitive names and aliases — Backspace, backspace, BackSpace all work; cmd+a, command-a, ctrl+a all resolve; meta/win/windows/cmd map to Super.
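
One way this kind of alias resolution can be implemented is sketched below. It is not the project's actual code: `normalize_key` and the `_ALIASES` table are hypothetical, and mapping `command` to Super alongside `cmd` is an assumption. The output uses xdotool-style names (`BackSpace`, `Return`, `super`, `ctrl`).

```python
import re

# Lowercased alias -> xdotool-style key name (illustrative, not exhaustive).
_ALIASES = {
    "backspace": "BackSpace",
    "enter": "Return",
    "return": "Return",
    "esc": "Escape",
    "escape": "Escape",
    "ctrl": "ctrl",
    "control": "ctrl",
    "super": "super",
    "meta": "super",
    "win": "super",
    "windows": "super",
    "cmd": "super",
    "command": "super",  # assumption: treated the same as cmd
}


def normalize_key(combo):
    """Normalize a key combo such as 'command-a' to 'super+a'."""
    # Accept either '+' or '-' as the separator between keys.
    parts = [p for p in re.split(r"[+-]", combo.strip()) if p]
    resolved = []
    for part in parts:
        low = part.lower()
        if low in _ALIASES:
            resolved.append(_ALIASES[low])
        elif len(part) == 1:
            resolved.append(low)  # single characters are lowercased
        else:
            resolved.append(part)  # unknown names pass through unchanged
    return "+".join(resolved)
```

Splitting on `-` means this sketch cannot express the literal minus or plus keys; a fuller implementation would need named aliases for those.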

Opt-in DOM fast-path

For DOM-heavy pages where vision grounding is slow or fragile (SPA dashboards, deep forms), you can opt into CSS-selector-based clicks / typing / queries. Trade-off: Chrome exposes a DevTools port and navigator.webdriver flips to true for the session, which defeats the anti-bot posture on sites that fingerprint Chrome. Off by default.

CU_ENABLE_CDP=1 bash scripts/display.sh restart
pip install "hermes-computer-use[dom]"        # adds websocket-client
# Run the MCP with CU_ENABLE_CDP=1 in its env too (hermes config etc.)

See docs/ARCHITECTURE.md#dom-fast-path for when to use which.
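
The opt-in behaviour can be pictured as a simple environment check that decides whether the DOM tools are exposed at all. The sketch below is an illustration only; how the server actually gates these tools is not shown here, and `cdp_enabled` and `visible_tools` are hypothetical names. The tool names come from the table above.

```python
import os

# The eight DOM fast-path tools listed in the tools table.
DOM_TOOLS = (
    "dom_click", "dom_type", "dom_query", "dom_exists",
    "dom_wait", "dom_eval", "network_capture", "console_messages",
)


def cdp_enabled(env=None):
    """True only when CU_ENABLE_CDP is set to exactly '1'."""
    env = os.environ if env is None else env
    return env.get("CU_ENABLE_CDP") == "1"


def visible_tools(base_tools, env=None):
    """Return the base tools, plus the DOM tools when the fast-path is on."""
    tools = list(base_tools)
    if cdp_enabled(env):
        tools.extend(DOM_TOOLS)
    return tools
```

Keeping the flag off by default means an unset or mistyped variable leaves the pixel-only posture intact.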

Demo prompts

Try any of the prompts in examples/demo_prompts.md. The simplest and most illustrative:

"Use computer_use to open Google, search for snp500, and tell me the current S&P 500 index price from the page."

Open http://localhost:6080/vnc.html in a browser while the agent runs — watching the cursor arc through the search bar is surprisingly compelling.

Configuration (env vars)

Var	Default	Meaning
`CU_DISPLAY`	`99`	X display number
`CU_WIDTH` / `CU_HEIGHT`	`1440` / `900`	Virtual screen size
`CU_VNC_PORT`	`5900`	x11vnc listen port
`CU_STATE_DIR`	`/tmp/hermes-computer-use`	Logs, PID files
`CU_PROFILE_DIR`	`$CU_STATE_DIR/chrome-profile`	Persistent Chrome profile
`CU_START_URL`	`about:blank`	First URL Chrome opens
`CU_INPUT`	`xdotool`	Set to `ydotool` for `/dev/uinput` input
`CU_KEY_DELAY_MS`	`25`	Inter-keystroke delay
`CU_MOVE_STEPS`	`18`	Cursor interpolation steps
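
As an illustration of how these variables and defaults fit together, here is a minimal sketch of a settings loader. The `Settings` class and `load_settings` function are hypothetical and not part of the package; the variable names and defaults are the ones in the table above, including `CU_PROFILE_DIR` falling back to a path under `CU_STATE_DIR`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    display: str
    width: int
    height: int
    vnc_port: int
    state_dir: str
    profile_dir: str
    start_url: str
    input_backend: str
    key_delay_ms: int
    move_steps: int


def load_settings(env=None):
    """Read CU_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    state_dir = env.get("CU_STATE_DIR", "/tmp/hermes-computer-use")
    return Settings(
        display=env.get("CU_DISPLAY", "99"),
        width=int(env.get("CU_WIDTH", "1440")),
        height=int(env.get("CU_HEIGHT", "900")),
        vnc_port=int(env.get("CU_VNC_PORT", "5900")),
        state_dir=state_dir,
        # The profile directory defaults to a subdirectory of the state dir.
        profile_dir=env.get(
            "CU_PROFILE_DIR", os.path.join(state_dir, "chrome-profile")
        ),
        start_url=env.get("CU_START_URL", "about:blank"),
        input_backend=env.get("CU_INPUT", "xdotool"),
        key_delay_ms=int(env.get("CU_KEY_DELAY_MS", "25")),
        move_steps=int(env.get("CU_MOVE_STEPS", "18")),
    )
```

Passing a plain dict as `env` makes it easy to check an override, for example `load_settings({"CU_WIDTH": "1920"}).width`.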

Docs

WSL_SETUP.md — Windows-side setup, systemd, linger
ARCHITECTURE.md — internals + design rationale
CAPTCHA.md — what passive / behavioural / visual challenges this approach can and cannot handle
TROUBLESHOOTING.md — common failure modes with fixes
FAQ.md — Playwright comparison, anti-bot honesty, parallel runs, profile safety
SECURITY.md — threat model and hardening checklist

Security

This is an LLM with hands. Read SECURITY.md. Baseline:

Run in an isolated WSL distro, not your daily driver.
Strip run_shell if the agent doesn't need shell access.
Don't persist real credentials in CU_PROFILE_DIR.

Contributing

See CONTRIBUTING.md. The guiding thesis is "emit no abnormal signals by default" > "emit clever evasions" — but additive hybrid paths (e.g. opt-in DOM / CDP fast-clicks that users turn on per-site) are welcome when they do not flip the default posture.

License

MIT. See LICENSE.

Acknowledgements

anthropic-quickstarts/computer-use-demo — the reference loop.
x11vnc + noVNC — observer pipeline.
Model Context Protocol — the interface.
