🐵 MonkeySee
Monkey see, monkey do. Your agent gets eyes and hands in a real Chrome.
MonkeySee lets an MCP client (Claude Code, Codex, or anything that speaks MCP) drive a real, logged-in Chrome profile. It opens tabs, reads the page as a compact indexed list of elements, clicks, types, scrolls, takes screenshots, and lets the agent declare when the job is done.
The key word is your Chrome. Not a fresh headless sandbox that gets logged out of everything. The actual browser where you are already signed into your email, your dashboard, your everything.
Demo
https://github.com/user-attachments/assets/58788fe4-6977-47d0-a47e-14f965c47e8d
This is not an agent
Let's be clear about who does the thinking. MonkeySee has no agent loop, no LLM, no "reasoning." Your terminal agent already has all of that. MonkeySee is the dumb, fast, reliable set of hands and eyeballs it has always wanted.
   the brain                              the hands & eyes
┌─────────────┐     MCP (stdio)     ┌──────────────────┐   ws://localhost:8787   ┌───────────┐
│ Claude Code │ ──────────────────▶ │ monkeysee-bridge │ ──────────────────────▶ │ extension │
│   / Codex   │                     │  (dumb router)   │                         │ SW + DOM  │
└─────────────┘                     └──────────────────┘                         └───────────┘
The agent decides what to do. MonkeySee just does it and reports back what it saw.
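To make "dumb router" concrete: the bridge's core chore is matching each outgoing call with the reply that comes back over the WebSocket. Here's a minimal sketch of that bookkeeping, assuming a JSON envelope with id, method, params, result, and error fields. That envelope and the PendingRouter class are hypothetical (the real wire types live in monkeysee-protocol), and the socket is reduced to a send callback so the snippet runs without a WebSocket library.

```typescript
// Hypothetical RPC envelope; the real wire types live in monkeysee-protocol.
interface RpcRequest {
  id: number;
  method: string; // e.g. a tool name such as "click"
  params: Record<string, unknown>;
}

interface RpcResponse {
  id: number;
  result?: unknown;
  error?: string;
}

interface Pending {
  resolve: (value: unknown) => void;
  reject: (reason: Error) => void;
}

// Forwards tool calls and matches replies by id. It never interprets a tool.
class PendingRouter {
  private nextId = 1;
  private readonly pending = new Map<number, Pending>();
  private readonly send: (raw: string) => void;

  // `send` stands in for ws.send() on the extension's socket.
  constructor(send: (raw: string) => void) {
    this.send = send;
  }

  // Called when the MCP side invokes a tool.
  call(method: string, params: Record<string, unknown>): Promise<unknown> {
    const id = this.nextId++;
    const request: RpcRequest = { id, method, params };
    return new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send(JSON.stringify(request));
    });
  }

  // Called for every message the extension sends back.
  onMessage(raw: string): void {
    const response = JSON.parse(raw) as RpcResponse;
    const entry = this.pending.get(response.id);
    if (entry === undefined) return; // unknown or stale id: ignore
    this.pending.delete(response.id);
    if (response.error !== undefined) {
      entry.reject(new Error(response.error));
    } else {
      entry.resolve(response.result);
    }
  }

  get inFlight(): number {
    return this.pending.size;
  }
}
```

The design point: nothing in the router knows what "click" means. It serializes, waits, and hands back whatever the extension reported.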
What your agent can do
Once it's wired up, your agent gets a toolbox:
Look: get_state (the page as a numbered element list, optionally with a set-of-marks screenshot), extract_text, screenshot
Act on what it sees: click, type, select_option, hover, focus (all by element index)
Act by hand: click_at, scroll, scroll_to, drag, press, type_text
Get around: open_tab, navigate, go_back, go_forward, wait_for_load
Juggle tabs: list_tabs, switch_tab, close_tab
Call it: done (grounds the answer with the final URL + a page snippet)
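Index-based tools only work if the agent and the extension agree on what each number means. A sketch of the idea, assuming a simple element record and a "[index] <tag> text" line format; both are illustrative, not MonkeySee's actual get_state output.

```typescript
// Hypothetical shape of one interactive element in a get_state-style listing.
interface IndexedElement {
  index: number;
  tag: string;
  text: string;
}

// Render a compact numbered list that is cheap for the agent to read.
function renderElementList(elements: IndexedElement[]): string {
  return elements
    .map((el) => {
      const label = el.text.trim().replace(/\s+/g, " ").slice(0, 60);
      return `[${el.index}] <${el.tag}> ${label}`;
    })
    .join("\n");
}

// An index-based action (click, type, ...) resolves the number back to an
// element and fails loudly if the index is stale.
function resolveIndex(elements: IndexedElement[], index: number): IndexedElement {
  const found = elements.find((el) => el.index === index);
  if (found === undefined) {
    throw new Error(`No element with index ${index}; call get_state again`);
  }
  return found;
}

const page: IndexedElement[] = [
  { index: 0, tag: "input", text: "Search Wikipedia" },
  { index: 1, tag: "button", text: "  Search  " },
];
console.log(renderElementList(page));
// [0] <input> Search Wikipedia
// [1] <button> Search
```

Failing on an unknown index, rather than guessing, is what pushes the agent to re-read the page after it changes.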
What you can do with it
The point of driving your logged-in Chrome is that the agent can act on the context it already has in your terminal session and the sites you're already signed into. A few workflows that fall out of that:
Fill forms with context the agent already has. Point Claude Code or Codex at a signup, job application, expense report, or vendor onboarding form and let it populate the fields from a file, a prior conversation, or your repo. It reads the form as an indexed element list, types into each field, and tells you what it entered before submitting.
Pull data out of dashboards that have no API. Analytics, billing, internal admin panels you're logged into. The agent navigates, runs extract_text, and hands back structured notes, with no scraping credentials or headless re-login required.
Reproduce and triage a bug from a report. Hand it the repro steps; it clicks through your staging app, screenshots each state, and reports where the flow actually breaks.
File the boring tickets. Open Jira/Linear/GitHub, create issues from a list in your conversation, and link them back, all in the tab where you're already authenticated.
Cross-check work against a live site. Diff what your code should render against what the deployed page actually shows, with set-of-marks screenshots to point at the mismatch.
Drive multi-step web flows you'd rather not script. Cookie banners, paginated tables, multi-page wizards: the agent re-reads the page after each step instead of relying on a brittle recorded selector.
In all of these the agent is the brain and MonkeySee is the hands. You stay in the loop, the allowlist gates anything mutating, and every action reports back what it saw.
Install
One install ships both halves. There is no Chrome Web Store listing to hunt for.
npm install -g monkeysee-bridge
Then wire it up. The fastest path registers the MCP server and prints the extension path in one step:
monkeysee-bridge init
That registers monkeysee with Claude Code for every project (user scope) and prints
where the bundled extension lives. Flags: --scope project writes a repo-local
.mcp.json instead of your user config, --client codex wires up Codex
(~/.codex/config.toml) instead of Claude Code, and --print shows the config without
writing anything. See monkeysee-bridge init --help.
Prefer to wire it yourself? Register the server for every project with one command:
claude mcp add monkeysee -s user -- npx -y monkeysee-bridge
...or drop this into a project's .mcp.json (it then applies only when you run Claude
Code from that folder):
{ "mcpServers": { "monkeysee": { "command": "npx", "args": ["-y", "monkeysee-bridge"] } } }
Either way, one step stays manual because no installer can do it for you: teach Chrome about the bundled extension.
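If a project already has a .mcp.json with other servers in it, the monkeysee entry belongs next to them rather than replacing the file. A minimal sketch of that merge, assuming the file only uses the mcpServers shape shown above; withMonkeySee is a hypothetical helper, not something monkeysee-bridge exports, and init --scope project may handle existing files differently.

```typescript
// Shape of a .mcp.json file: server name -> launch command.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface McpConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // keep any other top-level keys intact
}

// Return a new config with the monkeysee entry added. Other servers and
// top-level keys are preserved, and the input object is not mutated.
function withMonkeySee(config: McpConfig): McpConfig {
  const entry: McpServerEntry = { command: "npx", args: ["-y", "monkeysee-bridge"] };
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), monkeysee: entry },
  };
}

const existing: McpConfig = JSON.parse(
  '{ "mcpServers": { "other": { "command": "other-server", "args": [] } } }',
);
console.log(JSON.stringify(withMonkeySee(existing), null, 2));
```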
1. Open chrome://extensions.
2. Flip on Developer mode (top-right).
3. Click Load unpacked and pick the path that init (or the install) printed.
Missed the path? No problem. The bridge reprints it to stderr every time it starts, so it's right there in your MCP logs.
Once the extension is loaded, confirm the link is live:
monkeysee-bridge doctor
It opens the WebSocket, waits for the extension to connect, and reports OK (with the
extension version), DOWN (extension not loaded or Chrome not running), or INCOMPATIBLE
(a protocol-version mismatch). If a bridge is already running on the port, it says so
instead of fighting for it.
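The outcomes doctor reports read like a small decision table. A sketch, assuming the check boils down to two observations: is the port already taken, and did a hello arrive before a timeout. The DoctorObservation shape, the ALREADY_RUNNING label, and classify are hypothetical; they only mirror the behavior described above.

```typescript
type DoctorStatus = "OK" | "DOWN" | "INCOMPATIBLE" | "ALREADY_RUNNING";

// Hypothetical summary of what a doctor-style check observed.
interface DoctorObservation {
  portBusy: boolean; // something (e.g. a running bridge) already holds the port
  hello: { protocolMajor: number; extensionVersion: string } | null; // null = no hello before timeout
}

interface DoctorReport {
  status: DoctorStatus;
  detail: string;
}

function classify(obs: DoctorObservation, bridgeProtocolMajor: number): DoctorReport {
  if (obs.portBusy) {
    // Report the running bridge instead of fighting it for the port.
    return { status: "ALREADY_RUNNING", detail: "a bridge is already listening on the port" };
  }
  if (obs.hello === null) {
    return { status: "DOWN", detail: "extension not loaded or Chrome not running" };
  }
  if (obs.hello.protocolMajor !== bridgeProtocolMajor) {
    return {
      status: "INCOMPATIBLE",
      detail: `extension speaks protocol ${obs.hello.protocolMajor}.x, bridge speaks ${bridgeProtocolMajor}.x`,
    };
  }
  return { status: "OK", detail: `extension ${obs.hello.extensionVersion}` };
}

console.log(classify({ portBusy: false, hello: null }, 0).status); // DOWN
```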
The 60-second demo
The bridge is fully testable from the CLI (pnpm test, no browser). The full loop needs
Chrome:
1. Build: pnpm build
2. Load the extension: chrome://extensions → Developer mode → Load unpacked → packages/extension/dist. (The published monkeysee-bridge ships the extension inside it, and both the installer and the bridge's startup line print the path to load.)
3. Start Claude Code here. Copy .mcp.json.example to .mcp.json first (it's git-ignored, so it stays local). That entry launches the bridge over stdio. Within a few seconds the extension's service worker connects to the bridge. Look for the green dot in the extension popup.
4. Give it a job. Be explicit that it should use the browser; otherwise a capable agent will just answer from memory or its own web search and never touch MonkeySee. Try: "Using the monkeysee browser tools, find me a Wikipedia article about Wales." It should open_tab, get_state, type into search, press('Enter') or click, re-read, extract_text, and finish with done, returning the URL and a snippet.
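Here's what step 4 might look like from the agent's side, written out as data. The argument shapes (url, index, text, key, answer) are guesses for illustration; the real schemas are whatever the bridge advertises to your MCP client, and the element index depends on what get_state actually returned.

```typescript
// Hypothetical argument shapes; check the tool schemas your MCP client lists.
type ToolCall =
  | { tool: "open_tab"; args: { url: string } }
  | { tool: "get_state"; args: { withScreenshot?: boolean } }
  | { tool: "type"; args: { index: number; text: string } }
  | { tool: "press"; args: { key: string } }
  | { tool: "extract_text"; args: Record<string, never> }
  | { tool: "done"; args: { answer: string } };

// One plausible run of the Wikipedia example. Index 4 is a placeholder for
// whatever number the preceding get_state assigned to the search box.
const plan: ToolCall[] = [
  { tool: "open_tab", args: { url: "https://en.wikipedia.org" } },
  { tool: "get_state", args: {} },
  { tool: "type", args: { index: 4, text: "Wales" } },
  { tool: "press", args: { key: "Enter" } },
  { tool: "get_state", args: {} }, // re-read: the page changed
  { tool: "extract_text", args: {} },
  { tool: "done", args: { answer: "Found the Wikipedia article on Wales" } },
];

for (const step of plan) {
  console.log(step.tool, JSON.stringify(step.args));
}
```

The pattern to notice is look, act, look again: the agent never reuses an index across a page change.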
ws://localhost:8787 connection refused? Totally normal when nothing is running. The bridge only listens while an MCP client has launched it. Start Claude Code here (the .mcp.json spawns it), or run it standalone to test the link: node packages/bridge/dist/index.js. The service worker reconnects with backoff, so order never matters. You'll see extension connected/hello on the bridge's stderr.
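The README only promises "reconnects with backoff"; it doesn't spell out the schedule. A common choice is capped exponential backoff, sketched below. The 500 ms base, the 30 s cap, and the injected connect/sleep functions are assumptions for illustration, not MonkeySee's actual values.

```typescript
// Delay before retry number `attempt` (0-based): doubles each time, capped so
// a bridge that shows up late is still picked up reasonably soon.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Reconnect loop. `connect` would open a WebSocket to ws://localhost:8787 and
// resolve once it is open; `sleep` waits. Both are injected so this runs anywhere.
async function connectWithBackoff(
  connect: () => Promise<void>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = Infinity,
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return attempt; // number of failed attempts before success
    } catch {
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw new Error("gave up reconnecting");
}
```

Because the extension keeps retrying on its own schedule, it doesn't matter whether Chrome or the bridge starts first.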
Develop
pnpm install
pnpm build # build all packages
pnpm dev # watch all packages
pnpm typecheck
pnpm lint
pnpm test # bridge end-to-end check (no browser needed)
Three packages in a pnpm workspace:
| Package | Role |
| --- | --- |
| monkeysee-protocol | Shared wire types + zod schemas. The compatibility spine. Published to npm. |
| monkeysee-bridge | MCP server (stdio) + WebSocket server. Translates tool calls to RPC. Published to npm with a |
| packages/extension | MV3 Chrome extension: service-worker router + content-script eyes/hands. Bundled into the bridge, loaded unpacked. |
The deep dive lives in docs/: STRUCTURE.md (project map and design decisions).
Safety (minimal, but real)
You are pointing a robot at a browser that's logged into your life. Treat it that way.
The extension popup has a domain allowlist. With Enforce on, mutating actions
(click, type, select_option, click_at, drag, press, type_text) on a domain
that isn't allowlisted return a blocked error. Looking and navigating are never gated.
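The gating rule fits in a few lines. A sketch, assuming the allowlist is a list of hostnames and that an allowlisted domain also covers its subdomains (that matching rule is a guess; check how your popup behaves). The tool names in MUTATING come straight from the list above; gate and isAllowlisted are hypothetical.

```typescript
// Mutating tools, as listed in the README. Everything else passes through.
const MUTATING = new Set(["click", "type", "select_option", "click_at", "drag", "press", "type_text"]);

// Assumed rule: "example.com" allows example.com and *.example.com, but not notexample.com.
function isAllowlisted(hostname: string, allowlist: string[]): boolean {
  return allowlist.some((domain) => hostname === domain || hostname.endsWith(`.${domain}`));
}

function gate(
  tool: string,
  pageUrl: string,
  allowlist: string[],
  enforce: boolean,
): { ok: true } | { ok: false; error: string } {
  // Looking and navigating are never gated; neither is anything when Enforce is off.
  if (!enforce || !MUTATING.has(tool)) return { ok: true };
  const host = new URL(pageUrl).hostname;
  return isAllowlisted(host, allowlist)
    ? { ok: true }
    : { ok: false, error: `blocked: ${host} is not allowlisted` };
}
```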
Trusted input (chrome.debugger)
By default, actions fire synthetic DOM events from the content script. Some sites are picky:
they check event.isTrusted, or shrug off a synthetic Enter in a search box.
Flip "Trusted input (chrome.debugger)" in the popup to route click, type,
type_text, press, click_at, and drag through real input dispatched via the Chrome
DevTools Protocol (Input.dispatch*). These events are trusted and behave like a real human
finger. Notes:
debugger is a required permission (Chrome won't allow it as optional). It's only used when you turn the toggle on.
If attaching fails (say DevTools is already open on that tab), MonkeySee quietly falls back to synthetic events for that one action.
The MCP tool surface is identical either way. The agent never knows or cares which backend ran.
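The fallback is a try-then-degrade pattern applied per action. A sketch, assuming the two backends are exposed as async functions; Executors, runAction, and the returned backend label are hypothetical. The label only exists so you can see which path ran; the agent never sees it.

```typescript
type Backend = "content" | "debugger";

// Hypothetical: the two ways an action can be executed.
interface Executors {
  viaDebugger: (action: string) => Promise<void>; // trusted input via CDP Input.dispatch*
  viaContentScript: (action: string) => Promise<void>; // synthetic DOM events
}

// Actions the README routes through the debugger when Trusted input is on.
const DEBUGGER_BACKED = new Set(["click", "type", "type_text", "press", "click_at", "drag"]);

// Try the trusted path when it applies; on failure, fall back to synthetic
// events for this one action only. The backend setting itself is unchanged.
async function runAction(action: string, backend: Backend, exec: Executors): Promise<Backend> {
  if (backend === "debugger" && DEBUGGER_BACKED.has(action)) {
    try {
      await exec.viaDebugger(action);
      return "debugger";
    } catch {
      // e.g. DevTools already attached to this tab: fall through
    }
  }
  await exec.viaContentScript(action);
  return "content";
}
```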
About that yellow "started debugging this browser" banner
Chrome itself shows the "MonkeySee Browser Agent started debugging this browser" infobar
the moment any extension calls chrome.debugger.attach(). In MonkeySee that happens only
when both are true:
the Trusted input backend is on in the popup, and
a debugger-backed action actually runs (click, type, type_text, press, click_at, or drag).
Attach is lazy and per-tab, on the first such action, and the banner sticks around until MonkeySee detaches or the tab closes.
It does not show up for:
opening a tab or navigating
observation: get_state, extract_text
screenshot / get_state({ withScreenshot: true }) (these use chrome.tabs.captureVisibleTab, no debugger required)
any action while the backend is left on the default content (synthetic events)
So: keep the backend on content and you'll never see the banner. Turn the trusted backend
on and you'll see it on each tab the first time MonkeySee dispatches input there.
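"Lazy and per-tab" boils down to remembering which tabs already have a debugger attached. A sketch, assuming attach is wrapped in an async function; LazyAttacher and forget are hypothetical names. In a real extension the wrapped call would be something like chrome.debugger.attach({ tabId }, "1.3"), which is the call that makes Chrome show the banner.

```typescript
// Attach at most once per tab, on the first debugger-backed action there.
class LazyAttacher {
  private readonly attachments = new Map<number, Promise<void>>();
  private readonly attachFn: (tabId: number) => Promise<void>;

  // attachFn stands in for chrome.debugger.attach({ tabId }, "1.3").
  constructor(attachFn: (tabId: number) => Promise<void>) {
    this.attachFn = attachFn;
  }

  // Concurrent callers for the same tab share one in-flight attach.
  ensureAttached(tabId: number): Promise<void> {
    const existing = this.attachments.get(tabId);
    if (existing !== undefined) return existing;
    const attempt = this.attachFn(tabId);
    this.attachments.set(tabId, attempt);
    attempt.catch(() => {
      // Failed attach: forget it (unless already replaced) so a later action can retry.
      if (this.attachments.get(tabId) === attempt) this.attachments.delete(tabId);
    });
    return attempt;
  }

  // Call on chrome.debugger.onDetach or when the tab closes.
  forget(tabId: number): void {
    this.attachments.delete(tabId);
  }
}
```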
Frames + screenshots
Same-origin frames are indexed automatically (all_frames: true). Elements inside a same-origin iframe show up in get_state with frameId != 0, are clickable by index, and report their boxes in top-viewport coordinates. Cross-origin iframes are out of scope for now.
screenshot returns the controlled tab's visible viewport as a PNG. get_state({ withScreenshot: true }) adds that image with numbered set-of-marks drawn over each in-viewport element. Great for the agent to "point" with confidence.
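Reporting an iframe element's box in top-viewport coordinates is a sum of offsets: the box measured inside the frame, plus where each enclosing frame's content area sits in its parent. A sketch under that assumption, ignoring CSS transforms and zoom; toTopViewport and the Offset list are hypothetical, not MonkeySee internals.

```typescript
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Offset {
  x: number;
  y: number;
}

// frameChain holds the content-area offset of each enclosing iframe, each
// measured in its parent's viewport. Adding them all maps the box from the
// frame's own viewport to the top-level viewport. Size is unchanged.
function toTopViewport(box: Box, frameChain: Offset[]): Box {
  let { x, y } = box;
  for (const offset of frameChain) {
    x += offset.x;
    y += offset.y;
  }
  return { x, y, width: box.width, height: box.height };
}

// A button at (10, 20) inside an iframe whose content starts at (100, 250).
console.log(toTopViewport({ x: 10, y: 20, width: 80, height: 24 }, [{ x: 100, y: 250 }]));
// { x: 110, y: 270, width: 80, height: 24 }
```

Having every box in one coordinate space is what lets click_at and the set-of-marks overlay treat frame elements like any other element.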
Version compatibility
The bridge and extension swap a protocolVersion in the WebSocket hello handshake (the
contract lives in monkeysee-protocol). If their major versions disagree, the bridge
refuses the connection and never serves a tool call to a mismatched extension. The popup
shows an "incompatible bridge" status and the extension retries slowly until you update the
older side. Pre-1.0, both sides are major 0 and move in lockstep through the workspace, so
this only bites across a future breaking bump.
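The rule is "majors must match; minors and patches may differ." A sketch, assuming protocolVersion is a MAJOR.MINOR.PATCH string (the README doesn't say whether it is a string or a number); protocolMajor and isCompatible are hypothetical helpers.

```typescript
// Parse the major component of a "MAJOR.MINOR.PATCH" protocol version.
function protocolMajor(version: string): number {
  const major = Number.parseInt(version.split(".")[0] ?? "", 10);
  if (Number.isNaN(major)) throw new Error(`bad protocolVersion: ${version}`);
  return major;
}

// Refuse the connection unless both sides agree on the major version.
function isCompatible(bridgeVersion: string, extensionVersion: string): boolean {
  return protocolMajor(bridgeVersion) === protocolMajor(extensionVersion);
}

console.log(isCompatible("0.3.1", "0.4.0")); // true: both major 0
console.log(isCompatible("1.0.0", "0.9.0")); // false: breaking bump
```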
License
MIT. See LICENSE.