DOMShell
The browser is your filesystem. A Chrome Extension that lets AI agents (and humans) browse the web using standard Linux commands (ls, cd, cat, grep, click) via a terminal in the Chrome Side Panel.
Install from Chrome Web Store | npm package | Read the blog post | Project home
DOMShell maps the browser into a virtual filesystem. Windows and tabs become top-level directories (~). Each tab's Accessibility Tree becomes a nested filesystem where container elements are directories and buttons, links, and inputs are files. Navigate Chrome the same way you'd navigate /usr/local/bin.
Why
AI agents that interact with websites typically rely on screenshots, pixel coordinates, or brittle CSS selectors. DOMShell takes a different approach: it exposes the browser's own Accessibility Tree as a familiar filesystem metaphor.
This means an agent can:
Browse tabs with ls ~/tabs/ and switch with cd ~/tabs/123 instead of guessing which tab is active
Explore a page with ls and tree instead of parsing screenshots
Navigate into sections with cd navigation/ instead of guessing coordinates
Act on elements with click submit_btn instead of fragile DOM queries
Read content with cat or bulk-extract with text instead of scraping innerHTML
Search for elements with find --type combobox instead of writing selectors
The filesystem abstraction is deterministic, semantic, and works on any website — no site-specific adapters needed.
Installation
Chrome Web Store (Recommended)
Install DOMShell directly from the Chrome Web Store. No build step required.
From Source
git clone https://github.com/apireno/DOMShell.git
cd DOMShell
npm install
npm run build
Load into Chrome
Open chrome://extensions/
Enable Developer mode (toggle in top right)
Click Load unpacked
Select the dist/ folder
Click the DOMShell icon in your toolbar; the side panel opens
Usage
Getting Started
Open any webpage, then open the DOMShell side panel. You'll see a terminal:
╔══════════════════════════════════════╗
║ DOMShell v1.1.0 ║
║ The browser is your filesystem. ║
╚══════════════════════════════════════╝
Type 'help' to see available commands.
Type 'tabs' to see open browser tabs, then 'cd tabs/<id>' to enter one.
dom@shell:~$
You start at ~ (the browser root). Jump straight to the active tab with here, or explore:
dom@shell:~$ ls
windows/ (2 windows)
tabs/ (5 tabs)
dom@shell:~$ here
✓ Entered tab 123
Title: Google
URL: https://google.com
AX Nodes: 247
Browsing Tabs and Windows
# List all open tabs
dom@shell:~$ tabs
ID TITLE URL WIN
123 Google google.com 1
124 GitHub - apireno github.com/apireno 1
125 Wikipedia en.wikipedia.org 1
# Switch to a tab by ID
dom@shell:~$ cd tabs/125
✓ Entered tab 125
Title: Wikipedia
URL: https://en.wikipedia.org
AX Nodes: 312
# You're now inside the tab's DOM tree
dom@shell:~$ pwd
~/tabs/125
# Go back to browser level
dom@shell:~$ cd ~
dom@shell:~$
# Or use substring matching
dom@shell:~$ cd tabs/github
✓ Entered tab 124 (GitHub - apireno)
# List windows (shows tabs grouped under each window)
dom@shell:~$ windows
Window 1 (focused)
├── *123 Google google.com
├── 124 GitHub - apireno github.com/apireno
└── 125 Wikipedia en.wikipedia.org
Window 2
├── *126 Stack Overflow stackoverflow.com
└── 127 MDN Web Docs developer.mozilla.org
# Browse a specific window's tabs
dom@shell:~$ cd windows/2
dom@shell:~/windows/2$ ls
ID TITLE URL
126 Stack Overflow stackoverflow.com
127 MDN Web Docs developer.mozilla.org
You can also navigate or open new tabs:
# Navigate the current tab to a URL (requires being inside a tab)
dom@shell:~$ navigate https://example.com
# Open a URL in a new tab (works from anywhere)
dom@shell:~$ open https://github.com
✓ Opened new tab
URL: https://github.com
Title: GitHub
AX Nodes: 412
Navigating the DOM
Once you're inside a tab, the Accessibility Tree appears as a filesystem:
# List children of the current node
dom@shell:~$ ls
navigation/
main/
complementary/
contentinfo/
skip_to_content_link
logo_link
# Long format shows type prefixes and roles
dom@shell:~$ ls -l
[d] navigation navigation/
[d] main main/
[x] link skip_to_content_link
[x] link logo_link
# Filter by type
dom@shell:~$ ls --type link
skip_to_content_link
logo_link
# Show DOM metadata (href, src, id) inline — great for finding URLs
dom@shell:~$ ls --meta --type link
[x] link skip_to_content_link href=https://example.com/#content <a>
[x] link logo_link href=https://example.com/ <a>
# Paginate large directories
dom@shell:~$ ls -n 10 # First 10 items
dom@shell:~$ ls -n 10 --offset 10 # Items 11-20
# Count children by type
dom@shell:~$ ls --count
45 total (12 [d], 28 [x], 5 [-])
# Enter a directory (container element)
dom@shell:~$ cd navigation
# See where you are
dom@shell:~$ pwd
~/tabs/125/navigation
# Go back up
dom@shell:~$ cd ..
# Jump to browser root
dom@shell:~$ cd ~
# Multi-level paths work too
dom@shell:~$ cd main/article/form
# Path variable: %here% expands to the focused tab (via its window)
dom@shell:~$ cd %here% # Enter the active tab
dom@shell:~$ cd %here%/.. # Go to the window containing the active tab
dom@shell:~$ cd %here%/main # Enter the active tab and cd into main
Type Prefixes
Every node has a type prefix that communicates metadata without relying on color alone:
| Prefix | Meaning | Examples |
|---|---|---|
| [d] | Directory (container; you can cd into it) | navigation/, main/, form/, list/ |
| [x] | Interactive (clickable/focusable) | buttons, links, inputs, checkboxes |
| [-] | Static (read-only) | headings, images, text |
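The prefix rule amounts to a lookup from AX role to one of three buckets. The TypeScript sketch below is only an illustration. The role sets are assumptions assembled from examples elsewhere in this README, not DOMShell's own role constants; those live in src/shared/types.ts and may differ. countSummary is a hypothetical helper that mimics the ls --count line.

```typescript
type Prefix = "[d]" | "[x]" | "[-]";

// Assumed role buckets, taken from examples in this README.
const CONTAINER_ROLES = new Set<string>([
  "navigation", "main", "form", "search", "list", "region", "dialog",
  "menu", "table", "Iframe", "article", "complementary", "contentinfo",
]);
const INTERACTIVE_ROLES = new Set<string>([
  "button", "link", "textbox", "searchbox", "combobox",
  "checkbox", "radio", "switch", "menuitem", "option",
]);

// Containers are directories, clickable/focusable roles are interactive
// files, and everything else is a static (read-only) file.
function prefixFor(role: string): Prefix {
  if (CONTAINER_ROLES.has(role)) return "[d]";
  if (INTERACTIVE_ROLES.has(role)) return "[x]";
  return "[-]";
}

// Build an `ls --count`-style summary from the roles of a node's children.
function countSummary(roles: string[]): string {
  const counts: Record<Prefix, number> = { "[d]": 0, "[x]": 0, "[-]": 0 };
  for (const role of roles) counts[prefixFor(role)] += 1;
  return `${roles.length} total (${counts["[d]"]} [d], ${counts["[x]"]} [x], ${counts["[-]"]} [-])`;
}

console.log(countSummary(["navigation", "main", "link", "button", "heading"]));
// prints: 5 total (2 [d], 2 [x], 1 [-])
```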
Reading Content
# Inspect an element — cat shows full AX + DOM metadata
dom@shell:~$ cat submit_btn
--- submit_btn ---
Role: button
Type: [x] interactive
AXID: 42
DOM: backend#187
Tag: <button>
ID: submit-form
Class: btn btn-primary
Text: Submit Form
HTML: <button id="submit-form" class="btn btn-primary">Submit Form</button>
# cat on a link reveals the href URL
dom@shell:~$ cat Read_more
--- Read_more ---
Role: link
Type: [x] interactive
AXID: 98
DOM: backend#312
Tag: <a>
URL: https://en.wikipedia.org/wiki/Article_Title
Text: Read more
HTML: <a href="https://en.wikipedia.org/wiki/Article_Title">Read more</a>
# Navigate to parent to find its properties (e.g. span inside a link)
dom@shell:~$ cd ..
dom@shell:~$ cat parent_link
# Bulk extract ALL text from a section (one call instead of 50+ cat calls)
dom@shell:/main$ text
[textContent of /main — 4,821 chars]
Heading: Welcome to Our Site
Today we announce the launch of our new product...
(full article text continues)
# Extract text from a specific child
dom@shell:~$ text main
[textContent of main — 4,821 chars]
# Limit output length
dom@shell:~$ text main -n 500
# Include link URLs inline as markdown [text](url)
dom@shell:~$ text --links main/article/paragraph_2978
--- Text (with links): paragraph_2978 ---
Artificial intelligence (AI) is the capability of [computational systems](https://en.wikipedia.org/wiki/Computer)
to perform tasks typically associated with [human intelligence](https://en.wikipedia.org/wiki/Human_intelligence),
such as [learning](https://en.wikipedia.org/wiki/Learning), [reasoning](https://en.wikipedia.org/wiki/Reason)...
(text + link URLs in a single call)
# Get a tree view (default depth: 2)
dom@shell:~$ tree
navigation/
├── [x] home_link
├── [x] about_link
├── [x] products_link
└── [x] contact_link
# Deeper tree
dom@shell:~$ tree 4
Searching
# Search current directory
dom@shell:~$ grep login
[x] login_btn (button)
[d] login_form (form)
[x] login_link (link)
# Recursive search across all descendants
dom@shell:~$ grep -r search
[x] search_search (combobox)
[x] search_btn (button)
# Limit results
dom@shell:~$ grep -r -n 5 link
# Deep search with full paths (like Unix find)
dom@shell:~$ find search
[x] /search_2/search_search (combobox)
[x] /search_2/search_btn (button)
# Find by role type
dom@shell:~$ find --type combobox
[x] /search_2/search_search (combobox)
dom@shell:~$ find --type textbox
[x] /main/form/email_input (textbox)
[x] /main/form/name_input (textbox)
# Limit results
dom@shell:~$ find --type link -n 5
# Find all links with their URLs (great for content extraction)
dom@shell:~$ find --type link --meta
[x] /nav/home_link (link) href=https://example.com/ <a>
[x] /main/Read_more (link) href=https://example.com/article <a>
Command Chaining (Bash-Style Composition)
DOMShell works like a filesystem, so use the same mental model you would for searching files on disk: grep discovers where content lives (like grep -r in bash), cd scopes your context, and text, cat, and find read content (like cat, head, or less). The pipe operator (|) filters output, just like in bash.
The pattern is: grep (locate) → cd (scope) → extract (read).
# Workflow 1: Find and read an article section
dom@shell:~$ grep -r article
[d] article (article) → ./main/article/
dom@shell:~$ cd main/article
dom@shell:~/main/article$ text
[full article content in one call]
# Workflow 2: Find a section and extract its links
dom@shell:~$ grep -r references
[d] references (region) → ./main/article/references/
dom@shell:~$ cd main/article/references
dom@shell:~/main/article/references$ find --type link --meta
[x] /wiki_link (link) href=https://en.wikipedia.org/... <a>
[x] /paper_link (link) href=https://arxiv.org/... <a>
# Workflow 3: Find a table and extract structured data
dom@shell:~$ grep -r table
[d] table_4091 (table) → ./main/section/table_4091/
dom@shell:~$ extract_table table_4091
| Name | Value | Date |
|--------|--------|------------|
| Alpha | 42 | 2025-01-15 |
| Beta | 87 | 2025-02-20 |
# Workflow 4: Discover sections, then drill into one
dom@shell:~$ grep -r heading
[−] Introduction_heading (heading) → ./main/article/Introduction_heading
[−] Methods_heading (heading) → ./main/article/Methods_heading
[−] Results_heading (heading) → ./main/article/Results_heading
dom@shell:~$ cd main/article/Results_heading
dom@shell:~/main/article/Results_heading$ text
[text content of the Results section]
# Workflow 5: Find elements by visible text (not just name)
dom@shell:~$ grep -r --content "sign up"
[x] get_started_btn (button) → ./main/hero/get_started_btn
# The button's NAME is "get_started_btn" but its displayed text says "Sign Up Free"
dom@shell:~$ click get_started_btn
Pipe Operator
The pipe operator (|) lets you filter command output, just like bash:
# Filter find results to only GitHub links
dom@shell:~$ find --type link --meta | grep github
[x] /main/repo_link (link) href=https://github.com/example <a>
# Filter ls output to elements mentioning "login"
dom@shell:~$ ls --text | grep login
[x] login_btn "Log in to your account"
# Limit results with head
dom@shell:~$ find --type heading | head -n 3
[−] /main/intro_heading (heading)
[−] /main/features_heading (heading)
[−] /main/pricing_heading (heading)
# Chain multiple pipes
dom@shell:~$ find --type link --meta | grep docs | head -n 5
Path Resolution
All commands accept relative paths, eliminating the need to cd first:
# Read text from a nested element directly
dom@shell:~$ text main/article/paragraph_2971
# Click a button inside a form without cd'ing
dom@shell:~$ click main/form/submit_btn
# Inspect a link in the nav
dom@shell:~$ cat navigation/home_link
Sibling Navigation
Use --after and --before flags on ls to find content relative to a landmark:
# Show the 3 elements after a heading
dom@shell:~$ ls --after See_also_heading -n 3 --text
[d] related_topics_list "Machine Learning, Deep Learning, Neural..."
[−] paragraph_4512 "For more information on these topics..."
[x] Read_more_link "Read more on Wikipedia"
# Find links after a specific section heading
dom@shell:~$ ls --after References_heading --type link --meta
[x] source_1_link (link) href=https://arxiv.org/... <a>
[x] source_2_link (link) href=https://doi.org/... <a>
The key insight: grep output feeds cd, and cd scopes everything else. When you don't know where content lives on a page, always grep first, then scope, then extract.
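Here is a minimal sketch of how --after and --before could slice a node's children. It assumes children are kept in document order and that --type filtering happens before the -n limit; this README does not spell out that order. Child, SliceOptions, and sliceSiblings are hypothetical names, not DOMShell's API.

```typescript
// Hypothetical child entry; real DOMShell nodes carry more fields.
interface Child {
  name: string;
  role: string;
}

interface SliceOptions {
  after?: string;  // landmark name, like --after
  before?: string; // landmark name, like --before
  type?: string;   // role filter, like --type
  limit?: number;  // result cap, like -n
}

// Slice siblings around a landmark, then filter by role, then limit.
function sliceSiblings(children: Child[], opts: SliceOptions): Child[] {
  let start = 0;
  let end = children.length;
  if (opts.after !== undefined) {
    const i = children.findIndex((c) => c.name === opts.after);
    if (i === -1) return []; // unknown landmark: nothing to show
    start = i + 1;
  }
  if (opts.before !== undefined) {
    const j = children.findIndex((c) => c.name === opts.before);
    if (j === -1) return [];
    end = j;
  }
  let result = children.slice(start, end); // empty when end <= start
  if (opts.type !== undefined) result = result.filter((c) => c.role === opts.type);
  if (opts.limit !== undefined) result = result.slice(0, opts.limit);
  return result;
}

const kids: Child[] = [
  { name: "References_heading", role: "heading" },
  { name: "source_1_link", role: "link" },
  { name: "paragraph_4512", role: "paragraph" },
  { name: "source_2_link", role: "link" },
];
console.log(sliceSiblings(kids, { after: "References_heading", type: "link" }).map((c) => c.name).join(", "));
// prints: source_1_link, source_2_link
```

Returning an empty list for an unknown landmark, rather than falling back to all children, keeps an agent from mistaking a typo for a real match.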
Interacting with Elements
# Click a button or link
dom@shell:~$ click submit_btn
✓ Clicked: submit_btn (button)
(tree will auto-refresh on next command)
# Focus an input field
dom@shell:~$ focus email_input
✓ Focused: email_input
# Type into the focused field
dom@shell:~$ type hello@example.com
✓ Typed 17 characters
# Navigate to a URL (current tab)
dom@shell:~$ navigate https://example.com
✓ Navigated to https://example.com
# Open a URL in a new tab
dom@shell:~$ open https://github.com
✓ Opened new tab → https://github.com
Auto-Refresh on DOM Changes
DOMShell automatically detects when the page changes — navigation, DOM mutations, or content updates from clicks. You no longer need to manually run refresh:
dom@shell:~$ click search_btn
✓ Clicked: search_btn (button)
(tree will auto-refresh on next command)
dom@shell:~$ ls
(page changed — tree refreshed, 312 nodes, path reset to tab root)
main/
navigation/
search_results/
...
If the page navigated, CWD is reset to the tab root. If the DOM just updated in place, your CWD is preserved. You can still force a manual refresh:
dom@shell:~$ refresh
✓ Refreshed. 312 AX nodes loaded.
Tab Completion
Press Tab to auto-complete commands and element names — works like bash:
dom@shell:$ ta<Tab>
# completes to: tabs
dom@shell:$ cd nav<Tab>
# completes to: cd navigation/
dom@shell:$ click sub<Tab>
# if multiple matches, shows options:
# submit_btn
# subscribe_link
Single match: auto-completes inline
Multiple matches: shows options below and fills in the longest common prefix
cd only completes directories; other commands complete all elements
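The three completion rules can be sketched as follows. This is a hypothetical TypeScript illustration, not the code in src/sidepanel/Terminal.tsx. The DirEntry shape, and appending / to a unique directory match, are assumptions taken from the cd nav<Tab> example.

```typescript
interface DirEntry {
  name: string;
  isDir: boolean;
}

interface Completion {
  text: string;      // what the typed token becomes
  options: string[]; // candidates to show when ambiguous (empty otherwise)
}

// Longest prefix shared by every word.
function longestCommonPrefix(words: string[]): string {
  if (words.length === 0) return "";
  let prefix = words[0];
  for (const w of words.slice(1)) {
    while (!w.startsWith(prefix)) prefix = prefix.slice(0, -1);
  }
  return prefix;
}

// Complete `partial` for `command` against the current node's children.
function complete(command: string, partial: string, entries: DirEntry[]): Completion {
  const pool = command === "cd" ? entries.filter((e) => e.isDir) : entries; // cd: dirs only
  const matches = pool.filter((e) => e.name.startsWith(partial));
  if (matches.length === 0) return { text: partial, options: [] };
  if (matches.length === 1) {
    const only = matches[0];
    return { text: only.isDir ? `${only.name}/` : only.name, options: [] };
  }
  const names = matches.map((m) => m.name);
  return { text: longestCommonPrefix(names), options: names };
}

const cwdEntries: DirEntry[] = [
  { name: "navigation", isDir: true },
  { name: "submit_btn", isDir: false },
  { name: "subscribe_link", isDir: false },
];
console.log(complete("cd", "nav", cwdEntries).text); // prints: navigation/
console.log(complete("click", "sub", cwdEntries).options.join(" ")); // prints: submit_btn subscribe_link
```

With several matches, the typed token grows only as far as the matches agree ("sub" in this example). The candidates are returned separately so the terminal can print them below the prompt.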
Paste Support
Cmd+V (Mac) / Ctrl+V (Windows/Linux) pastes text directly into the terminal. Multi-line pastes are flattened to a single line.
System Commands
# Check if you're authenticated (reads cookies)
dom@shell:~$ whoami
URL: https://example.com
Status: Authenticated
Via: session_id
Expires: 2025-12-31T00:00:00.000Z
Total cookies: 12
# Environment variables
dom@shell:~$ env
SHELL=/bin/domshell
TERM=xterm-256color
# Set a variable
dom@shell:~$ export API_KEY=sk-abc123
# Debug the raw AX tree
dom@shell:~$ debug stats
--- Debug Stats ---
Total AX nodes: 247
Ignored nodes: 83
Generic nodes: 41
With children: 62
Iframes: 2
Getting Help
Every command supports --help:
dom@shell:$ ls --help
ls — List children of the current node
Usage: ls [options]
Options:
-l, --long Long format: type prefix, role, and name
-r, --recursive Show nested children (one level deep)
-n N Limit output to first N entries
--offset N Skip first N entries (for pagination)
--type ROLE Filter by AX role (e.g. --type button)
--count Show count of children only
...
Command Reference
Browser Level
| Command | Description |
|---|---|
| tabs | List all open tabs (shortcut for ls ~/tabs/) |
| windows | List all windows with their tabs grouped underneath |
| here | Jump to the active tab in the focused window |
| cd ~ | Go to browser root |
| cd tabs/<id> | Switch to a tab by ID (enters automatically) |
| cd tabs/<pattern> | Switch to a tab by title/URL substring match |
| cd windows/<id> | Browse a window's tabs |
| navigate <url> | Navigate the current tab to a URL |
| open <url> | Open a URL in a new tab and enter it |
| back | Go back in browser history (like the back button) |
| forward | Go forward in browser history |
| close [tab-id] | Close the current tab (or a specific tab by ID) |
DOM Tree
| Command | Description |
|---|---|
| ls [options] | List children (-l, -r, -n N, --offset N, --type ROLE, --count, --meta, --text, --after/--before) |
| cd <path> | Navigate (.., ~, multi-level paths, %here%) |
| pwd | Print current path (DOM path or browser path) |
| tree [depth] | Tree view of current node (default depth: 2) |
| cat <name> | Full element metadata: AX info + DOM properties (tag, href, src, id, class, outerHTML) |
| text [path] | Bulk extract all text from a section (-n N to cap length, --links for inline [text](url) links) |
| read [path] | Structured subtree extraction (supports --meta) |
| grep <pattern> | Search by name/role/value (-r recursive, -n N limit, --content to match visible text) |
| find [pattern] | Deep recursive search (--type ROLE, --meta, --content, -n N) |
| … | Extract all links as … |
| extract_table <name> | Extract table as markdown or CSV (…) |
| click <name> | Click an element (falls back to coordinate-based click) |
| focus <name> | Focus an input element |
| type <text> | Type text into the focused element |
| submit | Atomic form fill: focus + clear + type + submit (…) |
| scroll down, scroll up | Scroll page by N viewport heights (default: 1). Returns scroll position %. |
| scroll <name> | Scroll a specific element into the center of the viewport |
| js <code> | Execute JavaScript in the tab context. Returns JSON-serialized result. Supports async/await. |
| screenshot | Capture a PNG screenshot of the current tab (returns image via MCP, base64 in shell) |
| select <name> <option> | Select a dropdown option by value or visible text (dispatches change/input events) |
| wait <pattern> | Wait for an element matching pattern to appear (polls AX tree, default 5s timeout, max 30s) |
| eval <expr> | Evaluate a JS expression (read-only, no …) |
| diff | Compare AX tree against pre-action snapshot. Shows added/removed/changed elements after click/submit/navigate. |
| refresh | Force re-fetch the Accessibility Tree |
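The wait row describes a poll-with-timeout loop. Only the 5 s default and the 30 s cap come from the table. The sketch below also assumes a case-insensitive substring match on node names and a 250 ms poll interval, neither of which is documented here. getNames is a hypothetical stand-in for re-fetching the AX tree.

```typescript
const DEFAULT_TIMEOUT_MS = 5000; // table: default 5s
const MAX_TIMEOUT_MS = 30000;    // table: max 30s

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Poll until some node name contains `pattern` (case-insensitive), or return
// null once the clamped timeout has elapsed.
async function waitFor(
  pattern: string,
  getNames: () => Promise<string[]>,
  timeoutMs: number = DEFAULT_TIMEOUT_MS,
  pollMs: number = 250,
): Promise<string | null> {
  const deadline = Date.now() + Math.min(timeoutMs, MAX_TIMEOUT_MS);
  const needle = pattern.toLowerCase();
  for (;;) {
    const names = await getNames();
    const hit = names.find((n) => n.toLowerCase().includes(needle));
    if (hit !== undefined) return hit;
    if (Date.now() >= deadline) return null;
    await sleep(pollMs);
  }
}
```

For example, waitFor("submit", fetchNames, 10000) would resolve to the first matching name, or to null after roughly ten seconds. fetchNames is a placeholder for whatever re-reads the tree. Requests above 30 s are silently clamped rather than rejected.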
Automation
| Command | Description |
|---|---|
| watch <command> | Re-run a command periodically. |
| for "<command>" : <template> | Iterate over output lines. |
| script | Save and run multi-command scripts. |
| each --pattern <pattern> <command> | Run a command across all matching tabs. Restores original tab afterward. |
| functions [pattern] | List callable global JS functions on the page with name, arity, params. |
| call <name> [args] | Call a global JS function by name. Args auto-parsed (JSON or string). Write-tier. |
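The {} placeholder in for can be pictured as a template expansion over the source command's output lines. This sketch makes assumptions about the mechanics. It guesses that {} receives the path token from a listing line such as "[−] /main/intro_heading (heading)" and falls back to the whole line otherwise. It only builds the command strings; it does not run them.

```typescript
// Pull the target out of a listing line such as "[−] /main/intro_heading (heading)".
// Assumed format: a one-character type prefix in brackets, then the path.
function targetOf(line: string): string {
  const m = line.match(/^\[.\]\s+(\S+)/);
  return m ? m[1] : line.trim();
}

// Expand `for "<source>" : <template>` into the commands it would run,
// substituting each line's target for every {} in the template.
function expandFor(sourceOutput: string, template: string): string[] {
  return sourceOutput
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => template.split("{}").join(targetOf(line)));
}

const findOutput = [
  "[−] /main/intro_heading (heading)",
  "[−] /main/features_heading (heading)",
].join("\n");
for (const cmd of expandFor(findOutput, "text {}")) console.log(cmd);
// prints:
// text /main/intro_heading
// text /main/features_heading
```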
System
| Command | Description |
|---|---|
| whoami | Check session/auth cookies for the current page |
| env | Show environment variables |
| export KEY=VALUE | Set an environment variable |
| history | Show command history. |
| bookmark | Save/list named paths. |
| debug | Inspect raw AX tree (e.g. debug stats) |
| connect <token> | Connect to an MCP server via WebSocket bridge |
| disconnect | Disconnect from the MCP server, clear token |
| help | Show all available commands |
| clear | Clear the terminal |
How the Filesystem Mapping Works
DOMShell maps the browser into a two-level virtual filesystem:
Browser Level (~)
The browser itself becomes the top of the filesystem hierarchy:
~ (browser root)
├── windows/ (all Chrome windows)
│ ├── <window-id>/ (tabs in that window)
│ │ ├── <tab-id> (cd into = enter AX tree)
│ │ └── ...
│ └── ...
└── tabs/ (flat listing of ALL tabs)
├── <tab-id> (cd into = enter AX tree)
└── ...
cd-ing into a tab transparently attaches CDP and drops you into its DOM tree.
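Resolving the browser-level part of a path might look like the TypeScript sketch below. The Tab fields mirror a subset of what chrome.tabs reports (id, windowId, title, url, active). Two parts are assumptions read off the earlier examples (cd tabs/125, cd tabs/github, cd %here%/..), not DOMShell's actual resolver: the ID-first-then-substring rule, and expanding %here% to ~/windows/<window-id>/<tab-id>.

```typescript
// Fields mirror a subset of chrome.tabs.Tab; a plain object for illustration.
interface Tab {
  id: number;
  windowId: number;
  title: string;
  url: string;
  active: boolean;
}

// "tabs/<segment>": numeric segments match IDs exactly; anything else is a
// case-insensitive substring match against title or URL (first match wins).
function resolveTab(segment: string, tabs: Tab[]): Tab | undefined {
  if (/^\d+$/.test(segment)) {
    const id = Number(segment);
    return tabs.find((t) => t.id === id);
  }
  const needle = segment.toLowerCase();
  return tabs.find(
    (t) => t.title.toLowerCase().includes(needle) || t.url.toLowerCase().includes(needle),
  );
}

// Expand %here% to the focused window's active tab, written under windows/
// so that "%here%/.." lands on that window's directory.
function expandHere(path: string, focusedWindowId: number, tabs: Tab[]): string {
  const active = tabs.find((t) => t.windowId === focusedWindowId && t.active);
  if (active === undefined) return path;
  return path.split("%here%").join(`~/windows/${active.windowId}/${active.id}`);
}

const sampleTabs: Tab[] = [
  { id: 123, windowId: 1, title: "Google", url: "https://google.com", active: true },
  { id: 124, windowId: 1, title: "GitHub - apireno", url: "https://github.com/apireno", active: false },
  { id: 125, windowId: 1, title: "Wikipedia", url: "https://en.wikipedia.org", active: false },
];
console.log(resolveTab("github", sampleTabs)?.id); // prints: 124
console.log(expandHere("%here%/main", 1, sampleTabs)); // prints: ~/windows/1/123/main
```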
DOM Level (inside a tab)
Each tab's Accessibility Tree (AXTree) is read via the Chrome DevTools Protocol. Each AX node gets mapped to a virtual file or directory:
Directories (container roles): navigation/, main/, form/, search/, list/, region/, dialog/, menu/, table/, Iframe/, etc.
Files (interactive/leaf roles): submit_btn, home_link, email_input, agree_chk, theme_switch, etc.
cd .. from the DOM root exits back to the tab listing. cd ~ returns to browser root from anywhere.
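Inside a tab, cd can be modeled as edits to a stack of directories rooted at the tab's DOM root. This is a hypothetical sketch: VNode and CdResult are illustrative types. Sending cd .. at the root to ~/tabs is an assumption about where the tab listing lives.

```typescript
// Hypothetical VFS node: directories have a children array, files do not.
interface VNode {
  name: string;
  children?: VNode[];
}

type CdResult =
  | { kind: "dom"; path: VNode[] }      // directory stack, DOM root first
  | { kind: "browser"; where: string }; // cd left the tab's tree

// Apply a (possibly multi-level) cd argument to a non-empty directory stack.
function cd(stack: VNode[], arg: string): CdResult {
  if (arg === "~") return { kind: "browser", where: "~" };
  const next = stack.slice();
  for (const part of arg.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      // Above the DOM root: back to the tab listing (assumed to be ~/tabs).
      if (next.length === 1) return { kind: "browser", where: "~/tabs" };
      next.pop();
      continue;
    }
    const child = (next[next.length - 1].children ?? []).find((c) => c.name === part);
    if (child === undefined || child.children === undefined) {
      throw new Error(`cd: ${part}: not a directory`);
    }
    next.push(child);
  }
  return { kind: "dom", path: next };
}

const tree: VNode = {
  name: "(tab root)",
  children: [
    { name: "main", children: [{ name: "article", children: [{ name: "form", children: [] }] }] },
    { name: "logo_link" },
  ],
};
const r = cd([tree], "main/article/form");
if (r.kind === "dom") console.log(r.path.map((n) => n.name).join("/"));
// prints: (tab root)/main/article/form
console.log(cd([tree], "..").kind); // prints: browser
```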
Naming Heuristic
Names are generated from the node's accessible name and role:
| AX Node | Generated Name |
|---|---|
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | … |
| … | (flattened; child promoted up) |
Duplicate names are automatically disambiguated with _2, _3, etc.
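Here is a rough approximation of the naming heuristic, for illustration only. Only the _2, _3 disambiguation is stated outright above. The suffix table, the ASCII-only slugging, and the role_<id> fallback for unnamed nodes are assumptions inferred from names in this README (submit_btn, agree_chk, paragraph_2978). Some example names elsewhere, such as Read_more without a suffix, suggest the real heuristic differs in places.

```typescript
// Assumed role -> suffix table, inferred from example names in this README.
const SUFFIX: { [role: string]: string | undefined } = {
  button: "btn",
  link: "link",
  textbox: "input",
  checkbox: "chk",
  switch: "switch",
  heading: "heading",
};

// Collapse runs of non-alphanumeric characters into "_" (ASCII-only, assumed).
function slug(text: string): string {
  return text.trim().replace(/[^A-Za-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

// Base name: slugged accessible name plus a role suffix, or role_<id> when
// the node has no usable name.
function baseName(role: string, accName: string, nodeId: number): string {
  const s = slug(accName);
  if (s === "") return `${role}_${nodeId}`;
  const suffix = SUFFIX[role];
  return suffix === undefined ? s : `${s}_${suffix}`;
}

interface NamedNode {
  role: string;
  name: string;
  id: number;
}

// Assign names in document order; repeats get _2, _3, ... appended.
function assignNames(nodes: NamedNode[]): string[] {
  const seen = new Map<string, number>();
  return nodes.map((n) => {
    const base = baseName(n.role, n.name, n.id);
    const count = (seen.get(base) ?? 0) + 1;
    seen.set(base, count);
    return count === 1 ? base : `${base}_${count}`;
  });
}

console.log(
  assignNames([
    { role: "button", name: "Submit", id: 1 },
    { role: "link", name: "Read more", id: 2 },
    { role: "link", name: "Read more", id: 3 },
    { role: "paragraph", name: "", id: 2978 },
  ]).join(" "),
);
// prints: Submit_btn Read_more_link Read_more_link_2 paragraph_2978
```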
Node Flattening
The AX tree contains many "wrapper" nodes — ignored nodes, unnamed generics, and role=none elements that add structural noise without semantic meaning. DOMShell recursively flattens through these, promoting their children up so you see the meaningful elements without navigating through layers of invisible divs.
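The flattening rule can be written as a short recursion. AXNodeLite is a simplified, hypothetical stand-in for CDP's Accessibility.AXNode. The real type carries role and name as AXValue objects and links children through childIds. The wrapper test follows the three cases named above.

```typescript
interface AXNodeLite {
  role: string;
  name: string;
  ignored: boolean;
  children: AXNodeLite[];
}

// Wrappers add structure but no meaning: ignored nodes, role=none,
// and generic containers with no accessible name.
function isWrapper(n: AXNodeLite): boolean {
  return n.ignored || n.role === "none" || (n.role === "generic" && n.name === "");
}

// Return the node(s) that should appear in place of `n` after flattening.
function flatten(n: AXNodeLite): AXNodeLite[] {
  const kids: AXNodeLite[] = [];
  for (const child of n.children) kids.push(...flatten(child));
  if (isWrapper(n)) return kids; // drop the wrapper, promote its children
  return [{ ...n, children: kids }];
}

const leaf = (role: string, label: string): AXNodeLite => ({ role, name: label, ignored: false, children: [] });
const wrap = (child: AXNodeLite): AXNodeLite => ({ role: "generic", name: "", ignored: false, children: [child] });

// main -> unnamed div -> unnamed div -> button "Go"
const mainNode: AXNodeLite = { role: "main", name: "", ignored: false, children: [wrap(wrap(leaf("button", "Go")))] };
const [flat] = flatten(mainNode);
console.log(flat.children.map((c) => c.role).join(",")); // prints: button
```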
Iframe Support
DOMShell discovers iframes via Page.getFrameTree and fetches each iframe's AX tree separately. Iframe nodes are merged into the main tree with prefixed IDs to avoid collisions, so elements inside iframes appear naturally in the filesystem.
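The ID-prefixing step could look like the sketch below. The frameId:nodeId format, the FlatAXNode shape, and grafting the iframe's root under its host Iframe node are assumptions; the README only says IDs are prefixed to avoid collisions. In an extension, each frame's nodes could come from CDP's Accessibility.getFullAXTree called with a frameId, after Page.getFrameTree enumerates the frames.

```typescript
interface FlatAXNode {
  nodeId: string;
  parentId?: string;
  childIds: string[];
  role: string;
}

// Prefix every ID in an iframe's tree with its frame ID so nothing collides
// with the main frame, then hang the iframe's root(s) under the host node.
function mergeFrame(
  main: FlatAXNode[],
  frame: FlatAXNode[],
  frameId: string,
  hostNodeId: string,
): FlatAXNode[] {
  const p = (id: string) => `${frameId}:${id}`;
  const renamed = frame.map((n) => ({
    ...n,
    nodeId: p(n.nodeId),
    parentId: n.parentId === undefined ? hostNodeId : p(n.parentId),
    childIds: n.childIds.map(p),
  }));
  const roots = renamed.filter((n) => n.parentId === hostNodeId).map((n) => n.nodeId);
  const patched = main.map((n) =>
    n.nodeId === hostNodeId ? { ...n, childIds: [...n.childIds, ...roots] } : n,
  );
  return [...patched, ...renamed];
}

const mainTree: FlatAXNode[] = [
  { nodeId: "1", childIds: ["2"], role: "RootWebArea" },
  { nodeId: "2", parentId: "1", childIds: [], role: "Iframe" },
];
const frameTree: FlatAXNode[] = [
  { nodeId: "1", childIds: ["2"], role: "RootWebArea" },
  { nodeId: "2", parentId: "1", childIds: [], role: "button" },
];
const merged = mergeFrame(mainTree, frameTree, "F1", "2");
console.log(merged.map((n) => n.nodeId).join(" ")); // prints: 1 2 F1:1 F1:2
```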
Color Coding
| Color | Meaning |
|---|---|
Blue (bold) | Directories (containers) |
Green (bold) | Buttons |
Magenta (bold) | Links |
Yellow (bold) | Text inputs / search boxes |
Cyan (bold) | Checkboxes / radio / switches |
White | Other elements |
Gray | Images, metadata |
Architecture
┌────────────────────┐
│ Claude Desktop │──┐
└────────────────────┘ │
┌────────────────────┐ │ HTTP POST/GET/DELETE ┌─────────────────────┐
│ Claude CLI │──┼─ localhost:3001/mcp ──┐ │ Side Panel (UI) │
└────────────────────┘ │ (Bearer token auth) │ │ │
┌────────────────────┐ │ │ │ React + Xterm.js │
│ Cursor / Other │──┘ │ │ - Paste support │
└────────────────────┘ ▼ │ - Tab completion │
┌─────────────────────┐ │ - Command history │
│ MCP Server │ └─────────┬───────────┘
│ (mcp-server/) │ │
│ │ chrome.runtime
│ Express HTTP server │ .connect()
│ Per-session MCP │ │
│ Security layer: │ ┌─────────▼───────────┐
│ - Auth token │ │ Background Worker │
│ - Command tiers │ │ (Shell Kernel) │
│ - Domain allowlist │ │ │
│ - Audit log │ │ Browser hierarchy │
└──────────┬───────────┘ │ (~, tabs, windows) │
│ │ Command parser │
WebSocket (localhost:9876) │ Shell state (CWD) │
+ auth token │ VFS mapper │
+ alarm keepalive │ CDP client │
│ │ DOM change detect │
└─────────────────►│ WebSocket bridge │
└─────────┬───────────┘
│
chrome.debugger
(CDP 1.3)
│
┌─────────▼───────────┐
│ Active Tab │
│ Accessibility │
│ Tree + iframes │
│ │
│ DOM events ──────►│
│ (auto-refresh) │
└─────────────────────┘
The MCP server runs as a standalone HTTP service that any number of MCP clients can connect to simultaneously. It exposes two ports: an HTTP endpoint for MCP clients (default 3001) and a WebSocket bridge for the Chrome extension (default 9876).
The extension follows a Thin Client / Fat Host model. The side panel is a dumb terminal — it captures keystrokes, handles paste, and renders ANSI-colored text. All logic lives in the background service worker: command parsing, AX tree traversal, filesystem mapping, CDP interaction, browser hierarchy navigation, and DOM change detection.
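The Auto-Refresh behavior is the kind of bookkeeping that sits in this background worker: rebuild lazily on the next command, and reset the CWD only after a navigation. This is a hypothetical sketch. ShellState, markChanged, and beforeCommand are illustrative names, not functions from src/background/index.ts. The events that would call markChanged (for example CDP's Page.frameNavigated or DOM.documentUpdated) are also assumptions.

```typescript
type ChangeKind = "none" | "mutated" | "navigated";

interface ShellState {
  cwd: string[];      // path segments below the tab root
  change: ChangeKind; // written by page-event listeners, consumed by the next command
}

// Record a page event; a pending navigation is never downgraded to a mutation.
function markChanged(state: ShellState, kind: "mutated" | "navigated"): void {
  if (state.change !== "navigated") state.change = kind;
}

// Run before each command: rebuild lazily, then keep or reset the CWD.
// Returns null when nothing changed since the last command.
function beforeCommand(state: ShellState, rebuildTree: () => void): "kept" | "reset" | null {
  if (state.change === "none") return null;
  const navigated = state.change === "navigated";
  rebuildTree();
  state.change = "none";
  if (navigated) {
    state.cwd = []; // navigation invalidates old paths: back to the tab root
    return "reset";
  }
  return "kept"; // in-place DOM update: stay where you were
}

const shell: ShellState = { cwd: ["main", "article"], change: "none" };
markChanged(shell, "mutated");
console.log(beforeCommand(shell, () => {}), shell.cwd.join("/")); // prints: kept main/article
```

Deferring the rebuild to the next command means a burst of DOM mutations after a click costs one tree fetch rather than one per event.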
Source Layout
src/
  background/
    index.ts        # Shell kernel: commands, state, message router, auto-refresh, WS bridge
    cdp_client.ts   # Promise-wrapped chrome.debugger API + iframe discovery
    vfs_mapper.ts   # Accessibility Tree → virtual filesystem mapping
  sidepanel/
    index.html      # Side panel entry HTML
    index.tsx       # React entry point
    Terminal.tsx    # Xterm.js terminal (paste, tab completion, history)
  shared/
    types.ts        # Message types, AXNode interfaces, role constants
public/
  manifest.json     # Chrome Manifest V3
  options.html      # Extension settings page (MCP bridge config)
mcp-server/
  index.ts          # MCP server: standalone Express HTTP + StreamableHTTP, WebSocket bridge, security
  proxy.ts          # Stdio↔HTTP bridge for clients that require command/args (e.g. Claude Desktop)
  package.json      # MCP server dependencies
  tsconfig.json     # MCP server TypeScript config
Tech Stack
React + TypeScript: Side panel UI
Xterm.js (@xterm/xterm): Terminal emulator with Tokyo Night color scheme
Vite: Build tooling with multi-entry Chrome Extension support
Chrome DevTools Protocol (CDP 1.3) via chrome.debugger: AX tree access, element interaction, iframe discovery, DOM mutation events
Chrome Manifest V3: sidePanel, debugger, activeTab, cookies, storage, alarms permissions
Development
# Watch mode (rebuilds on file changes)
npm run dev
# One-time production build
npm run build
# Type checking
npm run typecheck
After building, reload the extension on chrome://extensions/ and reopen the side panel to pick up changes.
Connecting MCP Clients (Claude Desktop, CLI, Cursor, etc.)
DOMShell includes a hardened MCP server that lets any MCP-compatible client control the browser through DOMShell commands. The server runs as a standalone HTTP service — multiple clients can connect simultaneously.
Install via npm (Recommended)
npm install -g @apireno/domshell
Or run directly without installing:
npx @apireno/domshell --allow-write --no-confirm --token my-secret-token
Architecture
User starts independently:
npx @apireno/domshell --allow-write --token xyz
→ HTTP on :3001/mcp (MCP clients)
→ WebSocket on :9876 (Chrome extension)
Claude Desktop spawns (stdio proxy): ┐
npx domshell-proxy --port 3001 --token xyz ├─► HTTP :3001/mcp
Claude CLI connects directly: │
url: http://localhost:3001/mcp?token=xyz │
Gemini CLI connects directly: │
url: http://localhost:3001/mcp?token=xyz ┘
The MCP server is a standalone HTTP service: you start it independently, and any number of MCP clients connect to it. No single client "owns" the server process. For clients that require stdio (like Claude Desktop), a tiny proxy bridges stdio to the running HTTP server.
Setup
1. Start the MCP server:
npx @apireno/domshell --allow-write --no-confirm --token my-secret-token
The server starts two listeners:
HTTP on http://127.0.0.1:3001/mcp (MCP client endpoint)
WebSocket on ws://127.0.0.1:9876 (Chrome extension bridge)
Tip: Use --token to set a known token so you can pre-configure clients. If omitted, a random token is generated and printed on startup.
2. Connect MCP clients:
Claude CLI / Gemini CLI / Cursor (direct HTTP — recommended):
http://localhost:3001/mcp?token=my-secret-token
Claude Desktop (requires stdio; use the proxy):
Add to ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
"mcpServers": {
"domshell": {
"command": "npx",
"args": ["-y", "@apireno/domshell", "--allow-write", "--no-confirm", "--token", "my-secret-token"]
}
}
}
Restart Claude Desktop. DOMShell tools will appear.
3. Connect the extension (Options Page):
Go to chrome://extensions/
Find DOMShell and click Options (or right-click the extension icon → Options)
Enable the MCP Bridge toggle
Paste the same token you passed to the MCP server with --token (my-secret-token in the examples above)
Click Save; the status indicator turns green when connected
The options page shows live connection status: Disabled, Connecting, Connected, or Disconnected.
Alternative: Connect via terminal
You can also connect from the DOMShell terminal instead of the options page:
dom@shell:$ connect my-secret-token
4. Test it:
Ask Claude: "List my open tabs and tell me what's on the first one."
Security
The MCP server is hardened with multiple layers of security. By default, it's read-only — Claude can browse but not click or type.
Command Tiers
| Tier | Commands | Default | Enable With |
|---|---|---|---|
| Read | … | Enabled | (always on) |
| Navigate | … | Disabled | --allow-write |
| Write | … | Disabled | --allow-write |
| Sensitive | whoami | Disabled | … |
The Navigate tier is separate from Write because navigation is equivalent to typing a URL — it requires --allow-write but skips the interactive confirmation prompt. This is important for Claude Desktop where /dev/tty is unavailable.
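Together, the tier table and the --domains allowlist suggest a per-command decision like the one sketched below. The sketch rests on stated assumptions:

- allowSensitive is a placeholder for the sensitive-tier flag, whose name is not given here.
- Sensitive commands are simply allowed or denied.
- The exact-host-or-subdomain rule for --domains is a guess; this README does not specify the matching semantics.

```typescript
type Tier = "read" | "navigate" | "write" | "sensitive";
type Decision = "run" | "confirm" | "deny";

interface ServerFlags {
  allowWrite: boolean;     // --allow-write
  allowSensitive: boolean; // placeholder for the sensitive-tier flag
  noConfirm: boolean;      // --no-confirm
  domains: string[];       // --domains; empty means unrestricted
}

// Assumed rule: allow an exact host match or any subdomain of a listed domain.
function hostAllowed(url: string, domains: string[]): boolean {
  if (domains.length === 0) return true;
  const host = new URL(url).hostname;
  return domains.some((d) => host === d || host.endsWith("." + d));
}

function gate(tier: Tier, flags: ServerFlags, tabUrl: string): Decision {
  if (!hostAllowed(tabUrl, flags.domains)) return "deny";
  if (tier === "read") return "run";
  // Navigate needs --allow-write but never prompts (no /dev/tty under Claude Desktop).
  if (tier === "navigate") return flags.allowWrite ? "run" : "deny";
  if (tier === "write") {
    if (!flags.allowWrite) return "deny";
    return flags.noConfirm ? "run" : "confirm";
  }
  return flags.allowSensitive ? "run" : "deny"; // sensitive
}

const readOnly: ServerFlags = { allowWrite: false, allowSensitive: false, noConfirm: false, domains: [] };
console.log(gate("write", readOnly, "https://example.com")); // prints: deny
console.log(gate("write", { ...readOnly, allowWrite: true }, "https://example.com")); // prints: confirm
```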
Security Flags
| Flag | Description |
|---|---|
| --allow-write | Enable click/focus/type/scroll/js/select/close/navigate/back/forward commands |
| … | Enable whoami (cookie access) |
| … | Shorthand for both |
| --no-confirm | Skip user confirmation for write actions (use with caution) |
| --domains | Restrict commands to specific domains |
| … | Show full cookie values (default: redacted) |
| … | MCP HTTP endpoint port (default: 3001) |
| … | WebSocket bridge port (default: 9876) |
| --log-file | Audit log file (default: audit.log) |
User Confirmation
When write commands are enabled, the MCP server prompts in its terminal before executing:
[DOMShell] Claude wants to: click submit_btn
Allow? (y/n):
This blocks until you type y or n (60-second timeout → deny). Disable with --no-confirm for trusted environments. When Claude Desktop launches the MCP server, always pass --no-confirm: the server then has no terminal to prompt in, and write commands will silently fail without it.
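The confirmation step is a race between the user's answer and a 60-second timer that resolves to deny. This minimal sketch assumes that any reply other than y counts as a denial. ask is a hypothetical callback: in a Node.js server it could wrap question() from node:readline/promises, while keeping the timeout logic easy to test.

```typescript
// Ask for approval; a reply of "y" allows, anything else (or silence) denies.
// If the prompt itself fails, the error propagates to the caller.
async function confirmAction(
  ask: (prompt: string) => Promise<string>,
  action: string,
  timeoutMs: number = 60000,
): Promise<boolean> {
  const answer = ask(`[DOMShell] Claude wants to: ${action}\nAllow? (y/n): `).then(
    (reply) => reply.trim().toLowerCase() === "y",
  );
  const timeout = new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs); // no answer in time: deny
    // Cancel the timer once the prompt settles either way.
    answer.then(() => clearTimeout(timer), () => clearTimeout(timer));
  });
  return Promise.race([answer, timeout]);
}
```

Treating a timeout as a denial is the safe default: an unattended server never performs a write action just because nobody answered.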
Auth Token
Use --token to set a known token in the MCP server config, or let the server generate a random one on startup
The extension must present this token (via the options page or connect <token>) before the bridge works
WebSocket connections without a valid token are rejected
Token is stored in chrome.storage.local and survives service worker restarts
Domain Allowlist
With --domains, commands are only executed when the active tab's URL matches:
npx @apireno/domshell --allow-write --domains "github.com,docs.google.com"
Audit Log
Every command is logged with timestamps to audit.log (or --log-file):
[2026-02-07T12:00:00.000Z] EXECUTE: ls -l
[2026-02-07T12:00:01.000Z] RESULT: 12 items
[2026-02-07T12:00:05.000Z] [WRITE] EXECUTE: click submit_btn
[2026-02-07T12:00:05.500Z] [WRITE] RESULT: ✓ Clicked: submit_btn (button)
Disconnecting
Disable the MCP Bridge toggle in the extension options page, or run disconnect in the DOMShell terminal:
dom@shell:$ disconnect
✓ Disconnected from MCP server.
MCP Tools Reference
| MCP Tool | Maps To | Tier |
|---|---|---|
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Navigate |
| … | … | Navigate |
| … | … | Write |
| … | … | Write |
| … | … | Write |
| … | … | Write |
| … | … | Write |
| … | … | Write |
| … | … | Navigate |
| … | … | Navigate |
| … | … | Write |
| … | … | Read |
| … | … | Write |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Sensitive |
| … | … | Read |
| … | … | Write |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | … | Read |
| … | (any command) | Varies |
Roadmap
Distribution & Setup
Chrome Web Store listing: publish to the store for one-click install
GitHub release with .crx: downloadable extension package for sideloading
MCP setup wizard: an npx @apireno/domshell init command (or in-extension prompt) that generates the Claude Desktop JSON config, sets a shared token, and writes it to claude_desktop_config.json automatically
Support for other MCP clients: Gemini Desktop, OpenAI ChatGPT desktop, Cursor, Windsurf, and other MCP-compatible hosts
New Commands
watch: periodic re-execution of a command (e.g. watch ls --times 3 --interval 1 to poll for DOM changes)
history: command history with recall (history, !n to re-run)
back: browser-style history navigation within the current tab
close: close the current tab (close or close <tab-id>)
screenshot: capture a screenshot of the current tab (useful for visual verification alongside AX tree inspection)
pipe: pipe output between commands (e.g. find --type link | grep login)
select <name>: select an option from a <select> dropdown by value or visible text
scroll: scroll the page or a specific element (scroll down, scroll up, scroll <name>)
wait: wait for a specific element to appear (e.g. wait submit_btn blocks until it exists in the tree)
for: iterate over command output lines (e.g. for "find --type heading -n 3" : text {}); replaces manual iteration
script: save and run multi-command scripts (e.g. script save scrape open url ; cd main ; text) for repeatable workflows
JavaScript Layer
js: execute arbitrary JavaScript in the tab context and return the result
functions: functions [pattern] lists callable global JS functions with name/arity/params; call funcName arg1 arg2 invokes them. call is write-tier.
eval <expr>: quick expression evaluation (e.g. eval document.title, eval window.location.href)
Agent Ergonomics
--text: show visible text previews inline with ls and find using .innerText (rendered text only, respects CSS visibility); configurable length via --textlen N; cat also shows VisibleText separately from textContent
--meta: show DOM properties (href, src, id, tag) inline with ls, find, and read output; essential for extracting URLs without separate cat calls
--content: search by visible text content with grep --content and find --content (or find --text "pattern"); finds elements by what they display, not just their AX name
Path resolution: all commands accept relative paths (e.g. text main/article/paragraph, click form/submit_btn), eliminating unnecessary cd round-trips
Sibling navigation: --after/--before flags on ls to slice children relative to a landmark element (e.g. ls --after heading --type link --meta)
--links: include hyperlink URLs inline as markdown [text](url) in text output; extracts both content and link destinations in a single call (e.g. text --links main/paragraph)
Fuzzy type aliases for find --type: accepts natural-language aliases (input, dropdown, nav, toggle, modal, image, btn, sidebar, etc.) that expand to matching AX roles, eliminating wasted tool calls from guessing exact role names
Visible text cache: lazy cache for innerText results, keyed by backendDOMNodeId and cleared on tree rebuild; eliminates redundant CDP calls during --content matching in grep/find
bookmark: save named paths for quick navigation (e.g. bookmark inbox ~/tabs/gmail/main/inbox_list, cd @inbox)
each: run a command across multiple tabs (e.g. each --pattern wiki text to extract text from every Wikipedia tab)
Structured output mode: --json flag on commands for machine-parseable output (e.g. ls --json, cat --json, find --json, diff --json)
Session persistence: save and restore shell state (path, env vars, bookmarks, history) across service worker restarts via chrome.storage.local
diff: compare AX tree snapshots to see what changed after an action (auto-snapshots before click/submit/navigate)
Platform
Standalone headless browser: ship DOMShell as a self-contained headless Chromium process (via Chrome for Testing or embedded Chromium) that agents launch directly, with no extension install and no user Chrome profile; just npx @apireno/domshell --headless and connect via MCP. Ideal for CI pipelines, server-side automation, and agent-in-a-loop workflows where a visible browser isn't needed
Firefox extension: port to Firefox using WebExtensions API + remote debugging protocol
Playwright/Puppeteer backend: alternative to Chrome extension for headless agent workflows
REST API mode: expose DOMShell commands over HTTP for non-MCP integrations
WASM build: compile DOMShell to WebAssembly so it can be embedded directly on a website for interactive demos without requiring a Chrome extension install
Experiments
Nexa: DOMShell vs Raw HTML. Same model (Qwen3-4B), same tasks: compares DOMShell's text/AX-tree interface against raw HTML scraping, tested on both nexa serve and Ollama backends. Found a crossover interaction: Ollama+DOMShell and Nexa+HTML are equally best (1.20 avg). See experiments/nexa_ollama/.
Nexa vs Claude (model size): compared Qwen3-1.7B/4B on progressive tasks. Capability cliff at T3 (paragraph extraction); 4B shows better error recovery. See experiments/nexa_claude/.
Model shootout: compared Qwen3-4B, Hermes3-3B, Granite4-Tiny, and Llama3.2-3B on Ollama+DOMShell. Qwen3-4B remains best (8/15) and is the only model to break the T3 cliff; Llama3.2-3B is a close second (7/15, zero hallucinations). See experiments/model_shootout/.
Integrations
Nexa AI (Local LLM)
Run DOMShell with local models via nexa-sdk — fully on-device browser automation with no cloud API needed. Uses the same MCP protocol as Claude Desktop but powered by local inference (Granite-4-Micro, Qwen3, etc.).
python integrations/nexa/agent.py --task "Open wikipedia.org/wiki/AI and extract the first paragraph" --verbose
See integrations/nexa/ for setup and usage.
How This Project Was Built
The technical specification for DOMShell was authored by Google Gemini, designed as a comprehensive prompt that could be handed directly to a coding agent to scaffold and build the entire project from scratch. The full original specification is preserved in intitial_project_prompt.md.
The implementation was then built by Claude (Anthropic) via Claude Code, working from that specification.
An AI-designed project, built by another AI, intended for AI agents to use. It's agents all the way down.
Links
Built by Pireno
License
MIT