
MCP Playwright Browser

by Mhrnqaruni

MCP Playwright Browser Server

A production-grade Model Context Protocol (MCP) server that gives AI assistants full browser control through Playwright — using a hybrid DOM + Accessibility Tree + Visual approach. Built for real-world agentic automation: job applications, web scraping, form filling, and complex multi-tab workflows.

v2.0 is a complete rewrite. The server grew from 680 lines and 23 tools to nearly 5,000 lines and 71 tools, with a modular architecture, token-optimized capture profiles, hard payload budgets, and a full test suite.



What's New in v2.0

The Problem v1 Had

v1 was a working proof of concept. It could browse pages and extract jobs. But when used with Gemini CLI for real tasks — filling application forms, navigating multi-tab flows, handling downloads — it hit hard limits:

  • Token waste: Every tool response dumped everything it found. One browser.snapshot on a complex page could push 50KB+ into Gemini's context window in a single call, rapidly exhausting the budget.

  • No multi-tab support: If a link opened a new tab (very common in job applications), Gemini was stuck with no way to switch to it.

  • No form intelligence: Filling a form required manual click-by-click instructions. There was no way to ask "what fields are still empty?" or "fill all required fields."

  • Brittle DOM-only navigation: Shadow DOM, iframes, and obfuscated element IDs caused failures with no fallback.

  • No session persistence: Every run started fresh. Logging in again and again wasted time and triggered bot detection.

  • No safety rails: The AI could write files anywhere on disk, run arbitrary JS, or create its own automation scripts — unguarded.

  • Monolithic: One 680-line file with no tests.

What v2.0 Solves

Every one of those problems has a specific solution in v2.0:

| Problem | v2.0 Solution |
|---|---|
| Token waste | Capture Profile System (light/balanced/full) + 280KB hard payload ceiling |
| Multi-tab stuck | Page Manager with stable pageIds, browser.list_pages, browser.select_page |
| Dumb form filling | browser.form_audit + browser.fill_form + Google Forms specialist tools |
| Shadow DOM / obfuscated IDs | A11y tree via CDP Accessibility.getFullAXTree with stable ax- UIDs |
| Session loss | Cookie export/import, browser.export_storage_state / browser.import_storage_state |
| No safety | Path allowlist in src/security/paths.js, MCP_ALLOW_EVALUATE guard |
| Monolithic | 10 focused modules in src/browser/ + src/security/ + 18-test suite |


v1 vs v2 Comparison

| Dimension | v1.0 | v2.0 |
|---|---|---|
| Total MCP tools | 23 | 71 |
| Server size | 680 lines, 1 file | 4,966 lines, 11 modules |
| Token efficiency | Uncontrolled dumps | Capture profiles + 280KB hard ceiling |
| Multi-tab support | Single tab only | Full page manager (list, select, close) |
| Form automation | Manual click-by-click | form_audit + fill_form + Google Forms specialist |
| A11y / Shadow DOM | DOM-only, brittle | CDP Accessibility tree with stable UIDs |
| Scroll handling | Saw first viewport only | Scroll awareness + container scrolling |
| Session persistence | None | Cookie/storage export-import |
| Popup & dialog handling | None | Dialog accept/dismiss, popup pageId capture |
| Download management | None | Wait-for-download, save to path |
| File reading (CV/PDF) | None | files.read_text, files.read_pdf_text |
| Security | No restrictions | Allowlist-enforced read/write paths |
| Observability | None | Console log capture, network request log |
| Test coverage | 2 tests | 18 tests |
| Profiles | 3 | 5 (+ persistent variants) |
| Batch scripts | 5 .bat launchers | 7 .bat launchers |
| Error handling | Raw exceptions to AI | Normalized, structured, budgeted |

What stayed the same

  • Indeed job extractor (production-grade, multi-selector, deduplication)

  • Google search extractor (consent handling, URL deobfuscation)

  • Stealth mode (webdriver hiding, user agent spoofing)

  • CDP connection to real Chrome

  • Visual snapshot + coordinate-based clicking


How It Works

```
You / Gemini CLI
      │
      │  natural language prompt
      ▼
Gemini CLI ──── loads MCP config ────► playwrightBrowser MCP server
                                                  │
                                 ┌────────────────┤
                                 │                │
                           71 MCP Tools     Payload Budget
                           (browser.*)      (280KB ceiling)
                           (forms.*)        (capture profiles)
                           (files.*)        (retryWith hints)
                           (jobs.*)
                           (search.*)
                                 │
                       ┌─────────┼──────────┐
                       │         │          │
                  Playwright   CDP API    Security
                  (browser)    (A11y,     (path
                               network,   allowlist)
                               clicks)
                       │
               Chrome / Chromium
```

The Capture Ladder

Every profile instructs Gemini to try tools in order, cheapest first:

```
1. browser.snapshot        → plain text summary (cheapest, ~6KB in light mode)
2. browser.list            → interactive elements (structured, ~8KB)
3. browser.query_dom       → targeted selector query (focused, ~10KB)
4. browser.take_snapshot   → A11y tree with UIDs (rich, only when uid-clicking needed)
5. browser.visual_snapshot → screenshot + bbox map (most expensive, last resort)
```

Gemini only escalates to a more expensive tool when the cheaper one doesn't have what it needs. This is the core of why v2.0 uses far fewer tokens than v1.0.
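As a rough illustration, the escalation rule amounts to a cheapest-first loop that stops at the first tool whose result is good enough. This is a minimal sketch, not the server's code: climbLadder, the synchronous callTool(name, args) stand-in and the isEnough predicate are hypothetical names, and in practice the decision is made by Gemini following the profile instructions.

```javascript
// Cheapest-first order from the capture ladder above.
const CAPTURE_LADDER = [
  "browser.snapshot",
  "browser.list",
  "browser.query_dom",
  "browser.take_snapshot",
  "browser.visual_snapshot",
];

// Hypothetical helper: call each tool in order and stop at the first result
// the predicate accepts. callTool is an assumed synchronous stand-in for an
// MCP tool call; real calls would be asynchronous.
function climbLadder(callTool, isEnough, args = { detail: "low" }) {
  const tried = [];
  for (const tool of CAPTURE_LADDER) {
    tried.push(tool);
    const result = callTool(tool, args);
    if (isEnough(result)) return { tool, result, tried };
  }
  return { tool: null, result: null, tried }; // nothing on the ladder was enough
}

// Usage with a fake tool that only "sees" the Apply button in browser.list.
const fakeCall = (tool) => ({ text: tool === "browser.list" ? "Apply now" : "" });
const outcome = climbLadder(fakeCall, (r) => r.text.includes("Apply"));
console.log(outcome.tool, outcome.tried.length); // browser.list 2
```

The expensive tools are never called when a cheap one already answers the question, which is where the token savings come from.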

The Payload Budget

Every single tool response passes through enforcePayloadCeiling() before being sent to Gemini:

  1. Measure response size in bytes

  2. If under 280KB → send as-is

  3. If over → progressively truncate: arrays shrink, strings truncate, fields drop

  4. Always include retryWith hints telling Gemini exactly what parameters to reduce next time

  5. Absolute floor: {truncated: true} — Gemini never gets a context-crashing response
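The five steps can be sketched as follows. This is an illustration under stated assumptions, not the real enforcePayloadCeiling(): it assumes tool responses are plain objects, the pass count and string limits are guesses, "280KB" is read as 280 × 1024 bytes, and the retryWith values are the example hint shown under Hard Payload Budget.

```javascript
// Illustrative sketch of a progressive payload ceiling (not the server's code).
const DEFAULT_MAX_BYTES = 280 * 1024; // assumption: "280KB" = 280 × 1024 bytes
const RETRY_HINT = { detail: "low", maxItems: 80, limit: 20 }; // example hint

// Step 1: measure the serialized size in bytes (UTF-8).
function byteSize(value) {
  const json = JSON.stringify(value);
  return json === undefined ? 0 : Buffer.byteLength(json, "utf8");
}

// Halve every array and cut every string to maxChars, recursively.
function shrink(value, maxChars) {
  if (Array.isArray(value)) {
    return value.slice(0, Math.ceil(value.length / 2)).map((v) => shrink(v, maxChars));
  }
  if (typeof value === "string") {
    return value.length > maxChars ? value.slice(0, maxChars) + "..." : value;
  }
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const [key, v] of Object.entries(value)) out[key] = shrink(v, maxChars);
    return out;
  }
  return value;
}

function enforceCeiling(payload, maxBytes = DEFAULT_MAX_BYTES) {
  // Step 2: under budget, send as-is.
  if (byteSize(payload) <= maxBytes) return payload;

  // Step 3a: progressively shrink arrays and strings (shrink returns copies).
  let current = payload;
  let maxChars = 4000;
  for (let pass = 0; pass < 6; pass++) {
    current = shrink(current, maxChars);
    maxChars = Math.max(100, Math.floor(maxChars / 2));
    // Step 4: every truncated response carries a retryWith hint.
    const candidate = { ...current, truncated: true, retryWith: RETRY_HINT };
    if (byteSize(candidate) <= maxBytes) return candidate;
  }

  // Step 3b: drop the largest top-level fields until it fits.
  const keys = Object.keys(current).sort((a, b) => byteSize(current[b]) - byteSize(current[a]));
  for (const key of keys) {
    delete current[key];
    const candidate = { ...current, truncated: true, retryWith: RETRY_HINT };
    if (byteSize(candidate) <= maxBytes) return candidate;
  }

  // Step 5: absolute floor, a tiny object that fits any realistic budget.
  return { truncated: true, truncationReason: "response exceeded " + maxBytes + " bytes", retryWith: RETRY_HINT };
}
```

Shrinking arrays before dropping fields keeps the response's shape intact for as long as possible, so the model still sees every field, just with fewer items.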


Quick Start

```
# Clone
git clone https://github.com/Mhrnqaruni/mcp-playwright-browser.git
cd mcp-playwright-browser

# Install
npm install
npx playwright install chromium

# Run (interactive mode - chat with Gemini)
scripts\run-dom-headless.bat

# Run (one-shot automation)
scripts\run-dom-headless.bat -p "Go to https://example.com and extract the page title"

# Run with real Chrome (for logged-in sessions)
scripts\run-chrome-profile.bat --kill-chrome
```

Installation

Prerequisites

  • Node.js 18+

  • npm

  • Gemini CLI: npm install -g @google/gemini-cli then gemini auth login

  • Google Chrome (for CDP and chrome-profile modes)

Setup

1. Install dependencies

```
npm install
npx playwright install chromium
```

2. Configure the MCP server path

Edit .gemini/settings.json and set cwd to your repo location:

```json
{
  "mcpServers": {
    "playwrightBrowser": {
      "command": "node",
      "args": ["src/mcp-browser-server.js"],
      "cwd": "C:/path/to/mcp-playwright-browser"
    }
  }
}
```

3. (Optional) Disable Chrome background apps

Prevents profile locking:

Chrome Settings → Advanced → System → ☐ Continue running background apps when Google Chrome is closed

4. Verify

scripts\run-dom-headless.bat -p "Use MCP server playwrightBrowser. Launch browser. Go to https://example.com. Take a snapshot. Close."

Profile Launchers

Each .bat file pre-configures everything (browser type, stealth, profile, environment variables) and starts Gemini with the right system instructions. You never need to configure Gemini manually.

Available Profiles

| Script | Browser | Mode | Best For |
|---|---|---|---|
| run-dom-headless.bat | Chromium | Headless | ⚡ Bulk scraping, fastest |
| run-visual-headful.bat | Chromium | Visible + Screenshots | Debugging, visual verification |
| run-chrome-profile.bat | Real Chrome | Your profile | Logged-in sessions, form filling |
| run-cdp-profile.bat | Real Chrome | CDP | Maximum stealth |
| run-cdp-profile-screen.bat | Real Chrome | CDP + Visual | CDP with screenshot analysis |
| run-cdp-profile-persist.bat | Real Chrome | CDP + Persistent | Long sessions, multi-step flows |
| run-cdp-profile-screen-persist.bat | Real Chrome | CDP + Visual + Persistent | Full power mode |

Interactive Mode (Chat)

```
# Start Gemini and chat with it
scripts\run-chrome-profile.bat --kill-chrome

# Then just type:
# "Fill out the job application at [URL] using my CV"
# "Go to LinkedIn and apply to the first 5 jobs"
# "Extract all AI engineer jobs from Indeed and save them"
```

One-Shot Mode (Automation)

```
# Run a task and get a log file
scripts\run-dom-headless.bat -p "Your full task here"

# With custom output
scripts\run-dom-headless.bat -p "Extract 50 jobs from Indeed" --output logs\jobs.log

# Chrome profile one-shot
scripts\run-chrome-profile.bat --kill-chrome -p "Submit application at [URL]" --output logs\apply.log
```

Logs are auto-saved to logs/ with timestamps.

Profile Details

run-dom-headless.bat — Fastest

  • Chromium headless (no GUI)

  • Best for: bulk extraction, scraping, background tasks

  • Token usage: lowest (no screenshots)

run-visual-headful.bat — Debugging

  • Chromium with visible window

  • Screenshot-based navigation available

  • Best for: troubleshooting, visual verification

run-chrome-profile.bat — Authenticated Sessions

  • Real Chrome with your existing logged-in profile

  • Already signed into Gmail, LinkedIn, job sites

  • Use --kill-chrome to free profile before starting

  • Best for: job applications, authenticated scraping

run-cdp-profile.bat — Maximum Stealth

  • Connects to real Chrome via Chrome DevTools Protocol

  • Hardest for sites to detect as automation

  • Best for: sites that block Playwright/Chromium

  • Auto-closes any existing Chrome using the profile before launch

run-cdp-profile-persist.bat — Long Sessions

  • CDP mode with persistent browser (doesn't close between tasks)

  • Best for: multi-step workflows where browser state must survive


All 71 MCP Tools

Capture Profile Control

| Tool | Description |
|---|---|
| browser.set_capture_profile | Set light / balanced / full profile. Controls token usage across all tools. Call this first. |
| browser.get_capture_profile | Show current profile settings and payload budget. |

Browser Lifecycle

| Tool | Description |
|---|---|
| browser.launch | Launch Chromium with options: headless, stealth, userDataDir, profileDirectory, channel, slowMo, args |
| browser.launch_chrome_cdp | Launch real Chrome with remote debugging + connect in one step |
| browser.connect_cdp | Connect to existing Chrome with --remote-debugging-port |
| browser.close | Close browser session |
| browser.reload | Reload current page |

Multi-Tab Management

| Tool | Description |
|---|---|
| browser.new_page | Open new tab, tracked by page manager |
| browser.list_pages | List all open tabs with pageId, url, title, active/closed state |
| browser.select_page | Switch active tab by pageId |
| browser.close_page | Close a specific tab by pageId |
| browser.list_frames | List all iframes on the current page |
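One way to get stable pageIds is a small registry that never reuses an id, even after its tab closes. This is a hypothetical sketch, not the code in src/browser/pages.js: the PageManager class, its method names and the "page-N" id format are assumptions, and the page objects here are plain stand-ins with a url field (a Playwright Page exposes url() as a method instead).

```javascript
// Hypothetical registry of tabs with stable, never-reused pageIds.
class PageManager {
  constructor() {
    this.nextId = 1;
    this.pages = new Map(); // pageId -> { page, closed }
    this.activeId = null;
  }

  // Register a tab (e.g. from a "page" event) and return its pageId.
  add(page) {
    const pageId = "page-" + this.nextId++;
    this.pages.set(pageId, { page, closed: false });
    if (this.activeId === null) this.activeId = pageId;
    return pageId;
  }

  list() {
    return [...this.pages.entries()].map(([pageId, entry]) => ({
      pageId,
      url: entry.page.url,
      active: pageId === this.activeId,
      closed: entry.closed,
    }));
  }

  select(pageId) {
    const entry = this.pages.get(pageId);
    if (!entry || entry.closed) throw new Error("Unknown or closed pageId: " + pageId);
    this.activeId = pageId;
    return entry.page;
  }

  close(pageId) {
    const entry = this.pages.get(pageId);
    if (!entry) throw new Error("Unknown pageId: " + pageId);
    entry.closed = true; // keep the entry so the id is never handed out again
    if (this.activeId === pageId) {
      const next = [...this.pages.entries()].find(([, e]) => !e.closed);
      this.activeId = next ? next[0] : null;
    }
  }
}
```

Keeping closed entries means a stale pageId held by the model fails loudly instead of silently pointing at a different tab.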

Navigation

| Tool | Description |
|---|---|
| browser.goto | Navigate to URL with configurable waitUntil and timeout |
| browser.back | Go back in history |
| browser.forward | Go forward in history |
| browser.wait | Wait for selector or fixed ms |
| browser.wait_for | Smart wait: selector, text, or uid (A11y) |

Event & Dialog Handling

| Tool | Description |
|---|---|
| browser.list_dialogs | List pending JS dialogs (alert, confirm, prompt) |
| browser.handle_dialog | Accept or dismiss a dialog, optionally with input text |
| browser.wait_for_download | Block until a download starts, returns downloadId |
| browser.save_download | Save a captured download to a specific path |
| browser.wait_for_popup | Wait for a new tab/popup to open, returns its pageId |
| browser.expect_event | Listen for a one-time event: dialog, download, navigation, request, response |

Cookies & Session State

| Tool | Description |
|---|---|
| browser.get_cookies | List cookies, optionally filtered by URL |
| browser.set_cookies | Inject cookies into browser session |
| browser.clear_cookies | Clear all or URL-specific cookies |
| browser.export_storage_state | Export full session state (cookies + localStorage) to JSON file |
| browser.import_storage_state | Restore session from previously exported JSON |

Scroll Control

| Tool | Description |
|---|---|
| browser.get_scroll_state | Returns scrollY, scrollHeight, atTop, atBottom, viewport info |
| browser.scroll_by | Scroll page by delta pixels (vertical + horizontal) |
| browser.scroll_to | Scroll to absolute position |
| browser.get_scrollables | Detect all scrollable containers on the page |
| browser.get_container_scroll_state | Scroll metrics for a specific container selector |
| browser.scroll_container | Scroll a specific container by selector |
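The atTop/atBottom flags reported by browser.get_scroll_state can be derived from three raw metrics. This is a hypothetical sketch, not the server's implementation: the scrollState name and the 2px tolerance are assumptions. For the page the inputs would be window.scrollY, document.documentElement.scrollHeight and window.innerHeight; for a container, scrollTop, scrollHeight and clientHeight.

```javascript
// Hypothetical helper: derive scroll flags from raw scroll metrics.
function scrollState({ scrollY, scrollHeight, viewportHeight }, tolerancePx = 2) {
  const maxScrollY = Math.max(0, scrollHeight - viewportHeight);
  return {
    scrollY,
    scrollHeight,
    viewportHeight,
    atTop: scrollY <= tolerancePx,
    // Fractional offsets (zoom, high-DPI screens) rarely hit the exact maximum.
    atBottom: scrollY >= maxScrollY - tolerancePx,
    remainingPx: Math.max(0, maxScrollY - scrollY),
  };
}

console.log(scrollState({ scrollY: 0, scrollHeight: 3000, viewportHeight: 800 }).remainingPx); // 2200
```

A scraper can then scroll by one viewport at a time until atBottom is true, instead of guessing how many scrolls a page needs.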

Page Reading & Snapshots

| Tool | Description |
|---|---|
| browser.snapshot | Plain text page summary: title, text, links, optional headings + forms summary |
| browser.take_snapshot | A11y tree via CDP: roles, names, UIDs (ax-{nodeId}), depth, state |
| browser.query_dom | Flexible selector query: text, value, bbox, visibility, state, tagName |
| browser.evaluate | Execute JavaScript (requires MCP_ALLOW_EVALUATE=true, origin-gated) |

Element Interaction

| Tool | Description |
|---|---|
| browser.list | List visible interactive elements with elementId, tag, text, href |
| browser.click | Click by elementId, uid, selector, or text |
| browser.hover | Hover over element (triggers dropdown menus, tooltips) |
| browser.type | Simulate keypress-by-keypress typing |
| browser.fill | Direct value fill (faster, no keypress simulation) |
| browser.press | Press keyboard key (Enter, Tab, Escape, etc.) |
| browser.set_input_files | Upload file to input[type=file] |
| browser.scroll_to_uid | Scroll a UID element into view |

Visual Navigation

| Tool | Description |
|---|---|
| browser.screenshot | Save screenshot to path |
| browser.visual_snapshot | Screenshot + element map with bounding boxes and IDs |
| browser.click_at | Click at viewport-relative X/Y coordinates |
| browser.click_at_page | Click at document-absolute X/Y coordinates |

Data Extraction

| Tool | Description |
|---|---|
| browser.extract_text | Extract text from CSS selector (single or all matches) |
| browser.extract_html | Extract outerHTML from selector |

Form Automation

| Tool | Description |
|---|---|
| browser.form_audit | Scan page for all unfilled required fields: text, select, radio, checkbox, contenteditable |
| browser.fill_form | Fill a list of {label, selector, value, kind} fields, label-driven or selector-driven |
| forms.google_audit | Google Forms specialist: list all questions and check aria-checked for answers |
| forms.google_set_text | Fill a Google Forms text question by question text |
| forms.google_set_dropdown | Select option in Google Forms dropdown |
| forms.google_set_checkbox | Check/uncheck Google Forms checkbox |
| forms.google_set_radio | Select option in Google Forms radio group |
| forms.google_set_grid | Select option in Google Forms grid question |

Observability

| Tool | Description |
|---|---|
| browser.list_console_messages | Show captured console.log/warn/error from the page |
| browser.list_network_requests | Show all network requests (URL, method, status, timing) |
| browser.get_network_request | Get full details for a specific request by ID |

File Operations

| Tool | Description |
|---|---|
| files.read_text | Read text file (restricted to allowed paths) |
| files.read_pdf_text | Extract text from PDF (used to read CV files) |
| files.list_dir | List directory contents |
| files.write_text | Write text to file (restricted to output/ and logs/) |

Specialized Extractors (Production Examples)

| Tool | Description |
|---|---|
| jobs.extract_indeed | Extract Indeed job listings with multi-selector fallbacks, deduplication, access detection |
| jobs.indeed_next_page | Navigate to next Indeed page (direct URL, click, or auto mode) |
| search.google | Open Google search and extract results with consent handling |
| search.extract_google | Extract results from current Google search page |


Architecture

Module Structure

```
src/
├── mcp-browser-server.js     # Main server: tool registration, env config, middleware
├── extractors.js             # Indeed + Google specialized extractors
├── browser/
│   ├── pages.js              # Multi-tab page manager (stable pageIds)
│   ├── snapshot.js           # A11y tree via CDP Accessibility.getFullAXTree
│   ├── capture-profiles.js   # light/balanced/full × low/high = 30 preset configs
│   ├── payload-budget.js     # Hard 280KB response ceiling with graceful truncation
│   ├── cdp.js                # CDP session, click/hover/scroll by backendNodeId
│   ├── dom-version.js        # DOM mutation tracking, frame management
│   ├── forms.js              # Form audit + intelligent form fill
│   ├── observability.js      # Console + network request capture via CDP
│   └── wait.js               # Smart wait: selector, text, uid
└── security/
    └── paths.js              # Read/write path allowlist enforcement
```

Tool Registration Middleware

Every tool goes through a wrapper that runs before and after the handler:

```
AI calls tool
     │
     ▼
assign requestId
     │
     ▼
run handler
     │
     ▼
normalize errors (structured, no stack traces)
     │
     ▼
add envelope (ok, requestId, timestamp, url, domVersion)
     │
     ▼
enforcePayloadCeiling (truncate if > 280KB)
     │
     ▼
send to AI
```

This means every tool automatically benefits from error safety and payload budgeting without any extra code per tool.
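The wrapper pipeline above can be sketched as a higher-order function around each handler. This is a hypothetical illustration, not the server's middleware: wrapTool and its getPageInfo and enforce options are assumed names, getPageInfo stands in for whatever supplies url and domVersion, and enforce stands in for enforcePayloadCeiling().

```javascript
const { randomUUID } = require("crypto");

// Hypothetical per-tool wrapper mirroring the steps in the diagram above.
function wrapTool(handler, { getPageInfo = () => ({}), enforce = (body) => body } = {}) {
  return async function wrappedTool(args) {
    const requestId = randomUUID(); // assign requestId
    let body;
    try {
      body = { ok: true, result: await handler(args) }; // run handler
    } catch (err) {
      // Normalize errors: name + message only, never a stack trace.
      const error = err instanceof Error ? err : new Error(String(err));
      body = { ok: false, error: { name: error.name, message: error.message } };
    }
    // Add the envelope (page info would supply url and domVersion).
    const envelope = { ...body, requestId, timestamp: new Date().toISOString(), ...getPageInfo() };
    // Enforce the payload ceiling before the response goes back to the AI.
    return enforce(envelope);
  };
}

// Usage: wrap a trivial tool and inspect the envelope.
const double = wrapTool(async ({ x }) => x * 2, { getPageInfo: () => ({ url: "https://example.com", domVersion: 1 }) });
double({ x: 21 }).then((res) => console.log(res.ok, res.result, res.url)); // true 42 https://example.com
```

Because errors are caught inside the wrapper, a failing handler still produces a normal, budgeted response instead of crashing the tool call.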

UID System

The A11y snapshot (browser.take_snapshot) assigns every node a stable UID in the format ax-{nodeId}, tied to the CDP backendDOMNodeId. This UID can then be used with:

  • browser.click({ uid: "ax-123" }) — clicks via CDP directly on the backend node

  • browser.scroll_to_uid({ uid: "ax-123" }) — scrolls it into view first

  • browser.wait_for({ uid: "ax-123" }) — waits until it's visible

CDP-native clicks are more reliable than selector-based clicks because they bypass CSS selector resolution and work even in Shadow DOM.
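A CDP-native click by UID might look roughly like this. It is a sketch under assumptions, not the code in src/browser/cdp.js: it assumes the snapshot recorded a Map from ax- UIDs to backendDOMNodeIds, and that session is a CDP session (for example from Playwright's page.context().newCDPSession(page)). The CDP methods used (DOM.scrollIntoViewIfNeeded, DOM.getBoxModel, Input.dispatchMouseEvent) are standard protocol commands; clickByUid and quadCenter are hypothetical names.

```javascript
// Center of a CDP quad [x1, y1, x2, y2, x3, y3, x4, y4].
function quadCenter(quad) {
  let x = 0;
  let y = 0;
  for (let i = 0; i < 8; i += 2) {
    x += quad[i];
    y += quad[i + 1];
  }
  return { x: x / 4, y: y / 4 };
}

// Hypothetical CDP-native click; uidToBackendId is an assumed Map built at snapshot time.
async function clickByUid(session, uidToBackendId, uid) {
  if (!/^ax-\S+$/.test(uid)) throw new Error("Not an ax- UID: " + uid);
  const backendNodeId = uidToBackendId.get(uid);
  if (backendNodeId === undefined) throw new Error("UID not in the latest snapshot: " + uid);

  // No CSS selector is involved, so shadow roots do not get in the way.
  await session.send("DOM.scrollIntoViewIfNeeded", { backendNodeId });
  const { model } = await session.send("DOM.getBoxModel", { backendNodeId });
  const { x, y } = quadCenter(model.content); // viewport coordinates

  // A click is a press followed by a release at the same point.
  for (const type of ["mousePressed", "mouseReleased"]) {
    await session.send("Input.dispatchMouseEvent", { type, x, y, button: "left", clickCount: 1 });
  }
  return { x, y };
}
```

Throwing on a UID missing from the latest snapshot forces a fresh browser.take_snapshot rather than clicking a node that may no longer exist.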


Token Efficiency: Capture Profiles

This is the most important v2.0 feature for real-world use.

The Problem

AI context windows are finite. Every tool response consumes tokens. A naive implementation that dumps everything on every call quickly exhausts the budget.

The Solution: Three Profiles

Set the profile once at session start, and every subsequent tool call automatically uses appropriate limits:

browser.set_capture_profile({ profile: "light" })

| Profile | Snapshot chars | List items | A11y nodes | Best For |
|---|---|---|---|---|
| light | 6,000–9,000 | 120–180 | 220–320 | Job scraping, bulk tasks |
| balanced | 12,000–16,000 | 240–320 | 440–700 | Form filling, research |
| full | 20,000 | 500 | 1,200–2,000 | Deep debugging only |

Two Detail Levels Per Profile

Within each profile, tools accept detail: "low" or detail: "high":

```
browser.snapshot({ detail: "low" })   # minimal, fast
browser.snapshot({ detail: "high" })  # more text, links, headings, form summary
```
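Combining the profile table with the two detail levels gives a simple lookup: the session-wide profile picks a row, and each call's detail picks the low or high end of the range. This is a hypothetical sketch, not src/browser/capture-profiles.js: it assumes the lower bound of each range in the table is detail "low" and the upper bound is "high", and CAPTURE_PROFILES, setCaptureProfile and resolveLimits are illustrative names.

```javascript
// Hypothetical presets built from the profile table above.
const CAPTURE_PROFILES = {
  light: {
    low: { snapshotChars: 6000, listItems: 120, a11yNodes: 220 },
    high: { snapshotChars: 9000, listItems: 180, a11yNodes: 320 },
  },
  balanced: {
    low: { snapshotChars: 12000, listItems: 240, a11yNodes: 440 },
    high: { snapshotChars: 16000, listItems: 320, a11yNodes: 700 },
  },
  full: {
    low: { snapshotChars: 20000, listItems: 500, a11yNodes: 1200 },
    high: { snapshotChars: 20000, listItems: 500, a11yNodes: 2000 },
  },
};

let activeProfile = "light"; // set once per session

function setCaptureProfile(profile) {
  if (!Object.hasOwn(CAPTURE_PROFILES, profile)) throw new Error("Unknown capture profile: " + profile);
  activeProfile = profile;
}

// Limits for a single tool call; explicit per-call arguments win over the preset.
function resolveLimits(detail = "low", overrides = {}) {
  if (detail !== "low" && detail !== "high") throw new Error('detail must be "low" or "high"');
  return { ...CAPTURE_PROFILES[activeProfile][detail], ...overrides };
}

setCaptureProfile("balanced");
console.log(resolveLimits("high").listItems); // 320
```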

The Capture Ladder in Practice

The profile system instructions teach Gemini to escalate only when needed:

```
✅ "I need to find the Apply button"
→ browser.snapshot (low)         # did I find it in plain text? usually yes
→ browser.list (low)             # still looking? check interactive elements
→ browser.take_snapshot (low)    # need uid for reliable click? A11y tree
→ browser.visual_snapshot (low)  # shadow DOM / can't find it at all? visual fallback
```

In light mode, this entire ladder costs roughly 8x fewer tokens than v1.0's single dump approach.

Hard Payload Budget

Even with capture profiles, some pages are just huge. The payload budget is a safety net:

  • Default ceiling: 280KB per response

  • If exceeded: truncate progressively (arrays → strings → object keys)

  • Include retryWith field: { detail: "low", maxItems: 80, limit: 20 }

  • Gemini reads this and retries with smaller parameters

  • Absolute fallback: { truncated: true, truncationReason: "..." }

The budget is configurable: MCP_MAX_RESPONSE_BYTES=150000 for tighter contexts.
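On the caller's side, honoring a retryWith hint means merging it into the next call's arguments. In this project Gemini does that by reading the hint, so the following is only a hypothetical sketch for a programmatic client: callWithBudget and the async callTool(name, args) stand-in are assumed names, and the retry cap is there so a response that stays truncated cannot loop forever.

```javascript
// Hypothetical client-side helper: retry a tool call with its retryWith hint.
async function callWithBudget(callTool, name, args, maxRetries = 2) {
  let currentArgs = { ...args };
  let response = await callTool(name, currentArgs);
  for (let attempt = 0; attempt < maxRetries && response.truncated && response.retryWith; attempt++) {
    currentArgs = { ...currentArgs, ...response.retryWith }; // adopt the smaller limits
    response = await callTool(name, currentArgs);
  }
  return { response, args: currentArgs };
}
```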


Common Use Cases

Job Application (Chrome Profile)

```
# Start with your real logged-in Chrome
scripts\run-chrome-profile.bat --kill-chrome
```

In Gemini:

Set capture profile to light. Go to [application URL]. Run form_audit to see all required fields. Fill them using fill_form with my details from Applied Jobs/CODEX/maincv.md. Before submitting, take a screenshot and ask me to confirm.

Bulk Job Scraping (Headless)

scripts\run-dom-headless.bat -p "Use playwrightBrowser. Launch browser headless. Go to https://ae.indeed.com/q-ai-engineer-l-dubai-jobs.html. Extract jobs with jobs.extract_indeed limit 20, save to output/indeed/page-1. Go to next page with jobs.indeed_next_page. Extract again, save to output/indeed/page-2. Close."

Session Persistence (Login Once, Reuse)

```
# First time: log in manually and export the session
scripts\run-cdp-profile.bat
```

In Gemini:

Go to linkedin.com and wait for me to log in. After I confirm logged in, run browser.export_storage_state to output/linkedin-session.json.

Next time:

Run browser.import_storage_state from output/linkedin-session.json. Go to linkedin.com — should be logged in already.
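Playwright itself offers a storage-state API that covers the same idea: BrowserContext.storageState({ path }) writes cookies and per-origin localStorage to JSON, and browser.newContext({ storageState }) loads them into a fresh context. Whether browser.export_storage_state / browser.import_storage_state use exactly these calls is an assumption; this sketch also assumes a regular (non-persistent) browser, since a persistent or CDP-attached context cannot be recreated this way. exportSession and importSession are hypothetical names. The exported file holds live session cookies, so treat it like a password.

```javascript
// Hypothetical helpers on top of Playwright's storage-state API.
async function exportSession(context, path) {
  // Writes cookies and per-origin localStorage to a JSON file.
  return context.storageState({ path });
}

async function importSession(browser, path) {
  // Returns a new context pre-loaded with the saved cookies and localStorage.
  return browser.newContext({ storageState: path });
}

// Possible usage with Playwright:
// const { chromium } = require("playwright");
// const browser = await chromium.launch();
// const context = await importSession(browser, "output/linkedin-session.json");
```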

Google Form Automation

scripts\run-dom-headless.bat

In Gemini:

Go to [Google Form URL]. Run forms.google_audit to see all questions. Fill each question using the appropriate forms.google_set_* tool. Run forms.google_audit again to verify all answered. Submit.

PDF CV Reading

Gemini can read your CV directly without you pasting it:

Read my CV from Applied Jobs/CODEX/maincv.md using files.read_text. Or read the PDF version: files.read_pdf_text from Applied Jobs/CODEX/CV.pdf. Use that information to fill the job application form.

Debugging with Visual Mode

scripts\run-visual-headful.bat

In Gemini:

Go to [URL]. Take a visual_snapshot and save to output/debug.png. Tell me what you see and identify any unusual elements.

Environment Variables

All variables have dual names for Gemini CLI compatibility. The launchers set both:

| Variable | Alias | Description |
|---|---|---|
| MCP_HEADLESS | GEMINI_CLI_MCP_HEADLESS | true/false: run without GUI |
| MCP_STEALTH | GEMINI_CLI_MCP_STEALTH | true/false: enable anti-detection |
| MCP_CHANNEL | GEMINI_CLI_MCP_CHANNEL | chrome: use real Chrome |
| MCP_EXECUTABLE_PATH | GEMINI_CLI_MCP_EXECUTABLE_PATH | Absolute path to chrome.exe |
| MCP_USER_DATA_DIR | GEMINI_CLI_MCP_USER_DATA_DIR | Chrome profile directory |
| MCP_PROFILE | GEMINI_CLI_MCP_PROFILE | Profile name: Default, Profile 3 |
| MCP_CDP_ENDPOINT | GEMINI_CLI_MCP_CDP_ENDPOINT | CDP URL: http://127.0.0.1:9222 |
| MCP_CDP_PORT | GEMINI_CLI_MCP_CDP_PORT | CDP port number (default 9222) |
| MCP_CDP_AUTO_CLOSE | GEMINI_CLI_MCP_CDP_AUTO_CLOSE | Close Chrome on server exit |
| MCP_FORCE_CDP | GEMINI_CLI_MCP_FORCE_CDP | Disable browser.launch (CDP-only mode) |
| MCP_REQUIRE_PROFILE | GEMINI_CLI_MCP_REQUIRE_PROFILE | Require userDataDir (prevent bare Chromium) |
| MCP_ALLOW_EVALUATE | GEMINI_CLI_MCP_ALLOW_EVALUATE | Enable browser.evaluate tool |
| MCP_EVALUATE_ALLOW_ORIGINS | GEMINI_CLI_MCP_EVALUATE_ALLOW_ORIGINS | Comma-separated allowed origins for evaluate |
| MCP_CAPTURE_PROFILE | GEMINI_CLI_MCP_CAPTURE_PROFILE | Default profile: light, balanced, full |
| MCP_MAX_RESPONSE_BYTES | GEMINI_CLI_MCP_MAX_RESPONSE_BYTES | Override 280KB payload ceiling |
| MCP_SLOWMO_MS | GEMINI_CLI_MCP_SLOWMO_MS | Slow down actions by N ms (debugging) |

Why dual names? Gemini CLI sanitizes environment variables and may strip MCP_* prefixed keys. The GEMINI_CLI_MCP_* variants bypass this filtering. The server reads both and uses whichever is set.
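Reading a setting under either name takes only a small helper. This is a hypothetical sketch, not the server's code: readEnv, readBool and readInt are illustrative names, the MCP_* name winning when both are set is an assumption (the README only says the server uses whichever is set), and so is the list of accepted truthy strings.

```javascript
// Hypothetical dual-name lookup: MCP_<NAME>, then GEMINI_CLI_MCP_<NAME>.
function readEnv(name, env = process.env) {
  const primary = env["MCP_" + name];
  if (primary !== undefined && primary !== "") return primary;
  const alias = env["GEMINI_CLI_MCP_" + name];
  return alias !== undefined && alias !== "" ? alias : undefined;
}

function readBool(name, fallback = false, env = process.env) {
  const raw = readEnv(name, env);
  if (raw === undefined) return fallback;
  return ["1", "true", "yes", "on"].includes(raw.trim().toLowerCase());
}

function readInt(name, fallback, env = process.env) {
  const n = Number.parseInt(readEnv(name, env) ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

// e.g. the payload ceiling, assuming "280KB" means 280 × 1024 bytes
const maxResponseBytes = readInt("MAX_RESPONSE_BYTES", 280 * 1024);
```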


Project Structure

```
mcp-playwright-browser/
│
├── src/
│   ├── mcp-browser-server.js          # Main server (71 tools, middleware, env config)
│   ├── extractors.js                  # Indeed + Google production extractors
│   ├── browser/
│   │   ├── pages.js                   # Multi-tab page manager
│   │   ├── snapshot.js                # A11y tree (CDP Accessibility API)
│   │   ├── capture-profiles.js        # Token budget profiles (light/balanced/full)
│   │   ├── payload-budget.js          # Hard response size ceiling
│   │   ├── cdp.js                     # CDP primitives (click, hover, scroll by nodeId)
│   │   ├── dom-version.js             # DOM mutation tracking + frame management
│   │   ├── forms.js                   # Form audit + intelligent fill
│   │   ├── observability.js           # Console + network capture
│   │   └── wait.js                    # Smart wait (selector, text, uid)
│   ├── security/
│   │   └── paths.js                   # File read/write path allowlist
│   └── tests/
│       ├── page-manager-test.js
│       ├── security-paths-test.js
│       ├── snapshot-uid-test.js
│       ├── uid-click-fill-test.js
│       ├── elementid-no-stale-test.js
│       ├── wait-for-test.js
│       ├── form-audit-fill-test.js
│       ├── console-network-test.js
│       ├── visual-coords-test.js
│       ├── frame-domversion-test.js
│       ├── cdp-hover-test.js
│       ├── browser-events-test.js
│       ├── storage-state-test.js
│       ├── capture-profiles-test.js
│       ├── payload-budget-test.js
│       ├── google-form-test.js
│       ├── google-test.js
│       └── indeed-test.js
│
├── scripts/
│   ├── run-dom-headless.bat                 # Fastest: headless Chromium
│   ├── run-visual-headful.bat               # Visual: Chromium + screenshots
│   ├── run-chrome-profile.bat               # Auth: real Chrome with your profile
│   ├── run-cdp-profile.bat                  # Stealth: CDP mode
│   ├── run-cdp-profile-screen.bat           # Stealth + visual
│   ├── run-cdp-profile-persist.bat          # Stealth + persistent session
│   ├── run-cdp-profile-screen-persist.bat   # Full power
│   ├── autoconnect.js                       # CDP auto-connect helper
│   └── .gemini/settings.json                # Fallback MCP config
│
├── profiles/
│   ├── dom/
│   │   ├── system.md                  # Gemini system instructions (DOM mode)
│   │   └── oneshot.md                 # One-shot variant (closes browser at end)
│   ├── visual/
│   │   ├── system.md
│   │   └── oneshot.md
│   ├── cdp/
│   │   ├── system.md
│   │   ├── oneshot.md
│   │   └── persistent.md
│   └── cdp-visual/
│       ├── system.md
│       ├── oneshot.md
│       └── persistent.md
│
├── .gemini/settings.json              # Main MCP config (set your cwd here)
├── GEMINI.md                          # Project-level Gemini instructions
├── LICENSE                            # ISC License
└── README.md
```

Running Tests

```
# All tests that don't need network
npm run test:local

# Live network tests (Indeed + Google)
npm run test:remote

# Everything
npm run test:all
```

Troubleshooting

"Chrome is already running" / Profile locked

```
# Use --kill-chrome
scripts\run-chrome-profile.bat --kill-chrome

# Or manually
taskkill /F /IM chrome.exe
```

Chrome 136+ blocks automation on the default User Data directory. Always use a dedicated profile or the ChromeForMCP data dir.

"Gmail says browser is not safe"

You're connected via Chromium, not your real Chrome. Ensure:

  1. Chrome is fully closed before starting (--kill-chrome)

  2. The launch response shows "persistent": true and your profile path

  3. If not, restart Gemini and verify that the .bat output shows Using Chrome executable: ...

MCP tools not found in Gemini

  • Run any .bat from any directory — they auto-fix cwd

  • Verify .gemini/settings.json has the correct cwd

  • The scripts/.gemini/settings.json is a fallback if Gemini starts in scripts/

Responses truncated / retryWith hint

This is the payload budget working correctly. Gemini will read the retryWith hint and retry with lower parameters. If it keeps happening, switch to light profile:

browser.set_capture_profile({ profile: "light" })

Slow performance

  • Use run-dom-headless.bat for bulk operations (no GUI = 3-4x faster)

  • Avoid browser.extract_html — it returns full HTML and wastes tokens

  • Use detail: "low" on all tools unless you specifically need more

Browser opens but ignores my profile

Check .bat output for:

```
Using Chrome executable: C:\Program Files\Google\Chrome\Application\chrome.exe
Using Chrome profile: Profile 3
```

If you see a different profile or "not found", edit the .bat and set MCP_PROFILE explicitly.


Security & Privacy

Path Restrictions

browser.evaluate (arbitrary JS execution) is disabled by default and must be enabled explicitly with MCP_ALLOW_EVALUATE=true.

files.read_text and files.write_text are restricted to:

  • Read: Applied Jobs/, Auto/output/, Auto/logs/

  • Write: Auto/output/, Auto/logs/

Any attempt to read or write outside these paths throws immediately. Symlinks are resolved before checking (prevents traversal attacks).
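The check described above can be sketched with Node's fs and path modules: resolve the target (collapsing ".." and following symlinks), then require it to sit inside one of the allowed roots. This is a hypothetical sketch, not src/security/paths.js; resolveReal, isInside and assertAllowed are illustrative names. To handle files that do not exist yet (such as a new output file), it resolves symlinks on the deepest existing ancestor and re-appends the rest.

```javascript
const fs = require("fs");
const path = require("path");

// Resolve ".." segments and symlinks; for a not-yet-existing file, realpath
// the deepest existing ancestor and re-append the remaining segments.
function resolveReal(target) {
  let current = path.resolve(target);
  const rest = [];
  while (!fs.existsSync(current)) {
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    rest.unshift(path.basename(current));
    current = parent;
  }
  return path.join(fs.realpathSync(current), ...rest);
}

// True when child is root itself or somewhere below it.
function isInside(child, root) {
  const rel = path.relative(root, child);
  return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
}

// Throw immediately for anything outside the allowlist.
function assertAllowed(target, allowedRoots) {
  const real = resolveReal(target);
  if (!allowedRoots.some((root) => isInside(real, resolveReal(root)))) {
    throw new Error("Path outside allowlist: " + target);
  }
  return real;
}
```

Comparing with path.relative rather than a plain string prefix avoids treating a sibling such as output-old/ as if it were inside output/.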

What Is Stored

| Data | Location | Git-ignored |
|---|---|---|
| Execution logs | logs/ | ✅ Yes |
| Extracted jobs/data | output/ | ✅ Yes |
| Session state exports | output/ | ✅ Yes |
| Gemini CLI state | scripts/.gemini/state.json | ✅ Yes |
| .gemini/ config | root .gemini/ | ✅ Yes |

What Is Never Stored

  • ❌ Passwords or credentials

  • ❌ Credit card or payment information

  • ❌ Browser history

  • ❌ Personal documents outside the allowed paths


Ethical Use

This tool is provided for:

  • Learning browser automation and MCP development

  • Testing your own web applications

  • Automating tasks on sites you have permission to access

  • Legitimate job searching and application workflows

You are responsible for:

  • Respecting robots.txt and website Terms of Service

  • Complying with data protection regulations (GDPR, CCPA, etc.)

  • Rate-limiting your requests to avoid service disruption

  • Not using this to bypass paywalls or access controls without authorization

The authors assume no liability for misuse. Use responsibly.


How This Differs from Microsoft's Official playwright-mcp

Microsoft's playwright-mcp focuses on accessibility-tree based automation for test development in structured environments.

| Feature | Microsoft playwright-mcp | This project |
|---|---|---|
| Navigation | Accessibility tree | Hybrid: DOM + A11y + Visual |
| Philosophy | "Blind" automation (fast, structured) | Human-like automation (robust, adaptive) |
| Primary use case | QA testing, defined workflows | Open-web agents, scraping, complex UIs |
| Token efficiency | Not optimized | Capture profiles + hard payload budget |
| Session persistence | Basic | Cookie/storage export-import |
| Form intelligence | Manual | form_audit + fill_form + Google Forms specialist |
| Multi-tab | Basic | Full page manager with stable pageIds |
| Setup | Generic | Batteries included (stealth, profiles, launchers) |

Use Microsoft's for: CI/CD test automation, structured accessibility-driven workflows

Use this for: Autonomous agents operating on the open web, job application automation, anti-detection scraping


Changelog

v2.0.0 (Current)

  • Complete architectural rewrite: monolithic → 11 modular files

  • 71 MCP tools (was 23)

  • Capture profile system (light/balanced/full) for token efficiency

  • Hard 280KB payload budget with graceful truncation and retryWith hints

  • Multi-tab page manager (list, select, close pages)

  • A11y tree snapshots via CDP with stable ax- UIDs

  • CDP-native click/hover/scroll by backendDOMNodeId (handles Shadow DOM)

  • Form audit + intelligent fill + Google Forms specialist (6 tools)

  • Session export/import (cookie + localStorage persistence)

  • Popup, dialog, download event handling

  • Scroll awareness: get state, scroll by delta, scroll containers

  • Network + console observability via CDP

  • File reading: text files + PDF extraction

  • Security: path allowlist enforcement, evaluate guard

  • 18-test suite (was 2)

  • 7 profile launchers (was 5): added persist variants for CDP

  • GEMINI_CLI_MCP_* dual env var support for Gemini sanitization

v1.1.0

  • Profile launcher system (.bat files)

  • Chrome profile integration

  • --kill-chrome flag

  • One-shot mode with automatic logging

  • GEMINI_CLI_MCP_* environment variable aliases

  • browser.visual_snapshot and browser.click_at

v1.0.0

  • Initial release

  • Basic MCP server with Playwright

  • Indeed + Google extractors

  • DOM and visual navigation


Contributing

  1. Fork the repository

  2. Create a feature branch (git checkout -b feature/your-feature)

  3. Run npm run test:local to verify nothing breaks

  4. Commit (git commit -m 'Add your feature')

  5. Push and open a Pull Request


License

ISC License — see LICENSE file.

