The MCP Playwright Browser Server is a production-grade browser automation server that gives AI assistants full control over a web browser for tasks like web scraping, form filling, job searching, and complex multi-tab workflows.
Browser Control & Navigation
Launch Chromium/Chrome (headless or visible) with stealth/anti-detection modes, or connect via CDP for maximum stealth
Navigate to URLs, go back/forward, reload pages, and wait for selectors or timeouts
Manage multiple tabs: open, list, select, and close pages
Page Interaction
Click, type, fill, hover, and press keys using CSS selectors, visible text, element IDs, Accessibility Tree UIDs, or X/Y coordinates
Upload files and handle Shadow DOM elements via the Accessibility Tree
Page Reading & Data Extraction
Take plain-text snapshots (title, URL, text, links), visual screenshots, or A11y tree snapshots with stable UIDs and element bounding box maps
List interactive elements, extract text/HTML from selectors, and query the DOM
Specialized extractors for Indeed job listings (with pagination) and Google search results (with consent handling)
Scroll Control
Get scroll state for the main page or specific containers, scroll by delta or to absolute positions, and list all scrollable containers
Form Automation
Audit pages for unfilled required fields and intelligently fill forms using label-driven or selector-driven approaches, including Google Forms (dropdowns, checkboxes, radio buttons, grids)
Session Management
Export and import browser storage state (cookies + localStorage) to persist logins across runs
Event Handling
Handle JavaScript dialogs (alert/confirm/prompt), monitor and save file downloads, and wait for pop-up windows
Observability & Debugging
Capture console messages and log network requests for monitoring and debugging
File Operations
Read/write text files (restricted to allowed paths), extract text from PDFs, and list directory contents
Token Efficiency & Security
Configurable Capture Profile System (light/balanced/full) with a hard 280KB payload budget and graceful truncation to minimize token usage
Strict file path allowlist enforcement and optional gating of arbitrary JavaScript execution (browser.evaluate)
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@MCP Playwright Browser Scrape Indeed for remote Python developer jobs and save the results"
That's it! The server will respond to your query, and you can continue using it as needed.
MCP Playwright Browser Server
A production-grade Model Context Protocol (MCP) server that gives AI assistants full browser control through Playwright — using a hybrid DOM + Accessibility Tree + Visual approach. Built for real-world agentic automation: job applications, web scraping, form filling, and complex multi-tab workflows.
v2.0 is a complete rewrite. The server grew from 680 lines and 23 tools to nearly 5,000 lines and 71 tools, with a modular architecture, token-optimized capture profiles, hard payload budgets, and a full test suite.
What's New in v2.0
The Problem v1 Had
v1 was a working proof of concept. It could browse pages and extract jobs. But when used with Gemini CLI for real tasks — filling application forms, navigating multi-tab flows, handling downloads — it hit hard limits:
Token waste: Every tool response dumped everything it found. One browser.snapshot on a complex page could push 50KB+ into Gemini's context window in a single call, rapidly exhausting the budget.
No multi-tab support: If a link opened a new tab (very common in job applications), Gemini was stuck with no way to switch to it.
No form intelligence: Filling a form required manual click-by-click instructions. There was no way to ask "what fields are still empty?" or "fill all required fields."
Brittle DOM-only navigation: Shadow DOM, iframes, and obfuscated element IDs caused failures with no fallback.
No session persistence: Every run started fresh. Logging in again and again wasted time and triggered bot detection.
No safety rails: The AI could write files anywhere on disk, run arbitrary JS, or create its own automation scripts — unguarded.
Monolithic: One 680-line file with no tests.
What v2.0 Solves
Every one of those problems has a specific solution in v2.0:
Problem | v2.0 Solution |
Token waste | Capture Profile System (light/balanced/full) + 280KB hard payload ceiling |
Multi-tab stuck | Page Manager with stable pageIds, |
Dumb form filling | |
Shadow DOM / obfuscated IDs | A11y tree via CDP |
Session loss | Cookie export/import, |
No safety | Path allowlist in |
Monolithic | 10 focused modules in |
v1 vs v2 Comparison
Dimension | v1.0 | v2.0 |
Total MCP tools | 23 | 71 |
Server size | 680 lines, 1 file | 4,966 lines, 11 modules |
Token efficiency | Uncontrolled dumps | Capture profiles + 280KB hard ceiling |
Multi-tab support | Single tab only | Full page manager (list, select, close) |
Form automation | Manual click-by-click | |
A11y / Shadow DOM | DOM-only, brittle | CDP Accessibility tree with stable UIDs |
Scroll handling | Saw first viewport only | Scroll awareness + container scrolling |
Session persistence | None | Cookie/storage export-import |
Popup & dialog handling | None | Dialog accept/dismiss, popup pageId capture |
Download management | None | Wait-for-download, save to path |
File reading (CV/PDF) | None | |
Security | No restrictions | Allowlist-enforced read/write paths |
Observability | None | Console log capture, network request log |
Test coverage | 2 tests | 18 tests |
Profiles | 3 | 5 (+ persistent variants) |
Batch scripts | 5 | 7 |
Error handling | Raw exceptions to AI | Normalized, structured, budgeted |
What stayed the same
Indeed job extractor (production-grade, multi-selector, deduplication)
Google search extractor (consent handling, URL deobfuscation)
Stealth mode (webdriver hiding, user agent spoofing)
CDP connection to real Chrome
Visual snapshot + coordinate-based clicking
How It Works
The Capture Ladder
Every profile instructs Gemini to try tools in order, cheapest first:
Gemini only escalates to a more expensive tool when the cheaper one doesn't have what it needs. This is the core of why v2.0 uses far fewer tokens than v1.0.
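The escalation idea can be sketched in a few lines. This is an illustration only: the rung names, costs, and the climbLadder helper are hypothetical placeholders and do not reflect the server's real tool order.

```javascript
// Hypothetical ladder: each rung has a relative cost and a capture function.
// Rung names and costs are placeholders, not the server's actual tools.
function climbLadder(rungs, isEnough) {
  // Try the cheapest rung first, whatever order the rungs were declared in.
  const ordered = [...rungs].sort((a, b) => a.cost - b.cost);
  const tried = [];
  for (const rung of ordered) {
    const result = rung.capture();
    tried.push(rung.name);
    if (isEnough(result)) {
      return { result, tried }; // stop escalating as soon as we have enough
    }
  }
  return { result: null, tried }; // every rung was tried and none sufficed
}

const rungs = [
  { name: "full-capture", cost: 3, capture: () => ({ text: "title, body, and every link" }) },
  { name: "cheap-summary", cost: 1, capture: () => ({ text: "title only" }) },
  { name: "medium-capture", cost: 2, capture: () => ({ text: "title and body" }) },
];

const outcome = climbLadder(rungs, (r) => r.text.includes("body"));
console.log(outcome.tried); // [ 'cheap-summary', 'medium-capture' ]
```

The most expensive rung is never called here because the medium one already contained what the caller needed.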
The Payload Budget
Every single tool response passes through enforcePayloadCeiling() before being sent to Gemini:
Measure response size in bytes
If under 280KB → send as-is
If over → progressively truncate: arrays shrink, strings truncate, fields drop
Always include retryWith hints telling Gemini exactly what parameters to reduce next time
Absolute floor: {truncated: true}, so Gemini never gets a context-crashing response
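The steps above can be sketched as follows. Only the function name enforcePayloadCeiling, the 280KB figure, and the retryWith example values come from this README. The halving strategy, the 200-character string cap, and the byte interpretation of "280KB" are assumptions, so the real implementation may truncate differently.

```javascript
const DEFAULT_MAX_BYTES = 280 * 1024; // assumption: "280KB" read as 280 * 1024 bytes

function byteSize(value) {
  // JSON.stringify(undefined) is undefined, so fall back to an empty string.
  return Buffer.byteLength(JSON.stringify(value) ?? "", "utf8");
}

// Sketch only: not the server's actual enforcePayloadCeiling().
function enforcePayloadCeiling(payload, maxBytes = DEFAULT_MAX_BYTES) {
  if (byteSize(payload) <= maxBytes) return payload; // under budget: send as-is

  // Hint values copied from the README's retryWith example.
  const retryWith = { detail: "low", maxItems: 80, limit: 20 };
  const out = { ...payload, truncated: true, retryWith };

  // Step 1: shrink arrays by halving until the payload fits or arrays are empty.
  let shrunk = true;
  while (byteSize(out) > maxBytes && shrunk) {
    shrunk = false;
    for (const [key, value] of Object.entries(out)) {
      if (Array.isArray(value) && value.length > 0) {
        out[key] = value.slice(0, Math.floor(value.length / 2));
        shrunk = true;
      }
    }
  }

  // Step 2: truncate long strings (the 200-character cap is arbitrary).
  for (const [key, value] of Object.entries(out)) {
    if (byteSize(out) <= maxBytes) break;
    if (typeof value === "string" && value.length > 200) {
      out[key] = value.slice(0, 200);
    }
  }

  // Step 3: drop fields, largest first, keeping the truncation markers.
  const droppable = Object.keys(out)
    .filter((k) => k !== "truncated" && k !== "retryWith")
    .sort((a, b) => byteSize(out[b]) - byteSize(out[a]));
  for (const key of droppable) {
    if (byteSize(out) <= maxBytes) break;
    delete out[key];
  }

  // Absolute floor: a minimal marker instead of an oversized response.
  return byteSize(out) <= maxBytes ? out : { truncated: true };
}
```

Halving arrays before touching strings follows the order listed above (arrays, then strings, then fields). A production version would likely shrink more gradually than this sketch does.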
Quick Start
Installation
Prerequisites
Node.js 18+
npm
Gemini CLI: npm install -g @google/gemini-cli then gemini auth login
Google Chrome (for CDP and chrome-profile modes)
Setup
1. Install dependencies
2. Configure the MCP server path
Edit .gemini/settings.json and set cwd to your repo location:
3. (Optional) Disable Chrome background apps
Prevents profile locking:
4. Verify
Profile Launchers
Each .bat file pre-configures everything (browser type, stealth, profile, environment variables) and starts Gemini with the right system instructions. You never need to configure Gemini manually.
Available Profiles
Script | Browser | Mode | Best For |
| Chromium | Headless | ⚡ Bulk scraping, fastest |
| Chromium | Visible + Screenshots | Debugging, visual verification |
| Real Chrome | Your profile | Logged-in sessions, form filling |
| Real Chrome | CDP | Maximum stealth |
| Real Chrome | CDP + Visual | CDP with screenshot analysis |
| Real Chrome | CDP + Persistent | Long sessions, multi-step flows |
| Real Chrome | CDP + Visual + Persistent | Full power mode |
Interactive Mode (Chat)
One-Shot Mode (Automation)
Logs are auto-saved to logs/ with timestamps.
Profile Details
run-dom-headless.bat — Fastest
Chromium headless (no GUI)
Best for: bulk extraction, scraping, background tasks
Token usage: lowest (no screenshots)
run-visual-headful.bat — Debugging
Chromium with visible window
Screenshot-based navigation available
Best for: troubleshooting, visual verification
run-chrome-profile.bat — Authenticated Sessions
Real Chrome with your existing logged-in profile
Already signed into Gmail, LinkedIn, job sites
Use --kill-chrome to free the profile before starting
Best for: job applications, authenticated scraping
run-cdp-profile.bat — Maximum Stealth
Connects to real Chrome via Chrome DevTools Protocol
Hardest for sites to detect as automation
Best for: sites that block Playwright/Chromium
Auto-closes any existing Chrome using the profile before launch
run-cdp-profile-persist.bat — Long Sessions
CDP mode with persistent browser (doesn't close between tasks)
Best for: multi-step workflows where browser state must survive
All 71 MCP Tools
Capture Profile Control
Tool | Description |
| Set |
| Show current profile settings and payload budget. |
Browser Lifecycle
Tool | Description |
| Launch Chromium with options: headless, stealth, userDataDir, profileDirectory, channel, slowMo, args |
| Launch real Chrome with remote debugging + connect in one step |
| Connect to existing Chrome with |
| Close browser session |
| Reload current page |
Multi-Tab Management
Tool | Description |
| Open new tab, tracked by page manager |
| List all open tabs with pageId, url, title, active/closed state |
| Switch active tab by pageId |
| Close a specific tab by pageId |
| List all iframes on the current page |
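A page manager of the kind these tools rely on can be pictured as a small registry keyed by pageId. The sketch below is a guess at the shape, not the server's code: the PageManager class, the page-N ID format, and the fallback rule on close are all hypothetical. The page objects only need a url() method, which Playwright's Page provides.

```javascript
// Hypothetical page registry; IDs of the form "page-N" are an assumption.
class PageManager {
  constructor() {
    this.pages = new Map(); // pageId -> { page, closed }
    this.nextId = 1;
    this.activeId = null;
  }

  // Track a new tab and make it the active one.
  add(page) {
    const pageId = `page-${this.nextId++}`;
    this.pages.set(pageId, { page, closed: false });
    this.activeId = pageId;
    return pageId;
  }

  list() {
    return [...this.pages].map(([pageId, entry]) => ({
      pageId,
      url: entry.page.url(),
      active: pageId === this.activeId,
      closed: entry.closed,
    }));
  }

  select(pageId) {
    const entry = this.pages.get(pageId);
    if (!entry || entry.closed) throw new Error(`No open page with id ${pageId}`);
    this.activeId = pageId;
    return entry.page;
  }

  // Mark a tab closed; if it was active, fall back to the newest open tab.
  close(pageId) {
    const entry = this.pages.get(pageId);
    if (!entry || entry.closed) throw new Error(`No open page with id ${pageId}`);
    entry.closed = true;
    if (this.activeId === pageId) {
      const open = [...this.pages].filter(([, e]) => !e.closed).map(([id]) => id);
      this.activeId = open.length ? open[open.length - 1] : null;
    }
  }
}

// Usage with stand-in page objects instead of real Playwright pages.
const manager = new PageManager();
const first = manager.add({ url: () => "https://example.com/jobs" });
const second = manager.add({ url: () => "https://example.com/apply" });
manager.close(second);
console.log(manager.list());
```

Because IDs are never reused, a pageId handed to the model earlier keeps pointing at the same tab even after other tabs open or close.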
Navigation
Tool | Description |
| Navigate to URL with configurable waitUntil and timeout |
| Go back in history |
| Go forward in history |
| Wait for selector or fixed ms |
| Smart wait: selector, text, or uid (A11y) |
Event & Dialog Handling
Tool | Description |
| List pending JS dialogs (alert, confirm, prompt) |
| Accept or dismiss a dialog, optionally with input text |
| Block until a download starts, returns downloadId |
| Save a captured download to a specific path |
| Wait for a new tab/popup to open, returns its pageId |
| Listen for a one-time event: dialog, download, navigation, request, response |
Session & Cookie Management
Tool | Description |
| List cookies, optionally filtered by URL |
| Inject cookies into browser session |
| Clear all or URL-specific cookies |
| Export full session state (cookies + localStorage) to JSON file |
| Restore session from previously exported JSON |
Scroll Control
Tool | Description |
| Returns scrollY, scrollHeight, atTop, atBottom, viewport info |
| Scroll page by delta pixels (vertical + horizontal) |
| Scroll to absolute position |
| Detect all scrollable containers on the page |
| Scroll metrics for a specific container selector |
| Scroll a specific container by selector |
Page Reading & Snapshots
Tool | Description |
| Plain text page summary: title, text, links, optional headings + forms summary |
| A11y tree via CDP: roles, names, UIDs ( |
| Flexible selector query: text, value, bbox, visibility, state, tagName |
| Execute JavaScript (requires |
Element Interaction
Tool | Description |
| List visible interactive elements with elementId, tag, text, href |
| Click by elementId, uid, selector, or text |
| Hover over element (triggers dropdown menus, tooltips) |
| Simulate keypress-by-keypress typing |
| Direct value fill (faster, no keypress simulation) |
| Press keyboard key (Enter, Tab, Escape, etc.) |
| Upload file to input[type=file] |
| Scroll a UID element into view |
Visual Navigation
Tool | Description |
| Save screenshot to path |
| Screenshot + element map with bounding boxes and IDs |
| Click at viewport-relative X/Y coordinates |
| Click at document-absolute X/Y coordinates |
Data Extraction
Tool | Description |
| Extract text from CSS selector (single or all matches) |
| Extract outerHTML from selector |
Form Automation
Tool | Description |
| Scan page for all unfilled required fields: text, select, radio, checkbox, contenteditable |
| Fill a list of |
| Google Forms specialist: list all questions and check |
| Fill a Google Forms text question by question text |
| Select option in Google Forms dropdown |
| Check/uncheck Google Forms checkbox |
| Select option in Google Forms radio group |
| Select option in Google Forms grid question |
Observability
Tool | Description |
| Show captured |
| Show all network requests (URL, method, status, timing) |
| Get full details for a specific request by ID |
File Operations
Tool | Description |
| Read text file (restricted to allowed paths) |
| Extract text from PDF — used to read CV files |
| List directory contents |
| Write text to file (restricted to |
Specialized Extractors (Production Examples)
Tool | Description |
| Extract Indeed job listings with multi-selector fallbacks, deduplication, access detection |
| Navigate to next Indeed page (direct URL, click, or auto mode) |
| Open Google search and extract results with consent handling |
| Extract results from current Google search page |
Architecture
Module Structure
Tool Registration Middleware
Every tool goes through a wrapper that runs before and after the handler:
This means every tool automatically benefits from error safety and payload budgeting without any extra code per tool.
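A minimal sketch of such a wrapper is shown below. The registerTool and callTool names, the response shape, and the simplified size check are hypothetical; the real server routes responses through enforcePayloadCeiling() and may structure errors differently.

```javascript
// Hypothetical registry and wrapper; names are illustrative, not the server's API.
const tools = new Map();

function registerTool(name, handler, { maxBytes = 280 * 1024 } = {}) {
  tools.set(name, async (args = {}) => {
    try {
      const result = await handler(args); // tool-specific logic
      const body = JSON.stringify(result ?? null);
      if (Buffer.byteLength(body, "utf8") > maxBytes) {
        // Simplified stand-in for the enforcePayloadCeiling() step.
        return { ok: false, truncated: true, retryWith: { detail: "low" } };
      }
      return { ok: true, result };
    } catch (err) {
      // Normalize raw exceptions into a structured error the model can read.
      return { ok: false, error: { tool: name, message: String(err?.message ?? err) } };
    }
  });
}

async function callTool(name, args) {
  const tool = tools.get(name);
  if (!tool) return { ok: false, error: { tool: name, message: "unknown tool" } };
  return tool(args);
}

// Usage: handlers contain no error handling or size checks of their own.
registerTool("demo.echo", async ({ text }) => ({ echoed: text }));
registerTool("demo.fail", async () => {
  throw new Error("selector not found");
});

callTool("demo.fail", {}).then((response) => console.log(response));
```

The failing handler never leaks a raw exception to the caller: it comes back as a plain object with ok set to false.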
UID System
The A11y snapshot (browser.take_snapshot) assigns every node a stable UID in the format ax-{nodeId}, tied to the CDP backendDOMNodeId. This UID can then be used with:
browser.click({ uid: "ax-123" }): clicks via CDP directly on the backend node
browser.scroll_to_uid({ uid: "ax-123" }): scrolls it into view first
browser.wait_for({ uid: "ax-123" }): waits until it's visible
CDP-native clicks are more reliable than selector-based clicks because they bypass CSS selector resolution and work even in Shadow DOM.
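To make the ax-{nodeId} format concrete, here is a small sketch of converting between a UID string and a numeric backend node ID. The helper names toUid and parseUid are hypothetical, and the sketch assumes the node ID is a non-negative integer, as in the ax-123 example.

```javascript
// Assumed format: "ax-" followed by a decimal backend node ID.
const UID_PATTERN = /^ax-(\d+)$/;

// Build a UID string from a backend node ID.
function toUid(backendNodeId) {
  if (!Number.isInteger(backendNodeId) || backendNodeId < 0) {
    throw new TypeError(`Invalid backend node id: ${backendNodeId}`);
  }
  return `ax-${backendNodeId}`;
}

// Recover the numeric backend node ID from a UID string.
function parseUid(uid) {
  const match = UID_PATTERN.exec(uid);
  if (!match) throw new Error(`Not an A11y UID: ${uid}`);
  return Number(match[1]);
}

console.log(parseUid("ax-123")); // 123
console.log(toUid(123)); // ax-123
```

With a CDP session, the parsed number could be passed as backendNodeId to methods such as DOM.scrollIntoViewIfNeeded or DOM.getBoxModel. Whether the server uses exactly those calls is an assumption.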
Token Efficiency: Capture Profiles
This is the most important v2.0 feature for real-world use.
The Problem
AI context windows are finite. Every tool response consumes tokens. A naive implementation that dumps everything on every call quickly exhausts the budget.
The Solution: Three Profiles
Set the profile once at session start, and every subsequent tool call automatically uses appropriate limits:
Profile | Snapshot chars | List items | A11y nodes | Best For |
light | 6,000–9,000 | 120–180 | 220–320 | Job scraping, bulk tasks |
balanced | 12,000–16,000 | 240–320 | 440–700 | Form filling, research |
full | 20,000 | 500 | 1,200–2,000 | Deep debugging only |
Two Detail Levels Per Profile
Within each profile, tools accept detail: "low" or detail: "high":
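One way to read the table together with the two detail levels is as a lookup. The sketch below assumes that the lower bound of each range applies to detail: "low" and the upper bound to detail: "high". The README does not state that mapping explicitly, and the CAPTURE_PROFILES and limitsFor names are hypothetical.

```javascript
// Numbers copied from the profile table; [low, high] pairing is an assumption.
const CAPTURE_PROFILES = {
  light: { snapshotChars: [6000, 9000], listItems: [120, 180], a11yNodes: [220, 320] },
  balanced: { snapshotChars: [12000, 16000], listItems: [240, 320], a11yNodes: [440, 700] },
  full: { snapshotChars: [20000, 20000], listItems: [500, 500], a11yNodes: [1200, 2000] },
};

// Resolve the concrete limits for one profile and detail level.
function limitsFor(profile, detail = "low") {
  const ranges = CAPTURE_PROFILES[profile];
  if (!ranges) throw new Error(`Unknown profile: ${profile}`);
  if (detail !== "low" && detail !== "high") throw new Error(`Unknown detail: ${detail}`);
  return Object.fromEntries(
    Object.entries(ranges).map(([key, [low, high]]) => [key, detail === "high" ? high : low])
  );
}

console.log(limitsFor("light", "low"));
console.log(limitsFor("balanced", "high"));
```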
The Capture Ladder in Practice
The profile system instructions teach Gemini to escalate only when needed:
In light mode, this entire ladder costs roughly 8x fewer tokens than v1.0's single dump approach.
Hard Payload Budget
Even with capture profiles, some pages are just huge. The payload budget is a safety net:
Default ceiling: 280KB per response
If exceeded: truncate progressively (arrays → strings → object keys)
Include retryWith field: { detail: "low", maxItems: 80, limit: 20 }
Gemini reads this and retries with smaller parameters
Absolute fallback: { truncated: true, truncationReason: "..." }
The budget is configurable: MCP_MAX_RESPONSE_BYTES=150000 for tighter contexts.
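In normal use Gemini performs the retry itself. For a scripted client, the same behaviour could look roughly like the sketch below. The callWithRetry helper, the demo.list tool name, and the stub are hypothetical; only the retryWith field and its example values come from this README.

```javascript
// callTool is any async (name, args) => response function.
async function callWithRetry(callTool, name, args, maxAttempts = 3) {
  let currentArgs = { ...args };
  let response;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    response = await callTool(name, currentArgs);
    if (!response.truncated || !response.retryWith) return response;
    // Merge the hint into the arguments so the next call asks for less.
    currentArgs = { ...currentArgs, ...response.retryWith };
  }
  return response; // still truncated after the last attempt
}

// Stub standing in for a real MCP call: it truncates whenever maxItems > 80.
async function fakeCallTool(name, args) {
  if ((args.maxItems ?? 500) > 80) {
    return { truncated: true, retryWith: { detail: "low", maxItems: 80, limit: 20 } };
  }
  return { items: new Array(args.maxItems).fill("job") };
}

callWithRetry(fakeCallTool, "demo.list", { maxItems: 500 }).then((response) => {
  console.log(response.items.length); // 80
});
```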
Common Use Cases
Job Application (Chrome Profile)
In Gemini:
Bulk Job Scraping (Headless)
Session Persistence (Login Once, Reuse)
In Gemini:
Next time:
Google Form Automation
In Gemini:
PDF CV Reading
Gemini can read your CV directly without you pasting it:
Debugging with Visual Mode
In Gemini:
Environment Variables
All variables have dual names for Gemini CLI compatibility. The launchers set both:
Variable | Alias | Description |
| | true/false: run without GUI |
| | true/false: enable anti-detection |
| | |
| | Absolute path to chrome.exe |
| | Chrome profile directory |
| | Profile name: |
| | CDP URL: |
| | CDP port number (default 9222) |
| | Close Chrome on server exit |
| | Disable |
| | Require userDataDir (prevent bare Chromium) |
| | Enable |
| | Comma-separated allowed origins for evaluate |
| | Default profile: |
| | Override 280KB payload ceiling |
| | Slow down actions by N ms (debugging) |
Why dual names? Gemini CLI sanitizes environment variables and may strip MCP_* prefixed keys. The GEMINI_CLI_MCP_* variants bypass this filtering. The server reads both and uses whichever is set.
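Reading a setting under both names might look like the sketch below. The helper names are hypothetical. The sketch also assumes that each alias is formed by prefixing the MCP_ name with GEMINI_CLI_, and that the alias wins when both are set; the README only says the server uses whichever is set.

```javascript
// Look up a setting under both naming schemes (alias checked first by assumption).
function readSetting(name, env = process.env) {
  for (const key of [`GEMINI_CLI_MCP_${name}`, `MCP_${name}`]) {
    const value = env[key];
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

function readBool(name, fallback = false, env = process.env) {
  const raw = readSetting(name, env);
  return raw === undefined ? fallback : raw.trim().toLowerCase() === "true";
}

function readInt(name, fallback, env = process.env) {
  const parsed = Number.parseInt(readSetting(name, env) ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Usage with an explicit env object so the example does not depend on the shell.
const demoEnv = { GEMINI_CLI_MCP_MAX_RESPONSE_BYTES: "150000", MCP_ALLOW_EVALUATE: "true" };
console.log(readInt("MAX_RESPONSE_BYTES", 280 * 1024, demoEnv)); // 150000
console.log(readBool("ALLOW_EVALUATE", false, demoEnv)); // true
```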
Project Structure
Running Tests
Troubleshooting
"Chrome is already running" / Profile locked
Chrome 136+ blocks automation on the default User Data directory. Always use a dedicated profile or the ChromeForMCP data dir.
"Gmail says browser is not safe"
You're connected via Chromium, not your real Chrome. Ensure:
Chrome is fully closed before starting (--kill-chrome)
The launch response shows "persistent": true and your profile path
If not, restart Gemini and verify the .bat outputs Using Chrome executable: ...
MCP tools not found in Gemini
Run any .bat from any directory; they auto-fix cwd
Verify .gemini/settings.json has the correct cwd
The scripts/.gemini/settings.json is a fallback if Gemini starts in scripts/
Responses truncated / retryWith hint
This is the payload budget working correctly. Gemini will read the retryWith hint and retry with lower parameters. If it keeps happening, switch to light profile:
Slow performance
Use run-dom-headless.bat for bulk operations (no GUI = 3-4x faster)
Avoid browser.extract_html: it returns full HTML and wastes tokens
Use detail: "low" on all tools unless you specifically need more
Browser opens but ignores my profile
Check .bat output for:
If you see a different profile or "not found", edit the .bat and set MCP_PROFILE explicitly.
Security & Privacy
Path Restrictions
browser.evaluate (arbitrary JS execution) is disabled by default. Enable it only explicitly: MCP_ALLOW_EVALUATE=true
files.read_text and files.write_text are restricted to:
Read: Applied Jobs/, Auto/output/, Auto/logs/
Write: Auto/output/, Auto/logs/
Any attempt to read or write outside these paths throws immediately. Symlinks are resolved before checking (prevents traversal attacks).
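The check described here can be sketched with Node's built-in fs and path modules. The function names are hypothetical, and the sketch resolves the allowed roots relative to the current working directory, which may differ from how the server anchors them.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Resolve symlinks. For paths that do not exist yet (a new output file),
// canonicalize the nearest existing parent and re-append the remainder.
function canonicalize(target) {
  const absolute = path.resolve(target);
  try {
    return fs.realpathSync(absolute);
  } catch {
    const parent = path.dirname(absolute);
    if (parent === absolute) return absolute; // reached the filesystem root
    return path.join(canonicalize(parent), path.basename(absolute));
  }
}

function isInside(root, target) {
  const relative = path.relative(canonicalize(root), canonicalize(target));
  if (relative === "") return true; // the root itself
  const escapes = relative === ".." || relative.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(relative);
}

// Throws unless the canonical target lies under one of the allowed roots.
function assertAllowed(target, allowedRoots) {
  const resolved = canonicalize(target);
  if (!allowedRoots.some((root) => isInside(root, resolved))) {
    throw new Error(`Path outside allowlist: ${target}`);
  }
  return resolved;
}

// Roots taken from the list above; relative to process.cwd() in this sketch.
const READ_ROOTS = ["Applied Jobs", "Auto/output", "Auto/logs"];
const WRITE_ROOTS = ["Auto/output", "Auto/logs"];

console.log(assertAllowed("Auto/output/jobs.json", WRITE_ROOTS));
```

Comparing canonical paths with path.relative, rather than checking string prefixes, keeps a sibling such as Auto/output-old from passing as if it were inside Auto/output.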
What Is Stored
Data | Location | Git-ignored |
Execution logs | | ✅ Yes |
Extracted jobs/data | | ✅ Yes |
Session state exports | | ✅ Yes |
Gemini CLI state | | ✅ Yes |
| root | ✅ Yes |
What Is Never Stored
❌ Passwords or credentials
❌ Credit card or payment information
❌ Browser history
❌ Personal documents outside the allowed paths
Ethical Use
This tool is provided for:
Learning browser automation and MCP development
Testing your own web applications
Automating tasks on sites you have permission to access
Legitimate job searching and application workflows
You are responsible for:
Respecting robots.txt and website Terms of Service
Complying with data protection regulations (GDPR, CCPA, etc.)
Rate-limiting your requests to avoid service disruption
Not using this to bypass paywalls or access controls without authorization
The authors assume no liability for misuse. Use responsibly.
How This Differs from Microsoft's Official playwright-mcp
Microsoft's playwright-mcp focuses on accessibility-tree based automation for test development in structured environments.
Feature | Microsoft | This project |
Navigation | Accessibility tree | Hybrid: DOM + A11y + Visual |
Philosophy | "Blind" automation (fast, structured) | Human-like automation (robust, adaptive) |
Primary use case | QA testing, defined workflows | Open-web agents, scraping, complex UIs |
Token efficiency | Not optimized | Capture profiles + hard payload budget |
Session persistence | Basic | Cookie/storage export-import |
Form intelligence | Manual | |
Multi-tab | Basic | Full page manager with stable pageIds |
Setup | Generic | Batteries included (stealth, profiles, launchers) |
Use Microsoft's for: CI/CD test automation, structured accessibility-driven workflows
Use this for: Autonomous agents operating on the open web, job application automation, anti-detection scraping
Changelog
v2.0.0 (Current)
Complete architectural rewrite: monolithic → 11 modular files
71 MCP tools (was 23)
Capture profile system (light/balanced/full) for token efficiency
Hard 280KB payload budget with graceful truncation and retryWith hints
Multi-tab page manager (list, select, close pages)
A11y tree snapshots via CDP with stable ax- UIDs
CDP-native click/hover/scroll by backendDOMNodeId (handles Shadow DOM)
Form audit + intelligent fill + Google Forms specialist (6 tools)
Session export/import (cookie + localStorage persistence)
Popup, dialog, download event handling
Scroll awareness: get state, scroll by delta, scroll containers
Network + console observability via CDP
File reading: text files + PDF extraction
Security: path allowlist enforcement, evaluate guard
18-test suite (was 2)
7 profile launchers (was 5): added persist variants for CDP
GEMINI_CLI_MCP_* dual env var support for Gemini sanitization
v1.1.0
Profile launcher system (.bat files)
Chrome profile integration
--kill-chrome flag
One-shot mode with automatic logging
GEMINI_CLI_MCP_* environment variable aliases
browser.visual_snapshot and browser.click_at
v1.0.0
Initial release
Basic MCP server with Playwright
Indeed + Google extractors
DOM and visual navigation
Contributing
Fork the repository
Create a feature branch (git checkout -b feature/your-feature)
Run npm run test:local to verify nothing breaks
Commit (git commit -m 'Add your feature')
Push and open a Pull Request
License
ISC License — see LICENSE file.
Acknowledgments
Playwright — browser automation backbone
Model Context Protocol — AI tool interface
Microsoft playwright-mcp — inspiration for the A11y approach
Support
Issues: GitHub Issues
Discussions: GitHub Discussions