MCP Playwright Browser

README.md•35.5 KiB

# MCP Playwright Browser Server

A production-grade **Model Context Protocol (MCP) server** that gives AI assistants full browser control through Playwright — using a hybrid DOM + Accessibility Tree + Visual approach. Built for real-world agentic automation: job applications, web scraping, form filling, and complex multi-tab workflows.

> **v2.0 is a complete rewrite.** The server grew from 680 lines and 23 tools to nearly 5,000 lines and 71 tools, with a modular architecture, token-optimized capture profiles, hard payload budgets, and a full test suite.

---

## Table of Contents

- [What's New in v2.0](#whats-new-in-v20)
- [v1 vs v2 Comparison](#v1-vs-v2-comparison)
- [How It Works](#how-it-works)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Profile Launchers](#profile-launchers)
- [All 71 MCP Tools](#all-71-mcp-tools)
- [Architecture](#architecture)
- [Token Efficiency: Capture Profiles](#token-efficiency-capture-profiles)
- [Common Use Cases](#common-use-cases)
- [Environment Variables](#environment-variables)
- [Project Structure](#project-structure)
- [Troubleshooting](#troubleshooting)
- [Security & Privacy](#security--privacy)
- [Ethical Use](#ethical-use)
- [License](#license)

---

## What's New in v2.0

### The Problem v1 Had

v1 was a working proof of concept. It could browse pages and extract jobs. But when used with Gemini CLI for real tasks — filling application forms, navigating multi-tab flows, handling downloads — it hit hard limits:

- **Token waste**: Every tool response dumped everything it found. One `browser.snapshot` on a complex page could push 50KB+ into Gemini's context window in a single call, rapidly exhausting the budget.
- **No multi-tab support**: If a link opened a new tab (very common in job applications), Gemini was stuck with no way to switch to it.
- **No form intelligence**: Filling a form required manual click-by-click instructions. There was no way to ask "what fields are still empty?" or "fill all required fields."
- **Brittle DOM-only navigation**: Shadow DOM, iframes, and obfuscated element IDs caused failures with no fallback.
- **No session persistence**: Every run started fresh. Logging in again and again wasted time and triggered bot detection.
- **No safety rails**: The AI could write files anywhere on disk, run arbitrary JS, or create its own automation scripts — unguarded.
- **Monolithic**: One 680-line file with only 2 tests.

### What v2.0 Solves

Every one of those problems has a specific solution in v2.0:

| Problem | v2.0 Solution |
|---------|--------------|
| Token waste | Capture Profile System (light/balanced/full) + 280KB hard payload ceiling |
| Multi-tab stuck | Page Manager with stable pageIds, `browser.list_pages`, `browser.select_page` |
| Dumb form filling | `browser.form_audit` + `browser.fill_form` + Google Forms specialist tools |
| Shadow DOM / obfuscated IDs | A11y tree via CDP `Accessibility.getFullAXTree` with stable `ax-` UIDs |
| Session loss | Cookie export/import, `browser.export_storage_state` / `browser.import_storage_state` |
| No safety | Path allowlist in `src/security/paths.js`, `MCP_ALLOW_EVALUATE` guard |
| Monolithic | 10 focused modules in `src/browser/` + `src/security/` + 18-test suite |

---

## v1 vs v2 Comparison

| Dimension | v1.0 | v2.0 |
|-----------|------|------|
| **Total MCP tools** | 23 | **71** |
| **Server size** | 680 lines, 1 file | 4,966 lines, 11 modules |
| **Token efficiency** | Uncontrolled dumps | Capture profiles + 280KB hard ceiling |
| **Multi-tab support** | Single tab only | Full page manager (list, select, close) |
| **Form automation** | Manual click-by-click | `form_audit` + `fill_form` + Google Forms specialist |
| **A11y / Shadow DOM** | DOM-only, brittle | CDP Accessibility tree with stable UIDs |
| **Scroll handling** | Saw first viewport only | Scroll awareness + container scrolling |
| **Session persistence** | None | Cookie/storage export-import |
| **Popup & dialog handling** | None | Dialog accept/dismiss, popup pageId capture |
| **Download management** | None | Wait-for-download, save to path |
| **File reading (CV/PDF)** | None | `files.read_text`, `files.read_pdf_text` |
| **Security** | No restrictions | Allowlist-enforced read/write paths |
| **Observability** | None | Console log capture, network request log |
| **Test coverage** | 2 tests | **18 tests** |
| **Profiles** | 3 | 5 (+ persistent variants) |
| **Batch scripts** | 5 `.bat` launchers | 7 `.bat` launchers |
| **Error handling** | Raw exceptions to AI | Normalized, structured, budgeted |

### What stayed the same

- Indeed job extractor (production-grade, multi-selector, deduplication)
- Google search extractor (consent handling, URL deobfuscation)
- Stealth mode (webdriver hiding, user agent spoofing)
- CDP connection to real Chrome
- Visual snapshot + coordinate-based clicking

---

## How It Works

```
You / Gemini CLI
       │
       │  natural language prompt
       ▼
Gemini CLI ──── loads MCP config ────► playwrightBrowser MCP server
                                                  │
                                 ┌────────────────┤
                                 │                │
                           71 MCP Tools     Payload Budget
                           (browser.*)      (280KB ceiling)
                           (forms.*)        (capture profiles)
                           (files.*)        (retryWith hints)
                           (jobs.*)
                           (search.*)
                                 │
                       ┌─────────┤──────────┐
                       │         │          │
                  Playwright  CDP API    Security
                  (browser)   (A11y,     (path
                              network,   allowlist)
                              clicks)
                       │
               Chrome / Chromium
```

### The Capture Ladder

Every profile instructs Gemini to try tools in order, cheapest first:

```
1. browser.snapshot         → plain text summary (cheapest, ~6KB in light mode)
2. browser.list             → interactive elements (structured, ~8KB)
3. browser.query_dom        → targeted selector query (focused, ~10KB)
4. browser.take_snapshot    → A11y tree with UIDs (rich, only when uid-clicking needed)
5. browser.visual_snapshot  → screenshot + bbox map (most expensive, last resort)
```

Gemini only escalates to a more expensive tool when the cheaper one doesn't have what it needs. This is the core of why v2.0 uses far fewer tokens than v1.0.
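The escalation rule above can be pictured as a loop over an ordered tool list. This is a minimal sketch, not code from this repository: `climbLadder`, `CAPTURE_LADDER`, and the stubbed `callTool` are hypothetical names, and in practice Gemini makes each escalation decision itself from the profile's system instructions rather than running a loop like this.

```javascript
// Hypothetical: the ladder as an ordered list, cheapest first (order taken from the list above).
const CAPTURE_LADDER = [
  "browser.snapshot",
  "browser.list",
  "browser.query_dom",
  "browser.take_snapshot",
  "browser.visual_snapshot",
];

// Call each rung in order with detail "low" and stop at the first result that
// satisfies `found`. Returns { tool, steps, result }, or null if no rung had it.
async function climbLadder(callTool, found, ladder = CAPTURE_LADDER) {
  for (let i = 0; i < ladder.length; i++) {
    const result = await callTool(ladder[i], { detail: "low" });
    if (found(result)) return { tool: ladder[i], steps: i + 1, result };
  }
  return null;
}

// Usage with a stubbed tool caller standing in for real MCP tool calls.
const stubResults = {
  "browser.snapshot": "Welcome to Example Corp",
  "browser.list": "[e1] button: Apply now",
};
const stubCallTool = async (name) => stubResults[name] ?? "";

climbLadder(stubCallTool, (text) => text.includes("Apply")).then((hit) => {
  console.log(hit.tool, hit.steps); // prints "browser.list 2"
});
```

The stub never reaches `browser.take_snapshot` or `browser.visual_snapshot`, which is the point of the ladder: the expensive rungs only run when every cheaper one came back empty.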
### The Payload Budget

Every single tool response passes through `enforcePayloadCeiling()` before being sent to Gemini:

1. Measure response size in bytes
2. If under 280KB → send as-is
3. If over → progressively truncate: arrays shrink, strings truncate, fields drop
4. Always include `retryWith` hints telling Gemini exactly what parameters to reduce next time
5. Absolute floor: `{truncated: true}` — Gemini never gets a context-crashing response

---

## Quick Start

```bash
# Clone
git clone https://github.com/Mhrnqaruni/mcp-playwright-browser.git
cd mcp-playwright-browser

# Install
npm install
npx playwright install chromium

# Run (interactive mode - chat with Gemini)
scripts\run-dom-headless.bat

# Run (one-shot automation)
scripts\run-dom-headless.bat -p "Go to https://example.com and extract the page title"

# Run with real Chrome (for logged-in sessions)
scripts\run-chrome-profile.bat --kill-chrome
```

---

## Installation

### Prerequisites

- **Node.js 18+**
- **npm**
- **Gemini CLI**: `npm install -g @google/gemini-cli` then `gemini auth login`
- **Google Chrome** (for CDP and chrome-profile modes)

### Setup

**1. Install dependencies**

```bash
npm install
npx playwright install chromium
```

**2. Configure the MCP server path**

Edit `.gemini/settings.json` and set `cwd` to your repo location:

```json
{
  "mcpServers": {
    "playwrightBrowser": {
      "command": "node",
      "args": ["src/mcp-browser-server.js"],
      "cwd": "C:/path/to/mcp-playwright-browser"
    }
  }
}
```

**3. (Optional) Disable Chrome background apps**

Prevents profile locking:

```
Chrome Settings → Advanced → System → ☐ Continue running background apps when Google Chrome is closed
```

**4. Verify**

```bash
scripts\run-dom-headless.bat -p "Use MCP server playwrightBrowser. Launch browser. Go to https://example.com. Take a snapshot. Close."
```

---

## Profile Launchers

Each `.bat` file pre-configures everything (browser type, stealth, profile, environment variables) and starts Gemini with the right system instructions. You never need to configure Gemini manually.

### Available Profiles

| Script | Browser | Mode | Best For |
|--------|---------|------|----------|
| `run-dom-headless.bat` | Chromium | Headless | ⚡ Bulk scraping, fastest |
| `run-visual-headful.bat` | Chromium | Visible + Screenshots | Debugging, visual verification |
| `run-chrome-profile.bat` | Real Chrome | Your profile | Logged-in sessions, form filling |
| `run-cdp-profile.bat` | Real Chrome | CDP | Maximum stealth |
| `run-cdp-profile-screen.bat` | Real Chrome | CDP + Visual | CDP with screenshot analysis |
| `run-cdp-profile-persist.bat` | Real Chrome | CDP + Persistent | Long sessions, multi-step flows |
| `run-cdp-profile-screen-persist.bat` | Real Chrome | CDP + Visual + Persistent | Full power mode |

### Interactive Mode (Chat)

```bash
# Start Gemini and chat with it
scripts\run-chrome-profile.bat --kill-chrome

# Then just type:
# "Fill out the job application at [URL] using my CV"
# "Go to LinkedIn and apply to the first 5 jobs"
# "Extract all AI engineer jobs from Indeed and save them"
```

### One-Shot Mode (Automation)

```bash
# Run a task and get a log file
scripts\run-dom-headless.bat -p "Your full task here"

# With custom output
scripts\run-dom-headless.bat -p "Extract 50 jobs from Indeed" --output logs\jobs.log

# Chrome profile one-shot
scripts\run-chrome-profile.bat --kill-chrome -p "Submit application at [URL]" --output logs\apply.log
```

Logs are auto-saved to `logs/` with timestamps.
### Profile Details

#### `run-dom-headless.bat` — Fastest

- Chromium headless (no GUI)
- Best for: bulk extraction, scraping, background tasks
- Token usage: lowest (no screenshots)

#### `run-visual-headful.bat` — Debugging

- Chromium with visible window
- Screenshot-based navigation available
- Best for: troubleshooting, visual verification

#### `run-chrome-profile.bat` — Authenticated Sessions

- Real Chrome with your existing logged-in profile
- Already signed into Gmail, LinkedIn, job sites
- Use `--kill-chrome` to free profile before starting
- Best for: job applications, authenticated scraping

#### `run-cdp-profile.bat` — Maximum Stealth

- Connects to real Chrome via Chrome DevTools Protocol
- Hardest for sites to detect as automation
- Best for: sites that block Playwright/Chromium
- Auto-closes any existing Chrome using the profile before launch

#### `run-cdp-profile-persist.bat` — Long Sessions

- CDP mode with persistent browser (doesn't close between tasks)
- Best for: multi-step workflows where browser state must survive

---

## All 71 MCP Tools

### Capture Profile Control

| Tool | Description |
|------|-------------|
| `browser.set_capture_profile` | Set `light` / `balanced` / `full` profile. Controls token usage across all tools. Call this first. |
| `browser.get_capture_profile` | Show current profile settings and payload budget. |

### Browser Lifecycle

| Tool | Description |
|------|-------------|
| `browser.launch` | Launch Chromium with options: headless, stealth, userDataDir, profileDirectory, channel, slowMo, args |
| `browser.launch_chrome_cdp` | Launch real Chrome with remote debugging + connect in one step |
| `browser.connect_cdp` | Connect to existing Chrome with `--remote-debugging-port` |
| `browser.close` | Close browser session |
| `browser.reload` | Reload current page |

### Multi-Tab Management

| Tool | Description |
|------|-------------|
| `browser.new_page` | Open new tab, tracked by page manager |
| `browser.list_pages` | List all open tabs with pageId, url, title, active/closed state |
| `browser.select_page` | Switch active tab by pageId |
| `browser.close_page` | Close a specific tab by pageId |
| `browser.list_frames` | List all iframes on the current page |

### Navigation

| Tool | Description |
|------|-------------|
| `browser.goto` | Navigate to URL with configurable waitUntil and timeout |
| `browser.back` | Go back in history |
| `browser.forward` | Go forward in history |
| `browser.wait` | Wait for selector or fixed ms |
| `browser.wait_for` | Smart wait: selector, text, or uid (A11y) |

### Event & Dialog Handling

| Tool | Description |
|------|-------------|
| `browser.list_dialogs` | List pending JS dialogs (alert, confirm, prompt) |
| `browser.handle_dialog` | Accept or dismiss a dialog, optionally with input text |
| `browser.wait_for_download` | Block until a download starts, returns downloadId |
| `browser.save_download` | Save a captured download to a specific path |
| `browser.wait_for_popup` | Wait for a new tab/popup to open, returns its pageId |
| `browser.expect_event` | Listen for a one-time event: dialog, download, navigation, request, response |

### Session & Cookie Management

| Tool | Description |
|------|-------------|
| `browser.get_cookies` | List cookies, optionally filtered by URL |
| `browser.set_cookies` | Inject cookies into browser session |
| `browser.clear_cookies` | Clear all or URL-specific cookies |
| `browser.export_storage_state` | Export full session state (cookies + localStorage) to JSON file |
| `browser.import_storage_state` | Restore session from previously exported JSON |

### Scroll Control

| Tool | Description |
|------|-------------|
| `browser.get_scroll_state` | Returns scrollY, scrollHeight, atTop, atBottom, viewport info |
| `browser.scroll_by` | Scroll page by delta pixels (vertical + horizontal) |
| `browser.scroll_to` | Scroll to absolute position |
| `browser.get_scrollables` | Detect all scrollable containers on the page |
| `browser.get_container_scroll_state` | Scroll metrics for a specific container selector |
| `browser.scroll_container` | Scroll a specific container by selector |

### Page Reading & Snapshots

| Tool | Description |
|------|-------------|
| `browser.snapshot` | Plain text page summary: title, text, links, optional headings + forms summary |
| `browser.take_snapshot` | A11y tree via CDP: roles, names, UIDs (`ax-{nodeId}`), depth, state |
| `browser.query_dom` | Flexible selector query: text, value, bbox, visibility, state, tagName |
| `browser.evaluate` | Execute JavaScript (requires `MCP_ALLOW_EVALUATE=true`, origin-gated) |

### Element Interaction

| Tool | Description |
|------|-------------|
| `browser.list` | List visible interactive elements with elementId, tag, text, href |
| `browser.click` | Click by elementId, uid, selector, or text |
| `browser.hover` | Hover over element (triggers dropdown menus, tooltips) |
| `browser.type` | Simulate keypress-by-keypress typing |
| `browser.fill` | Direct value fill (faster, no keypress simulation) |
| `browser.press` | Press keyboard key (Enter, Tab, Escape, etc.) |
| `browser.set_input_files` | Upload file to input[type=file] |
| `browser.scroll_to_uid` | Scroll a UID element into view |

### Visual Navigation

| Tool | Description |
|------|-------------|
| `browser.screenshot` | Save screenshot to path |
| `browser.visual_snapshot` | Screenshot + element map with bounding boxes and IDs |
| `browser.click_at` | Click at viewport-relative X/Y coordinates |
| `browser.click_at_page` | Click at document-absolute X/Y coordinates |

### Data Extraction

| Tool | Description |
|------|-------------|
| `browser.extract_text` | Extract text from CSS selector (single or all matches) |
| `browser.extract_html` | Extract outerHTML from selector |

### Form Automation

| Tool | Description |
|------|-------------|
| `browser.form_audit` | Scan page for all unfilled required fields: text, select, radio, checkbox, contenteditable |
| `browser.fill_form` | Fill a list of `{label, selector, value, kind}` fields — label-driven or selector-driven |
| `forms.google_audit` | Google Forms specialist: list all questions and check `aria-checked` for answers |
| `forms.google_set_text` | Fill a Google Forms text question by question text |
| `forms.google_set_dropdown` | Select option in Google Forms dropdown |
| `forms.google_set_checkbox` | Check/uncheck Google Forms checkbox |
| `forms.google_set_radio` | Select option in Google Forms radio group |
| `forms.google_set_grid` | Select option in Google Forms grid question |

### Observability

| Tool | Description |
|------|-------------|
| `browser.list_console_messages` | Show captured `console.log/warn/error` from the page |
| `browser.list_network_requests` | Show all network requests (URL, method, status, timing) |
| `browser.get_network_request` | Get full details for a specific request by ID |

### File Operations

| Tool | Description |
|------|-------------|
| `files.read_text` | Read text file (restricted to allowed paths) |
| `files.read_pdf_text` | Extract text from PDF — used to read CV files |
| `files.list_dir` | List directory contents |
| `files.write_text` | Write text to file (restricted to `output/` and `logs/`) |

### Specialized Extractors (Production Examples)

| Tool | Description |
|------|-------------|
| `jobs.extract_indeed` | Extract Indeed job listings with multi-selector fallbacks, deduplication, access detection |
| `jobs.indeed_next_page` | Navigate to next Indeed page (direct URL, click, or auto mode) |
| `search.google` | Open Google search and extract results with consent handling |
| `search.extract_google` | Extract results from current Google search page |

---

## Architecture

### Module Structure

```
src/
├── mcp-browser-server.js      # Main server: tool registration, env config, middleware
├── extractors.js              # Indeed + Google specialized extractors
├── browser/
│   ├── pages.js               # Multi-tab page manager (stable pageIds)
│   ├── snapshot.js            # A11y tree via CDP Accessibility.getFullAXTree
│   ├── capture-profiles.js    # light/balanced/full × low/high = 30 preset configs
│   ├── payload-budget.js      # Hard 280KB response ceiling with graceful truncation
│   ├── cdp.js                 # CDP session, click/hover/scroll by backendNodeId
│   ├── dom-version.js         # DOM mutation tracking, frame management
│   ├── forms.js               # Form audit + intelligent form fill
│   ├── observability.js       # Console + network request capture via CDP
│   └── wait.js                # Smart wait: selector, text, uid
└── security/
    └── paths.js               # Read/write path allowlist enforcement
```

### Tool Registration Middleware

Every tool goes through a wrapper that runs before and after the handler:

```
AI calls tool
      │
      ▼
assign requestId
      │
      ▼
run handler
      │
      ▼
normalize errors (structured, no stack traces)
      │
      ▼
add envelope (ok, requestId, timestamp, url, domVersion)
      │
      ▼
enforcePayloadCeiling (truncate if > 280KB)
      │
      ▼
send to AI
```

This means every tool automatically benefits from error safety and payload budgeting without any extra code per tool.
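The wrapper pipeline above, together with the truncation steps from the Payload Budget section, can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/mcp-browser-server.js` or `src/browser/payload-budget.js`: `wrapTool`, `shrink`, and the halving strategy are hypothetical; "280KB" is assumed to mean 280 × 1024 bytes; the envelope omits `url` and `domVersion`; and the `retryWith` values are copied from this README's example rather than computed.

```javascript
const { randomUUID } = require("node:crypto");

// Ceiling in bytes. The two env var names come from this README; reading them
// this way and the 280 * 1024 default are assumptions of this sketch.
const MAX_RESPONSE_BYTES =
  Number(process.env.MCP_MAX_RESPONSE_BYTES ?? process.env.GEMINI_CLI_MCP_MAX_RESPONSE_BYTES) ||
  280 * 1024;

// Size of the JSON the client would actually receive.
function byteSize(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

// One truncation pass: keep the first half of every array and string, recursively.
function shrink(value) {
  if (Array.isArray(value)) return value.slice(0, Math.floor(value.length / 2)).map(shrink);
  if (typeof value === "string") return value.slice(0, Math.floor(value.length / 2));
  if (value && typeof value === "object") {
    return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, shrink(v)]));
  }
  return value;
}

function enforcePayloadCeiling(envelope, maxBytes = MAX_RESPONSE_BYTES) {
  if (byteSize(envelope) <= maxBytes) return envelope; // under budget: send as-is
  let data = envelope.data;
  for (let pass = 0; pass < 8; pass++) {
    data = shrink(data);
    const candidate = {
      ...envelope,
      data,
      truncated: true,
      // Hint for the model's next call (values copied from the README example).
      retryWith: { detail: "low", maxItems: 80, limit: 20 },
    };
    if (byteSize(candidate) <= maxBytes) return candidate;
  }
  // Absolute floor: drop the payload but keep enough to correlate the request.
  return {
    ok: envelope.ok,
    requestId: envelope.requestId,
    truncated: true,
    truncationReason: `response still exceeded ${maxBytes} bytes after truncation`,
  };
}

// Wraps a handler: assign requestId, run it, normalize errors, add envelope, enforce ceiling.
function wrapTool(handler) {
  return async (args) => {
    const base = { requestId: randomUUID(), timestamp: new Date().toISOString() };
    let envelope;
    try {
      envelope = { ok: true, ...base, data: await handler(args) };
    } catch (err) {
      // Structured error: message only, never the stack trace.
      const message = err instanceof Error ? err.message : String(err);
      envelope = { ok: false, ...base, error: { message } };
    }
    return enforcePayloadCeiling(envelope);
  };
}
```

Because the ceiling is applied once inside `wrapTool`, an individual handler such as a hypothetical `wrapTool(async () => ({ title: "Example" }))` needs no budgeting code of its own, which is the property the paragraph above describes.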
### UID System

The A11y snapshot (`browser.take_snapshot`) assigns every node a stable UID in the format `ax-{nodeId}`, tied to the CDP `backendDOMNodeId`. This UID can then be used with:

- `browser.click({ uid: "ax-123" })` — clicks via CDP directly on the backend node
- `browser.scroll_to_uid({ uid: "ax-123" })` — scrolls it into view first
- `browser.wait_for({ uid: "ax-123" })` — waits until it's visible

CDP-native clicks are more reliable than selector-based clicks because they bypass CSS selector resolution and work even in Shadow DOM.

---

## Token Efficiency: Capture Profiles

This is the most important v2.0 feature for real-world use.

### The Problem

AI context windows are finite. Every tool response consumes tokens. A naive implementation that dumps everything on every call quickly exhausts the budget.

### The Solution: Three Profiles

Set the profile once at session start, and every subsequent tool call automatically uses appropriate limits:

```
browser.set_capture_profile({ profile: "light" })
```

| Profile | Snapshot chars | List items | A11y nodes | Best For |
|---------|---------------|------------|------------|----------|
| **light** | 6,000–9,000 | 120–180 | 220–320 | Job scraping, bulk tasks |
| **balanced** | 12,000–16,000 | 240–320 | 440–700 | Form filling, research |
| **full** | 20,000 | 500 | 1,200–2,000 | Deep debugging only |

### Two Detail Levels Per Profile

Within each profile, tools accept `detail: "low"` or `detail: "high"`:

```
browser.snapshot({ detail: "low" })    # minimal, fast
browser.snapshot({ detail: "high" })   # more text, links, headings, form summary
```

### The Capture Ladder in Practice

The profile system instructions teach Gemini to escalate only when needed:

```
✅ "I need to find the Apply button"
→ browser.snapshot (low)          # did I find it in plain text? usually yes
→ browser.list (low)              # still looking? check interactive elements
→ browser.take_snapshot (low)     # need uid for reliable click? A11y tree
→ browser.visual_snapshot (low)   # shadow DOM / can't find it at all? visual fallback
```

In `light` mode, this entire ladder costs roughly 8x fewer tokens than v1.0's single dump approach.

### Hard Payload Budget

Even with capture profiles, some pages are just huge. The payload budget is a safety net:

- Default ceiling: **280KB per response**
- If exceeded: truncate progressively (arrays → strings → object keys)
- Include `retryWith` field: `{ detail: "low", maxItems: 80, limit: 20 }`
- Gemini reads this and retries with smaller parameters
- Absolute fallback: `{ truncated: true, truncationReason: "..." }`

The budget is configurable: `MCP_MAX_RESPONSE_BYTES=150000` for tighter contexts.

---

## Common Use Cases

### Job Application (Chrome Profile)

```bash
# Start with your real logged-in Chrome
scripts\run-chrome-profile.bat --kill-chrome
```

In Gemini:

```
Set capture profile to light.
Go to [application URL].
Run form_audit to see all required fields.
Fill them using fill_form with my details from Applied Jobs/CODEX/maincv.md.
Before submitting, take a screenshot and ask me to confirm.
```

### Bulk Job Scraping (Headless)

```bash
scripts\run-dom-headless.bat -p "Use playwrightBrowser. Launch browser headless. Go to https://ae.indeed.com/q-ai-engineer-l-dubai-jobs.html. Extract jobs with jobs.extract_indeed limit 20, save to output/indeed/page-1. Go to next page with jobs.indeed_next_page. Extract again, save to output/indeed/page-2. Close."
```

### Session Persistence (Login Once, Reuse)

```bash
# First time: login manually and export session
scripts\run-cdp-profile.bat
```

In Gemini:

```
Go to linkedin.com and wait for me to log in.
After I confirm logged in, run browser.export_storage_state to output/linkedin-session.json.
```

Next time:

```
Run browser.import_storage_state from output/linkedin-session.json.
Go to linkedin.com — should be logged in already.
```

### Google Form Automation

```bash
scripts\run-dom-headless.bat
```

In Gemini:

```
Go to [Google Form URL].
Run forms.google_audit to see all questions.
Fill each question using the appropriate forms.google_set_* tool.
Run forms.google_audit again to verify all answered.
Submit.
```

### PDF CV Reading

Gemini can read your CV directly without you pasting it:

```
Read my CV from Applied Jobs/CODEX/maincv.md using files.read_text.
Or read the PDF version: files.read_pdf_text from Applied Jobs/CODEX/CV.pdf.
Use that information to fill the job application form.
```

### Debugging with Visual Mode

```bash
scripts\run-visual-headful.bat
```

In Gemini:

```
Go to [URL].
Take a visual_snapshot and save to output/debug.png.
Tell me what you see and identify any unusual elements.
```

---

## Environment Variables

All variables have dual names for Gemini CLI compatibility. The launchers set both:

| Variable | Alias | Description |
|----------|-------|-------------|
| `MCP_HEADLESS` | `GEMINI_CLI_MCP_HEADLESS` | true/false — run without GUI |
| `MCP_STEALTH` | `GEMINI_CLI_MCP_STEALTH` | true/false — enable anti-detection |
| `MCP_CHANNEL` | `GEMINI_CLI_MCP_CHANNEL` | `chrome` — use real Chrome |
| `MCP_EXECUTABLE_PATH` | `GEMINI_CLI_MCP_EXECUTABLE_PATH` | Absolute path to chrome.exe |
| `MCP_USER_DATA_DIR` | `GEMINI_CLI_MCP_USER_DATA_DIR` | Chrome profile directory |
| `MCP_PROFILE` | `GEMINI_CLI_MCP_PROFILE` | Profile name: `Default`, `Profile 3` |
| `MCP_CDP_ENDPOINT` | `GEMINI_CLI_MCP_CDP_ENDPOINT` | CDP URL: `http://127.0.0.1:9222` |
| `MCP_CDP_PORT` | `GEMINI_CLI_MCP_CDP_PORT` | CDP port number (default 9222) |
| `MCP_CDP_AUTO_CLOSE` | `GEMINI_CLI_MCP_CDP_AUTO_CLOSE` | Close Chrome on server exit |
| `MCP_FORCE_CDP` | `GEMINI_CLI_MCP_FORCE_CDP` | Disable `browser.launch` (CDP-only mode) |
| `MCP_REQUIRE_PROFILE` | `GEMINI_CLI_MCP_REQUIRE_PROFILE` | Require userDataDir (prevent bare Chromium) |
| `MCP_ALLOW_EVALUATE` | `GEMINI_CLI_MCP_ALLOW_EVALUATE` | Enable `browser.evaluate` tool |
| `MCP_EVALUATE_ALLOW_ORIGINS` | `GEMINI_CLI_MCP_EVALUATE_ALLOW_ORIGINS` | Comma-separated allowed origins for evaluate |
| `MCP_CAPTURE_PROFILE` | `GEMINI_CLI_MCP_CAPTURE_PROFILE` | Default profile: `light`, `balanced`, `full` |
| `MCP_MAX_RESPONSE_BYTES` | `GEMINI_CLI_MCP_MAX_RESPONSE_BYTES` | Override 280KB payload ceiling |
| `MCP_SLOWMO_MS` | `GEMINI_CLI_MCP_SLOWMO_MS` | Slow down actions by N ms (debugging) |

**Why dual names?** Gemini CLI sanitizes environment variables and may strip `MCP_*` prefixed keys. The `GEMINI_CLI_MCP_*` variants bypass this filtering. The server reads both and uses whichever is set.

---

## Project Structure

```
mcp-playwright-browser/
│
├── src/
│   ├── mcp-browser-server.js      # Main server (71 tools, middleware, env config)
│   ├── extractors.js              # Indeed + Google production extractors
│   ├── browser/
│   │   ├── pages.js               # Multi-tab page manager
│   │   ├── snapshot.js            # A11y tree (CDP Accessibility API)
│   │   ├── capture-profiles.js    # Token budget profiles (light/balanced/full)
│   │   ├── payload-budget.js      # Hard response size ceiling
│   │   ├── cdp.js                 # CDP primitives (click, hover, scroll by nodeId)
│   │   ├── dom-version.js         # DOM mutation tracking + frame management
│   │   ├── forms.js               # Form audit + intelligent fill
│   │   ├── observability.js       # Console + network capture
│   │   └── wait.js                # Smart wait (selector, text, uid)
│   ├── security/
│   │   └── paths.js               # File read/write path allowlist
│   └── tests/
│       ├── page-manager-test.js
│       ├── security-paths-test.js
│       ├── snapshot-uid-test.js
│       ├── uid-click-fill-test.js
│       ├── elementid-no-stale-test.js
│       ├── wait-for-test.js
│       ├── form-audit-fill-test.js
│       ├── console-network-test.js
│       ├── visual-coords-test.js
│       ├── frame-domversion-test.js
│       ├── cdp-hover-test.js
│       ├── browser-events-test.js
│       ├── storage-state-test.js
│       ├── capture-profiles-test.js
│       ├── payload-budget-test.js
│       ├── google-form-test.js
│       ├── google-test.js
│       └── indeed-test.js
│
├── scripts/
│   ├── run-dom-headless.bat                 # Fastest: headless Chromium
│   ├── run-visual-headful.bat               # Visual: Chromium + screenshots
│   ├── run-chrome-profile.bat               # Auth: real Chrome with your profile
│   ├── run-cdp-profile.bat                  # Stealth: CDP mode
│   ├── run-cdp-profile-screen.bat           # Stealth + visual
│   ├── run-cdp-profile-persist.bat          # Stealth + persistent session
│   ├── run-cdp-profile-screen-persist.bat   # Full power
│   ├── autoconnect.js                       # CDP auto-connect helper
│   └── .gemini/settings.json                # Fallback MCP config
│
├── profiles/
│   ├── dom/
│   │   ├── system.md        # Gemini system instructions (DOM mode)
│   │   └── oneshot.md       # One-shot variant (closes browser at end)
│   ├── visual/
│   │   ├── system.md
│   │   └── oneshot.md
│   ├── cdp/
│   │   ├── system.md
│   │   ├── oneshot.md
│   │   └── persistent.md
│   └── cdp-visual/
│       ├── system.md
│       ├── oneshot.md
│       └── persistent.md
│
├── .gemini/settings.json    # Main MCP config (set your cwd here)
├── GEMINI.md                # Project-level Gemini instructions
├── LICENSE                  # ISC License
└── README.md
```

### Running Tests

```bash
# All tests that don't need network
npm run test:local

# Live network tests (Indeed + Google)
npm run test:remote

# Everything
npm run test:all
```

---

## Troubleshooting

### "Chrome is already running" / Profile locked

```bash
# Use --kill-chrome
scripts\run-chrome-profile.bat --kill-chrome

# Or manually
taskkill /F /IM chrome.exe
```

Chrome 136+ blocks automation on the default User Data directory. Always use a dedicated profile or the `ChromeForMCP` data dir.

### "Gmail says browser is not safe"

You're connected via Chromium, not your real Chrome. Ensure:

1. Chrome is fully closed before starting (`--kill-chrome`)
2. The launch response shows `"persistent": true` and your profile path
3. If not, restart Gemini and verify `.bat` outputs `Using Chrome executable: ...`

### MCP tools not found in Gemini

- Run any `.bat` from any directory — they auto-fix `cwd`
- Verify `.gemini/settings.json` has the correct `cwd`
- The `scripts/.gemini/settings.json` is a fallback if Gemini starts in `scripts/`

### Responses truncated / `retryWith` hint

This is the payload budget working correctly. Gemini will read the `retryWith` hint and retry with lower parameters. If it keeps happening, switch to the `light` profile:

```
browser.set_capture_profile({ profile: "light" })
```

### Slow performance

- Use `run-dom-headless.bat` for bulk operations (no GUI = 3-4x faster)
- Avoid `browser.extract_html` — it returns full HTML and wastes tokens
- Use `detail: "low"` on all tools unless you specifically need more

### Browser opens but ignores my profile

Check `.bat` output for:

```
Using Chrome executable: C:\Program Files\Google\Chrome\Application\chrome.exe
Using Chrome profile: Profile 3
```

If you see a different profile or "not found", edit the `.bat` and set `MCP_PROFILE` explicitly.

---

## Security & Privacy

### Path Restrictions

`browser.evaluate` (arbitrary JS execution) is **disabled by default**. Enable it explicitly with `MCP_ALLOW_EVALUATE=true`.

`files.read_text` and `files.write_text` are restricted to:

- **Read**: `Applied Jobs/`, `Auto/output/`, `Auto/logs/`
- **Write**: `Auto/output/`, `Auto/logs/`

Any attempt to read or write outside these paths throws immediately. Symlinks are resolved before checking (prevents traversal attacks).
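A symlink-aware allowlist check of the kind described above can be sketched with Node's `fs.realpathSync` and `path.relative`. This is a hypothetical illustration, not the contents of `src/security/paths.js`: `makePathGuard` and `resolveReal` are invented names, and the roots passed in stand in for whatever allowlist the server actually configures.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Resolve symlinks in `absolute`. For a path that does not exist yet (a new
// output file), resolve its parent directory and re-attach the file name.
function resolveReal(absolute) {
  if (fs.existsSync(absolute)) return fs.realpathSync(absolute);
  // existsSync follows links, so a dangling symlink reports false; lstat sees the link itself.
  let isLink = false;
  try {
    isLink = fs.lstatSync(absolute).isSymbolicLink();
  } catch {
    // Nothing at this path at all.
  }
  if (isLink) throw new Error(`Refusing dangling symlink: ${absolute}`);
  // Throws ENOENT if the parent directory does not exist either.
  return path.join(fs.realpathSync(path.dirname(absolute)), path.basename(absolute));
}

// Returns a checker that throws unless `target` resolves inside one of `allowedRoots`.
function makePathGuard(allowedRoots) {
  const roots = allowedRoots.map((root) => fs.realpathSync(path.resolve(root)));
  return function assertAllowed(target) {
    const real = resolveReal(path.resolve(target));
    const inside = roots.some((root) => {
      const rel = path.relative(root, real);
      // "" is the root itself; a leading ".." or an absolute result (other drive) is outside.
      return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
    });
    if (!inside) throw new Error(`Path not allowed: ${target}`);
    return real;
  };
}
```

Resolving the real path before comparing is what blocks both `output/../secret.txt` and a link such as `output/link -> /elsewhere`: after resolution, both end up outside every allowed root and the check throws.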
### What Is Stored

| Data | Location | Git-ignored |
|------|----------|-------------|
| Execution logs | `logs/` | ✅ Yes |
| Extracted jobs/data | `output/` | ✅ Yes |
| Session state exports | `output/` | ✅ Yes |
| Gemini CLI state | `scripts/.gemini/state.json` | ✅ Yes |
| `.gemini/` config | root `.gemini/` | ✅ Yes |

### What Is Never Stored

- ❌ Passwords or credentials
- ❌ Credit card or payment information
- ❌ Browser history
- ❌ Personal documents outside the allowed paths

---

## Ethical Use

This tool is provided for:

- Learning browser automation and MCP development
- Testing your own web applications
- Automating tasks on sites you have permission to access
- Legitimate job searching and application workflows

**You are responsible for:**

- Respecting `robots.txt` and website Terms of Service
- Complying with data protection regulations (GDPR, CCPA, etc.)
- Rate-limiting your requests to avoid service disruption
- Not using this to bypass paywalls or access controls without authorization

The authors assume no liability for misuse. Use responsibly.

---

## How This Differs from Microsoft's Official `playwright-mcp`

Microsoft's [playwright-mcp](https://github.com/microsoft/playwright-mcp) focuses on **accessibility-tree based automation** for test development in structured environments.

| Feature | Microsoft `playwright-mcp` | This project |
|---------|---------------------------|-------------|
| Navigation | Accessibility tree | Hybrid: DOM + A11y + Visual |
| Philosophy | "Blind" automation (fast, structured) | Human-like automation (robust, adaptive) |
| Primary use case | QA testing, defined workflows | Open-web agents, scraping, complex UIs |
| Token efficiency | Not optimized | Capture profiles + hard payload budget |
| Session persistence | Basic | Cookie/storage export-import |
| Form intelligence | Manual | `form_audit` + `fill_form` + Google Forms specialist |
| Multi-tab | Basic | Full page manager with stable pageIds |
| Setup | Generic | Batteries included (stealth, profiles, launchers) |

**Use Microsoft's for:** CI/CD test automation, structured accessibility-driven workflows

**Use this for:** Autonomous agents operating on the open web, job application automation, anti-detection scraping

---

## Changelog

### v2.0.0 (Current)

- Complete architectural rewrite: monolithic → 11 modular files
- 71 MCP tools (was 23)
- Capture profile system (light/balanced/full) for token efficiency
- Hard 280KB payload budget with graceful truncation and `retryWith` hints
- Multi-tab page manager (list, select, close pages)
- A11y tree snapshots via CDP with stable `ax-` UIDs
- CDP-native click/hover/scroll by backendDOMNodeId (handles Shadow DOM)
- Form audit + intelligent fill + Google Forms specialist (6 tools)
- Session export/import (cookie + localStorage persistence)
- Popup, dialog, download event handling
- Scroll awareness: get state, scroll by delta, scroll containers
- Network + console observability via CDP
- File reading: text files + PDF extraction
- Security: path allowlist enforcement, evaluate guard
- 18-test suite (was 2)
- 7 profile launchers (was 5): added persist variants for CDP
- GEMINI_CLI_MCP_* dual env var support for Gemini sanitization

### v1.1.0

- Profile launcher system (.bat files)
- Chrome profile integration
- `--kill-chrome` flag
- One-shot mode with automatic logging
- GEMINI_CLI_MCP_* environment variable aliases
- `browser.visual_snapshot` and `browser.click_at`

### v1.0.0

- Initial release
- Basic MCP server with Playwright
- Indeed + Google extractors
- DOM and visual navigation

---

## Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/your-feature`)
3. Run `npm run test:local` to verify nothing breaks
4. Commit (`git commit -m 'Add your feature'`)
5. Push and open a Pull Request

---

## License

ISC License — see [LICENSE](LICENSE) file.

---

## Acknowledgments

- [Playwright](https://playwright.dev/) — browser automation backbone
- [Model Context Protocol](https://modelcontextprotocol.io/) — AI tool interface
- [Microsoft playwright-mcp](https://github.com/microsoft/playwright-mcp) — inspiration for the A11y approach

---

## Support

- **Issues**: [GitHub Issues](https://github.com/Mhrnqaruni/mcp-playwright-browser/issues)
- **Discussions**: [GitHub Discussions](https://github.com/Mhrnqaruni/mcp-playwright-browser/discussions)