The Better Playwright MCP server enables efficient, AI-friendly browser automation through a robust client-server architecture with semantic HTML snapshots that reduce token usage by up to 90%.
Page Management: Create, activate, close, and list multiple managed and unmanaged browser pages with customizable names and descriptions, enabling multi-tab automation scenarios.
Web Interaction: Perform diverse browser actions including clicking, typing, hovering, selecting options, pressing keys, uploading files, and handling dialogs (alerts, prompts, confirms) using precise xp identifiers.
Navigation & Control: Navigate to URLs, move through browsing history, scroll pages or elements, and wait for specific elements to appear or for specified durations.
Content Extraction & Capture: Generate highly compressed semantic HTML snapshots with unique xp identifiers, capture full-page or element-specific screenshots (PNG/JPEG), generate PDF snapshots, retrieve element HTML for debugging, and download images to temporary directories.
Advanced Features: Capture complete webpage snapshots with automatic scrolling and content trimming, save processed HTML to temporary files, and utilize stealth features with persistent browser profiles for robust, long-running automation tasks.
Enables web automation for Amazon's shopping platform, allowing navigation, search, and interaction with product pages through semantic HTML snapshots.
Identifies and preserves semantic HTML5 tags when generating page snapshots, maintaining the structural meaning of web content while reducing token usage.
Handles JavaScript-based web interactions through Playwright, enabling automation of dynamic web applications built with JavaScript.
Supports Linux as a platform for running the MCP server, with specific file paths for operation records.
Supports macOS as a platform for running the MCP server, with specific file paths for operation records.
Built on Node.js; running the server components requires Node.js >= 18.0.0.
Built with TypeScript for type safety, with development resources for TypeScript contributors.
Includes WebGL vendor spoofing as part of its stealth features to prevent browser fingerprinting during web automation.
better-playwright-mcp3
A high-performance Playwright MCP (Model Context Protocol) server with intelligent DOM compression and content search capabilities for browser automation.
Features
- 🎭 Full Playwright browser automation via MCP
- 🏗️ Client-server architecture with HTTP API
- 📍 Ref-based element identification system (`[ref=e1]`, `[ref=e2]`, etc.)
- 🔍 Powerful regex-based content search using ripgrep
- 💾 Persistent browser profiles with Chrome
- 🚀 91%+ DOM compression with intelligent list folding
- 📄 Semantic HTML snapshots using Playwright's internal APIs
- ⚡ High-performance search with safety limits
Installation
Global Installation (for CLI usage)
Local Installation (for SDK usage)
Usage
As a JavaScript/TypeScript SDK
Prerequisites:
- First, start the HTTP server:
- Then use the SDK in your code:
Available Methods:
- Page Management: `createPage`, `closePage`, `listPages`
- Navigation: `browserNavigate`, `browserNavigateBack`, `browserNavigateForward`
- Interaction: `browserClick`, `browserType`, `browserHover`, `browserSelectOption`, `fill`
- Advanced Actions: `browserPressKey`, `browserFileUpload`, `browserHandleDialog`
- Page Structure: `getOutline` - Get intelligently compressed page structure with list folding (NEW in v3.2.0)
- Content Search: `searchSnapshot` - Search page content with regex patterns (powered by ripgrep)
- Screenshots: `screenshot` - Capture page as image
- Scrolling: `scrollToBottom`, `scrollToTop`
- Waiting: `waitForTimeout`, `waitForSelector`
MCP Server Mode
The MCP server requires an HTTP server to be running. You need to start both:
Step 1: Start the HTTP server
Step 2: In another terminal, start the MCP server
The MCP server will:
- Start listening on stdio for MCP protocol messages
- Connect to the HTTP server on port 3102
- Route browser automation commands through the HTTP server
Standalone HTTP Server Mode
You can run the HTTP server independently:
Options:
- `-p, --port <number>` - Server port (default: 3102)
- `--host <string>` - Server host (default: localhost)
- `--headless` - Run browser in headless mode
- `--chromium` - Use Chromium instead of Chrome
- `--no-user-profile` - Do not use persistent user profile
- `--user-data-dir <path>` - User data directory
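When the server is launched from a script, the flags above can be assembled programmatically. The following is a minimal sketch, not part of the package: the `ServerOptions` interface and `buildServerArgs` helper are hypothetical names, and only the flag names and the `server` subcommand are taken from this README.

```typescript
// Hypothetical options object; field names mirror the CLI flags listed above.
interface ServerOptions {
  port?: number;           // -p, --port (default: 3102)
  host?: string;           // --host (default: localhost)
  headless?: boolean;      // --headless
  chromium?: boolean;      // --chromium
  noUserProfile?: boolean; // --no-user-profile
  userDataDir?: string;    // --user-data-dir
}

// Build the argument list to pass to `npx`, emitting only the flags that were set
// so that the server's own defaults apply to everything else.
function buildServerArgs(options: ServerOptions = {}): string[] {
  const args = ["better-playwright-mcp3", "server"];
  if (options.port !== undefined) args.push("--port", String(options.port));
  if (options.host !== undefined) args.push("--host", options.host);
  if (options.headless) args.push("--headless");
  if (options.chromium) args.push("--chromium");
  if (options.noUserProfile) args.push("--no-user-profile");
  if (options.userDataDir !== undefined) args.push("--user-data-dir", options.userDataDir);
  return args;
}

const serverArgs = buildServerArgs({ port: 3103, headless: true });
console.log(["npx", ...serverArgs].join(" "));
// prints: npx better-playwright-mcp3 server --port 3103 --headless
```

The resulting array can be handed to a process spawner of your choice; leaving an option unset keeps the documented default.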
MCP Tools
When used with AI assistants, the following tools are available:
Page Management
- `createPage` - Create a new browser page with name and description
- `closePage` - Close a specific page
- `listPages` - List all managed pages with titles and URLs
Browser Actions
- `browserClick` - Click an element using its ref identifier
- `browserType` - Type text into an element
- `browserHover` - Hover over an element
- `browserSelectOption` - Select options in a dropdown
- `browserPressKey` - Press keyboard keys
- `browserFileUpload` - Upload files to file input
- `browserHandleDialog` - Handle browser dialogs (alert, confirm, prompt)
- `browserNavigate` - Navigate to a URL
- `browserNavigateBack` - Go back to previous page
- `browserNavigateForward` - Go forward to next page
- `scrollToBottom` - Scroll to bottom of page/element
- `scrollToTop` - Scroll to top of page/element
- `waitForTimeout` - Wait for specified milliseconds
- `waitForSelector` - Wait for element to appear
Content Search & Screenshots
- `searchSnapshot` - Search page content using regex patterns (powered by ripgrep)
- `screenshot` - Take a screenshot (PNG/JPEG)
Architecture
Intelligent DOM Compression (NEW in v3.2.0)
The outline generation uses a three-step compression algorithm:
- Unwrap - Remove meaningless generic wrapper nodes
- Text Truncation - Limit text content to 50 characters
- List Folding - Detect and compress repetitive patterns using SimHash
Example compression:
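The three steps can be illustrated with a small, self-contained sketch. This is not the project's implementation: the `OutlineNode` shape, the role-path features fed to SimHash, the FNV-1a feature hash, and the Hamming-distance threshold of 3 are all assumptions made for illustration. Only the step names, the 50-character limit, and the "... and N more similar" marker come from this README.

```typescript
// Hypothetical node shape; the project's real outline format may differ.
interface OutlineNode {
  role: string;        // e.g. "generic", "list", "listitem"
  text: string;        // empty string when the node has no text
  ref: string | null;  // e.g. "e1"
  children: OutlineNode[];
}

const MAX_TEXT_LENGTH = 50; // limit stated in the Text Truncation step

// Step 1 (Unwrap): drop text-less "generic" wrappers and promote their children.
function unwrap(node: OutlineNode): OutlineNode[] {
  const children: OutlineNode[] = [];
  for (const child of node.children) children.push(...unwrap(child));
  if (node.role === "generic" && node.text === "") return children;
  return [{ ...node, children }];
}

// Step 2 (Text Truncation): cap text at MAX_TEXT_LENGTH characters plus "...".
function truncate(node: OutlineNode): OutlineNode {
  const text = node.text.length > MAX_TEXT_LENGTH
    ? node.text.slice(0, MAX_TEXT_LENGTH) + "..."
    : node.text;
  return { ...node, text, children: node.children.map(truncate) };
}

// 32-bit FNV-1a, used here as the per-feature hash inside SimHash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Structural features: the role path of a node and of every descendant.
function roleFeatures(node: OutlineNode, path: string = ""): string[] {
  const here = path + "/" + node.role;
  const out = [here];
  for (const child of node.children) out.push(...roleFeatures(child, here));
  return out;
}

// SimHash: every feature votes +1 or -1 on each bit; positive totals set the bit.
function simhash(node: OutlineNode): number {
  const votes: number[] = [];
  for (let bit = 0; bit < 32; bit++) votes.push(0);
  for (const feature of roleFeatures(node)) {
    const hash = fnv1a(feature);
    for (let bit = 0; bit < 32; bit++) {
      votes[bit] = (votes[bit] ?? 0) + (((hash >>> bit) & 1) === 1 ? 1 : -1);
    }
  }
  let fingerprint = 0;
  for (let bit = 0; bit < 32; bit++) {
    if ((votes[bit] ?? 0) > 0) fingerprint |= 1 << bit;
  }
  return fingerprint >>> 0;
}

// Number of differing bits between two 32-bit fingerprints.
function hamming(a: number, b: number): number {
  let diff = (a ^ b) >>> 0;
  let count = 0;
  while (diff !== 0) {
    count += diff & 1;
    diff >>>= 1;
  }
  return count;
}

// Step 3 (List Folding): keep the first node of each run of similar siblings
// and replace the rest of the run with a single marker node.
function foldLists(node: OutlineNode, maxDistance: number): OutlineNode {
  const kids = node.children.map((child) => foldLists(child, maxDistance));
  const folded: OutlineNode[] = [];
  let i = 0;
  while (i < kids.length) {
    const sample = kids[i]!;
    const fingerprint = simhash(sample);
    let j = i + 1;
    while (j < kids.length && hamming(fingerprint, simhash(kids[j]!)) <= maxDistance) j++;
    folded.push(sample);
    const hidden = j - i - 1;
    if (hidden > 0) {
      folded.push({ role: "note", text: "... and " + hidden + " more similar", ref: null, children: [] });
    }
    i = j;
  }
  return { ...node, children: folded };
}

// Run the three steps in order; the root node itself is always kept.
function compress(root: OutlineNode, maxDistance: number = 3): OutlineNode {
  const children: OutlineNode[] = [];
  for (const child of root.children) children.push(...unwrap(child));
  return foldLists(truncate({ ...root, children }), maxDistance);
}

const listItems: OutlineNode[] = [];
for (let n = 1; n <= 48; n++) {
  listItems.push({ role: "listitem", text: "Product " + n, ref: "e" + n, children: [] });
}
const wrapper: OutlineNode = {
  role: "generic", text: "", ref: null,
  children: [{ role: "list", text: "", ref: "e0", children: listItems }],
};
const compressed = compress({ role: "main", text: "", ref: null, children: [wrapper] });
console.log(JSON.stringify(compressed.children[0]?.children.map((c) => c.text)));
// prints: ["Product 1","... and 47 more similar"]
```

Because the fingerprint here is built from role paths only, items that differ in text but share a structure fold together, which matches the idea of keeping one sample per repeated pattern.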
System Architecture
This project implements a two-tier architecture optimized for minimal token usage:
- MCP Server - Communicates with AI assistants via Model Context Protocol
- HTTP Server - Controls browser instances and provides grep-based search
Key Design Principles
- Minimal Token Usage: Intelligent compression reduces DOM by ~91%
- On-Demand Search: Content retrieved via regex patterns when needed
- Performance: Uses ripgrep for 10x+ faster searching
- Safety: Automatic result limiting to prevent context overflow
Ref-Based Element System
Elements in snapshots are identified using ref attributes (e.g., `[ref=e1]`, `[ref=e2]`). This system:
- Provides stable identifiers for elements
- Works with Playwright's internal `aria-ref` selectors
- Enables precise element targeting across page changes
Example snapshot:
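The sketch below shows how ref markers could be pulled out of snapshot text and turned into selector strings. The sample lines are illustrative, not real output from this server; their layout (`- role "name" [ref=eN]`) and the `aria-ref=eN` selector form are assumptions, and `parseRefs` and `RefEntry` are hypothetical names. Only the `[ref=eN]` marker itself is documented above.

```typescript
// One parsed snapshot line. The line layout is assumed; only the [ref=eN]
// marker comes from the description above.
interface RefEntry {
  ref: string;      // e.g. "e3"
  line: string;     // the snapshot line that carries the ref
  selector: string; // assumed selector form, e.g. "aria-ref=e3"
}

// Scan a snapshot line by line and record every ref marker that appears.
function parseRefs(snapshot: string): RefEntry[] {
  const entries: RefEntry[] = [];
  for (const rawLine of snapshot.split("\n")) {
    const pattern = /\[ref=(e\d+)\]/g; // fresh regex per line, so lastIndex starts at 0
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(rawLine)) !== null) {
      const ref = match[1];
      if (ref === undefined) continue;
      entries.push({ ref, line: rawLine.trim(), selector: "aria-ref=" + ref });
    }
  }
  return entries;
}

// Illustrative snapshot text only.
const illustrativeSnapshot = [
  "- navigation [ref=e1]",
  '  - link "Home" [ref=e2]',
  '  - button "Sign in" [ref=e3]',
].join("\n");

const refEntries = parseRefs(illustrativeSnapshot);
console.log(refEntries.map((entry) => entry.selector).join(", "));
// prints: aria-ref=e1, aria-ref=e2, aria-ref=e3
```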
Examples
Creating and Navigating Pages
Getting Page Structure (Enhanced in v3.2.0)
Searching Content
Search Options:
- `pattern` (required) - Regex pattern to search for
- `ignoreCase` (optional) - Case insensitive search (default: false)
- `lineLimit` (optional) - Maximum lines to return (default: 100, max: 100)
Response Format:
- `result` - Matched text content
- `matchCount` - Total number of matches found
- `truncated` - Whether results were truncated due to line limit
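To make the option and response semantics concrete, here is a minimal local model of them. It is a sketch under stated assumptions, not the server's code: it uses JavaScript's `RegExp` in place of ripgrep (the two regex dialects differ in places), it counts matching lines as `matchCount`, and the `searchLines` function and both interfaces are hypothetical names. The field names, defaults, and the cap of 100 lines are taken from the lists above.

```typescript
// Field names follow the Search Options and Response Format lists above.
interface SearchOptions {
  pattern: string;
  ignoreCase?: boolean; // default: false
  lineLimit?: number;   // default: 100, max: 100
}

interface SearchResponse {
  result: string;
  matchCount: number;
  truncated: boolean;
}

const MAX_LINE_LIMIT = 100;

// Line-oriented regex search with a hard cap on returned lines.
// Assumption: matchCount counts matching lines, as a grep-style tool would.
function searchLines(snapshot: string, options: SearchOptions): SearchResponse {
  const requested = options.lineLimit ?? MAX_LINE_LIMIT;
  const limit = Math.min(Math.max(requested, 1), MAX_LINE_LIMIT);
  // No "g" flag, so test() keeps no state between lines.
  const regex = new RegExp(options.pattern, options.ignoreCase ? "i" : "");
  const matches = snapshot.split("\n").filter((line) => regex.test(line));
  return {
    result: matches.slice(0, limit).join("\n"),
    matchCount: matches.length,
    truncated: matches.length > limit,
  };
}

// Illustrative page text: 150 product lines.
const productLines: string[] = [];
for (let n = 1; n <= 150; n++) productLines.push('- link "Product ' + n + '" [ref=e' + n + "]");
const pageText = productLines.join("\n");

const capped = searchLines(pageText, { pattern: "Product \\d+" });
console.log(capped.matchCount, capped.truncated);
// prints: 150 true
```

Capping at the response level is what keeps a broad pattern from flooding the assistant's context: the caller still learns how many matches exist and can narrow the pattern.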
Interacting with Elements
Scrolling and Waiting
Best Practices for AI Assistants
Recommended Workflow: Outline First, Then Precise Actions
When using this library with AI assistants, follow this optimized workflow for maximum efficiency:
1. Start with Page Outline (Always First Step)
The outline provides:
- Complete page structure with intelligent list folding
- First element of each pattern preserved as sample
- All ref identifiers for precise element targeting
- Clear indication of repetitive patterns (e.g., "... and 47 more similar")
2. Use Outline to Guide Precise Searches
3. Take Actions with Verified Ref IDs
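One way to enforce step 3 on the client side is a guard that refuses to act on a ref that has not been seen in the outline or in a search result. This is a sketch only: `collectRefIds` and `requireVerifiedRef` are hypothetical helpers, not part of the SDK, and the outline text is an illustrative stand-in for real output.

```typescript
// Collect every ref id that appears in a piece of outline or search output.
function collectRefIds(text: string): Set<string> {
  const ids = new Set<string>();
  const pattern = /\[ref=(e\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const id = match[1];
    if (id !== undefined) ids.add(id);
  }
  return ids;
}

// Return the ref unchanged if any source contains it; otherwise throw
// before an action is sent to the browser.
function requireVerifiedRef(ref: string, ...sources: string[]): string {
  for (const source of sources) {
    if (collectRefIds(source).has(ref)) return ref;
  }
  throw new Error('Ref "' + ref + '" not found in outline or search results; fetch a fresh outline first.');
}

// Illustrative outline text only.
const outlineText = [
  "- list [ref=e10]",
  '  - link "First result" [ref=e11]',
  "  - ... and 47 more similar",
].join("\n");

const clickTarget = requireVerifiedRef("e11", outlineText);
console.log(clickTarget);
// prints: e11
```

The returned value can then be passed to an action such as `browserClick`; an unknown ref fails fast with a message that points back to step 1.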
Why This Approach?
Token Efficiency: Compressed outline (typically <500 lines) + targeted searches use far fewer tokens than full snapshots (often 5000+ lines)
Accuracy: The outline shows actual page structure, preventing incorrect assumptions about element locations
Smart Compression: The algorithm preserves one sample from each pattern group, so AI understands the structure without seeing all repetitions
Anti-Patterns to Avoid
- ❌ Don't blindly try random ref IDs without verification
- ❌ Don't request full snapshots that exceed token limits
- ❌ Don't make assumptions about page structure without checking the outline first
- ❌ Don't use generic search patterns when specific ones would be more efficient
Example: Searching Amazon Products
Development
Prerequisites
- Node.js >= 18.0.0
- TypeScript
- Chrome or Chromium browser
Building from Source
Project Structure
Troubleshooting
Common Issues
- Port already in use
  - Change the port using the `-p` flag: `npx better-playwright-mcp3 server -p 3103`
  - Or set an environment variable: `PORT=3103 npx better-playwright-mcp3 server`
- Browser not launching
  - Ensure Chrome or Chromium is installed
  - Try using the `--chromium` flag for Chromium
  - Check system resources
- Element not found
  - Verify the ref identifier exists in the outline
  - Use `searchSnapshot()` to search for elements
  - Wait for elements using `waitForSelector()`
- Search returns too many results
  - Use more specific patterns
  - Use the `lineLimit` option to limit results
  - Leverage regex features for precise matching
Debug Mode
Enable detailed logging:
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT