The Browser-Use MCP Server enables AI-driven browser automation and web research via natural language commands using the Model Context Protocol (MCP).
Browser Automation: Execute tasks like navigation, form filling, and element interaction through natural language (run_browser_agent tool).
Deep Web Research: Perform multi-step research and generate detailed reports (run_deep_research tool).
Visual Understanding: Analyze screenshots for vision-capable LLMs.
Multi-LLM Support: Integrates with OpenAI, Anthropic, Google, Mistral, Ollama, and other providers.
State Persistence: Manage browser sessions across multiple MCP calls or connect to the user's browser via CDP.
CLI Interface: Access core functionalities directly for testing and scripting.
Artifact Management: Save agent history, browser traces, research reports, and downloaded files.
Environment Configuration: Fully configurable via environment variables using a structured Pydantic model.
mcp-server-browser-use
MCP server that gives AI assistants the power to control a web browser.
What is this?
This wraps browser-use as an MCP server, letting Claude (or any MCP client) automate a real browser—navigate pages, fill forms, click buttons, extract data, and more.
Why HTTP instead of stdio?
Browser automation tasks take 30-120+ seconds. The standard MCP stdio transport has timeout issues with long-running operations—connections drop mid-task. HTTP transport solves this by running as a persistent daemon that handles requests reliably regardless of duration.
Installation
Claude Code Plugin (Recommended)
Install as a Claude Code plugin for automatic setup:
The plugin automatically:
Installs Playwright browsers on first run
Starts the HTTP daemon when Claude Code starts
Registers the MCP server with Claude
Set your API key (the browser agent needs an LLM to decide actions):
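For example, with Anthropic as the provider (the provider-standard variable names such as ANTHROPIC_API_KEY and GEMINI_API_KEY are listed in the settings reference below):

```bash
# Export the key for whichever provider you configured; Anthropic shown here.
export ANTHROPIC_API_KEY="sk-ant-..."
```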
That's it! Claude can now use browser automation tools.
Manual Installation
For other MCP clients or standalone use:
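A minimal sketch, assuming the package is published on PyPI under the project name:

```bash
# Install the server into your environment (package name assumed to match the project name)
pip install mcp-server-browser-use
```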
Add to Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
For MCP clients that don't support HTTP transport, use mcp-remote as a proxy:
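A sketch of a claude_desktop_config.json entry that proxies stdio to the HTTP daemon via mcp-remote; the server name, port, and /mcp path below are illustrative assumptions, so substitute the address your daemon actually listens on:

```json
{
  "mcpServers": {
    "browser-use": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:8383/mcp"]
    }
  }
}
```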
Web UI
Access the task viewer at http://localhost:8383 when the daemon is running.
Features:
Real-time task list with status and progress
Task details with execution logs
Server health status and uptime
Running tasks monitoring
The web UI provides visibility into browser automation tasks without requiring CLI commands.
Configuration
Settings are stored in ~/.config/mcp-server-browser-use/config.json.
View current config:
Change settings:
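Since the settings live in a plain JSON file, you can also inspect them directly:

```bash
# Print the current settings (path from the Configuration section above)
cat ~/.config/mcp-server-browser-use/config.json
```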
Settings Reference
| Key | Default | Description |
|-----|---------|-------------|
|  |  | LLM provider (anthropic, openai, google, azure_openai, groq, deepseek, cerebras, ollama, bedrock, browser_use, openrouter, vercel) |
|  |  | Model for the browser agent |
|  | - | API key for the provider (prefer env vars: GEMINI_API_KEY, ANTHROPIC_API_KEY, etc.) |
|  |  | Run browser without GUI |
|  | - | Connect to an existing Chrome instance via CDP |
|  | - | Chrome profile directory for persistent logins/cookies |
|  |  | Max steps per browser task |
|  |  | Enable vision capabilities for the agent |
|  |  | Max searches per research task |
|  | - | Timeout for individual searches |
|  |  | Server bind address |
|  |  | Server port |
|  | - | Directory to save results |
|  | - | Auth token for non-localhost connections |
|  |  | Enable skills system (beta - disabled by default) |
|  |  | Skills storage location |
|  |  | Validate skill execution results |
Config Priority
Environment variables use prefix MCP_ + section + _ + key (e.g., MCP_LLM_PROVIDER).
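For example, assuming environment variables take precedence over values in config.json (the usual layering for this kind of setup):

```bash
# MCP_ + section + _ + key, per the rule above; MCP_LLM_PROVIDER is the documented example
export MCP_LLM_PROVIDER=anthropic
```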
Using Your Own Browser
Option 1: Persistent Profile (Recommended)
Use a dedicated Chrome profile to preserve logins and cookies:
Option 2: Connect to Existing Chrome
Connect to an existing Chrome instance (useful for advanced debugging):
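One way to expose an existing Chrome instance over CDP is to start it with remote debugging enabled (a standard Chrome flag; the port is your choice):

```bash
# macOS example; on Linux the binary is usually `google-chrome`
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
# The CDP endpoint is then http://localhost:9222
```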
CLI Reference
Server Management
Calling Tools
Configuration
Observability
Skills Management
MCP Tools
These tools are exposed via MCP for AI clients:
| Tool | Description | Typical Duration |
|------|-------------|------------------|
| run_browser_agent | Execute browser automation tasks | 60-120s |
| run_deep_research | Multi-search research with synthesis | 2-5 min |
|  | List learned skills | <1s |
|  | Get skill definition | <1s |
|  | Delete a skill | <1s |
| health_check | Server status and running tasks | <1s |
| task_list | Query task history | <1s |
| task_get | Get full task details | <1s |
run_browser_agent
The main tool. Tell it what you want in plain English:
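For example (illustrative wording only):

```
Go to apple.com and find the current price of the MacBook Air.
```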
The agent launches a browser, navigates to apple.com, finds the product, and returns the price.
Parameters:
| Parameter | Type | Description |
|-----------|------|-------------|
|  | string | What to do (required) |
|  | int | Override default max steps |
|  | string | Use a learned skill |
| skill_params | JSON | Parameters for the skill |
|  | bool | Enable learning mode |
|  | string | Name for the learned skill |
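For reference, an MCP client invokes the tool with a standard tools/call request; the argument names below (task, max_steps) are illustrative assumptions rather than the exact schema:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "run_browser_agent",
    "arguments": {
      "task": "Go to apple.com and find the current price of the MacBook Air.",
      "max_steps": 50
    }
  }
}
```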
run_deep_research
Multi-step web research with automatic synthesis:
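For example (illustrative query only):

```
Research the current state of WebAssembly support across major browsers and summarize the differences.
```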
The agent searches multiple sources, extracts key findings, and compiles a markdown report.
Deep Research
Deep research executes a 3-phase workflow: gather sources through multiple searches, extract key findings from each, and synthesize everything into a final report.
Reports can be auto-saved by configuring research.save_directory.
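Following the env-var naming rule from the Config Priority section, that setting can also be supplied via the environment:

```bash
# research.save_directory -> MCP_RESEARCH_SAVE_DIRECTORY (derived from the documented MCP_<SECTION>_<KEY> rule)
export MCP_RESEARCH_SAVE_DIRECTORY="$HOME/reports"
```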
Observability
All tool executions are tracked in SQLite for debugging and monitoring.
Task Lifecycle
Task Stages
During execution, tasks progress through granular stages:
Querying Tasks
List recent tasks:
Get task details:
Server health:
Shows uptime, memory usage, and currently running tasks.
MCP Tools for Observability
AI clients can query task status directly:
health_check - Server status + list of running tasks
task_list - Recent tasks with optional status filter
task_get - Full details of a specific task
Storage
Database: ~/.config/mcp-server-browser-use/tasks.db
Retention: Completed tasks auto-deleted after 7 days
Format: SQLite with WAL mode for concurrency
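The database is a regular SQLite file, so it can be inspected with the sqlite3 CLI if needed:

```bash
# List the tables in the task database (path from the Storage section above)
sqlite3 ~/.config/mcp-server-browser-use/tasks.db ".tables"
```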
Skills System (Super Alpha)
Warning: This feature is experimental and under active development. Expect rough edges.
Skills are disabled by default. Enable them first:
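If you prefer the environment-variable route, and assuming the setting lives in a skills section with an enabled key (the exact key name is not confirmed by the settings table above), the override would look like:

```bash
# Assumed key name: skills.enabled -> MCP_SKILLS_ENABLED
export MCP_SKILLS_ENABLED=true
```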
Skills let you "teach" the agent a task once, then replay it 50x faster by reusing discovered API endpoints instead of full browser automation.
The Problem
Browser automation is slow (60-120 seconds per task). But most websites have APIs behind their UI. If we can discover those APIs, we can call them directly.
The Solution
Skills capture the API calls made during a browser session and replay them directly via CDP (Chrome DevTools Protocol).
Learning a Skill
What happens:
Recording: CDP captures all network traffic during execution
Analysis: LLM identifies the "money request"—the API call that returns the data
Extraction: URL patterns, headers, and response parsing rules are saved
Storage: Skill saved as YAML to ~/.config/browser-skills/npm-search.yaml
Using a Skill
Two Execution Modes
Every skill supports two execution paths:
1. Direct Execution (Fast Path) ~2 seconds
If the skill captured an API endpoint (SkillRequest):
2. Hint-Based Execution (Fallback) ~60-120 seconds
If direct execution fails or no API was found:
Skill File Format
Skills are stored as YAML in ~/.config/browser-skills/:
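A hypothetical sketch of what such a file might contain, based on the pieces described in this section (URL pattern, headers, response parsing, parameters, auth recovery); the field names are illustrative, not the actual schema:

```yaml
# ~/.config/browser-skills/npm-search.yaml (illustrative structure, not the real schema)
name: npm-search
description: Search npm packages via the API endpoint discovered during learning
request:
  method: GET
  url: "https://registry.example.com/search?q={query}"   # {query} substituted from skill_params
  headers:
    accept: application/json
response:
  extract: results                  # parsing rule saved during analysis
auth_recovery:
  url: "https://example.com/login"  # page to visit if the API returns 401/403
hints: |
  Fallback instructions for the browser agent if direct execution fails.
```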
Parameters
Skills support parameterized URLs and request bodies:
Parameters are substituted at execution time from skill_params.
Auth Recovery
If an API returns 401/403, skills can trigger auth recovery:
The system will navigate to the recovery page (letting you log in) and retry.
Limitations
API Discovery: Only works if the site has an API. Sites that render everything server-side won't yield useful skills.
Auth State: Skills rely on browser cookies. If you're logged out, they may fail.
API Changes: If a site changes its API, the skill breaks and falls back to hint-based execution.
Complex Flows: Multi-step workflows (login → navigate → search) may not capture cleanly.
Architecture
High-Level Overview
Module Structure
File Locations
| What | Where |
|------|-------|
| Config | ~/.config/mcp-server-browser-use/config.json |
| Tasks DB | ~/.config/mcp-server-browser-use/tasks.db |
| Skills | ~/.config/browser-skills/ |
| Server Log |  |
| Server PID |  |
Supported LLM Providers
OpenAI
Anthropic
Google Gemini
Azure OpenAI
Groq
DeepSeek
Cerebras
Ollama (local)
AWS Bedrock
OpenRouter
Vercel AI
License
MIT