The Browser Use Server enables browser automation via Python scripts within the Cline ecosystem, allowing programmatic web interactions with these capabilities:
Capture Screenshots: Take screenshots of webpages (full-page or viewport)
Retrieve HTML Content: Fetch the HTML of webpages
Execute JavaScript: Run custom JavaScript code on webpages
Access Console Logs: Retrieve browser console logs
Custom Interaction Steps: Perform sequences of actions (clicking, scrolling, form filling, authentication) before main operations
Headless Automation: Optional Xvfb support for headless operation to avoid bot detection
LLM Integration: Support for multiple LLM providers (OpenAI, Groq, Gemini)
Vision Capabilities: Optional visual understanding for webpage content
Debugging Tools: Includes MCP Inspector and detailed error handling
Browser Use Server
A Model Context Protocol server for browser automation using Python scripts. For use with Cline.
Features
Browser Operations
screenshot: Capture a screenshot of a webpage (full page or viewport)
get_html: Retrieve the HTML content of a webpage
execute_js: Execute JavaScript on a webpage
get_console_logs: Get console logs from a webpage
All operations support custom interaction steps (e.g., clicking elements, scrolling) after page load.
Prerequisites
(Optional but recommended) Install Xvfb for headless browser automation:
# Ubuntu/Debian
sudo apt-get install xvfb
# CentOS/RHEL
sudo yum install xorg-x11-server-Xvfb
# Arch Linux
sudo pacman -S xorg-server-xvfb
Xvfb (X Virtual Frame Buffer) creates a virtual display, so the browser can run in headed mode on a machine with no physical display, which makes it less likely to be detected as a bot.
Install Miniconda or Anaconda
Create a Conda environment:
conda create -n browser-use python=3.11
conda activate browser-use
pip install -r requirements.txt
Set up LLM configuration:
The server supports multiple LLM providers. You can use any of the following API keys:
# Required: Set at least one of these API keys
export GLHF_API_KEY=your_api_key
export GROQ_API_KEY=your_api_key
export OPENAI_API_KEY=your_api_key
export OPENROUTER_API_KEY=your_api_key
export GITHUB_API_KEY=your_api_key
export DEEPSEEK_API_KEY=your_api_key
export GEMINI_API_KEY=your_api_key
export OLLAMA_API_KEY=your_api_key
# Optional: Override default configuration
export MODEL=your_preferred_model # Override the default model
export BASE_URL=your_custom_url # Override the default API endpoint
export USE_VISION=false # Enable/disable vision capabilities (default: false)
The server will automatically use the first available API key it finds. You can optionally customize the model and base URL for any provider using the environment variables.
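The "first available API key" rule can be sketched as follows. This is a minimal illustration, not the server's actual code: the lookup order is assumed to match the order of the variables listed above, and the names `PROVIDER_KEYS` and `pick_provider` are hypothetical.

```python
import os

# Assumption: the lookup order mirrors the list of variables above.
# The server's real order is not documented here.
PROVIDER_KEYS = [
    ("glhf", "GLHF_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
    ("github", "GITHUB_API_KEY"),
    ("deepseek", "DEEPSEEK_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
    ("ollama", "OLLAMA_API_KEY"),
]


def pick_provider(env=None):
    """Return (provider, api_key) for the first non-empty key, or None."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_KEYS:
        key = env.get(var, "").strip()
        if key:
            return provider, key
    return None
```

If several keys are exported, only the first match is used, so unset the others when you want a specific provider.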
Installation
Installing via Smithery
To install Browser Use Server for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install @ztobs/cline-browser-use-mcp --client claude
To install manually instead, clone this repository to the /home/YOUR_HOME/Documents/Cline/ directory.
Install dependencies:
npm install
Build the server:
npm run build
MCP Configuration
Add the following configuration to your Cline MCP settings:
"browser-use": {
"command": "node",
"args": [
"/home/YOUR_HOME/Documents/Cline/MCP/browser-use-server/build/index.js"
],
"env": {
// Required: Set at least one API key
"GLHF_API_KEY": "your_api_key",
"GROQ_API_KEY": "your_api_key",
"OPENAI_API_KEY": "your_api_key",
"OPENROUTER_API_KEY": "your_api_key",
"GITHUB_API_KEY": "your_api_key",
"DEEPSEEK_API_KEY": "your_api_key",
"GEMINI_API_KEY": "your_api_key",
"OLLAMA_API_KEY": "your_api_key",
// Optional: Configuration overrides
"MODEL": "your_preferred_model",
"BASE_URL": "your_custom_url",
"USE_VISION": "false"
},
"disabled": false,
"autoApprove": []
}
Replace:
YOUR_HOME with your actual home directory name
your_api_key with your actual API keys
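The snippet above uses // comments, which strict JSON parsers reject. If your settings file is parsed as strict JSON, one option is to generate the entry instead of hand-editing it. The sketch below does that, keeping only the variables you actually set. The function name `build_server_entry` is hypothetical, and the path layout is copied from the example above.

```python
import json

API_KEY_VARS = [
    "GLHF_API_KEY", "GROQ_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY",
    "GITHUB_API_KEY", "DEEPSEEK_API_KEY", "GEMINI_API_KEY", "OLLAMA_API_KEY",
]
OPTIONAL_VARS = ["MODEL", "BASE_URL", "USE_VISION"]


def build_server_entry(home, env):
    """Build the "browser-use" settings entry from a dict of variables."""
    allowed = set(API_KEY_VARS) | set(OPTIONAL_VARS)
    # Drop unknown names and empty values so no placeholder leaks through.
    clean_env = {k: v for k, v in env.items() if k in allowed and v}
    if not any(k in clean_env for k in API_KEY_VARS):
        raise ValueError("set at least one API key")
    return {
        "browser-use": {
            "command": "node",
            "args": [
                f"/home/{home}/Documents/Cline/MCP/browser-use-server/build/index.js"
            ],
            "env": clean_env,
            "disabled": False,
            "autoApprove": [],
        }
    }


# Example: print an entry with a single provider key and vision turned on.
print(json.dumps(
    build_server_entry("alice", {"OPENAI_API_KEY": "your_api_key", "USE_VISION": "true"}),
    indent=2,
))
```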
Usage
Run the server:
node build/index.js
The server will be available on stdio and supports the following operations:
Screenshot
Parameters:
url: The webpage URL (required)
full_page: Whether to capture the full page or just the viewport (optional, default: false)
steps: Comma-separated actions or sentences describing steps to take after page load (optional)
Get HTML
Parameters:
url: The webpage URL (required)
steps: Comma-separated actions or sentences describing steps to take after page load (optional)
Execute JavaScript
Parameters:
url: The webpage URL (required)
script: JavaScript code to execute (required)
steps: Comma-separated actions or sentences describing steps to take after page load (optional)
Get Console Logs
Parameters:
url: The webpage URL (required)
steps: Comma-separated actions or sentences describing steps to take after page load (optional)
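The parameter lists above can be summarised as a small validation table. The sketch below is illustrative only: the operation names come from the Features section, while `OPERATIONS` and `build_arguments` are hypothetical helpers, not part of the server.

```python
# Required and optional parameters per operation, as listed above.
OPERATIONS = {
    "screenshot": {"required": {"url"}, "optional": {"full_page", "steps"}},
    "get_html": {"required": {"url"}, "optional": {"steps"}},
    "execute_js": {"required": {"url", "script"}, "optional": {"steps"}},
    "get_console_logs": {"required": {"url"}, "optional": {"steps"}},
}


def build_arguments(operation, **params):
    """Validate params for an operation and return the argument dict."""
    spec = OPERATIONS.get(operation)
    if spec is None:
        raise ValueError(f"unknown operation: {operation}")
    given = set(params)
    missing = spec["required"] - given
    unknown = given - spec["required"] - spec["optional"]
    if missing:
        raise ValueError(f"missing required parameter(s): {sorted(missing)}")
    if unknown:
        raise ValueError(f"unexpected parameter(s): {sorted(unknown)}")
    args = dict(params)
    if operation == "screenshot":
        # full_page defaults to false (viewport only).
        args.setdefault("full_page", False)
    return args
```

For example, `build_arguments("execute_js", url="https://example.com")` raises because `script` is required for that operation.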
Example Cline Usage
Here are some example tasks you can accomplish using the browser-use server with Cline:
Modifying Web Page Elements during Development
To change the color of a heading on a page that requires authentication:
Change the colour of the headline with the text "Alle Foren im Überblick." to deep blue on https://localhost:3000/foren/ page
To check/see the page, use browser-use MCP server to:
Open https://localhost:3000/auth,
Login with ztobs:Password123,
Navigate to https://localhost:3000/foren/,
Accept cookies if required
hint: execute all browser actions in one command with multiple comma-separated steps
This task demonstrates:
Multi-step browser automation using comma-separated steps
Authentication handling
Cookie acceptance
DOM manipulation
CSS styling changes
The server will execute these steps sequentially, handling any required interactions along the way.
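If you assemble the steps value programmatically, a helper along these lines may be useful. It is a sketch under one assumption: a comma always separates steps, so a comma inside a single step would be ambiguous and is rejected. The name `join_steps` is hypothetical.

```python
def join_steps(steps):
    """Join individual step descriptions into one comma-separated string."""
    cleaned = []
    for step in steps:
        # Tolerate surrounding whitespace and a trailing comma.
        step = step.strip().rstrip(",").strip()
        if not step:
            continue
        if "," in step:
            raise ValueError(f"step contains a comma: {step!r}")
        cleaned.append(step)
    return ", ".join(cleaned)


steps = join_steps([
    "Open https://localhost:3000/auth",
    "Login with ztobs:Password123",
    "Navigate to https://localhost:3000/foren/",
    "Accept cookies if required",
])
print(steps)
```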
Configuration
LLM Configuration
The server supports multiple LLM providers with their default configurations:
GLHF: Uses deepseek-ai/DeepSeek-V3 model
Ollama: Uses qwen2.5:32b-instruct-q4_K_M model with 32k context window
Groq: Uses deepseek-r1-distill-llama-70b model
OpenAI: Uses gpt-4o-mini model
OpenRouter: Uses deepseek/deepseek-chat model
GitHub: Uses gpt-4o-mini model
DeepSeek: Uses deepseek-chat model
Gemini: Uses gemini-2.0-flash-exp model
You can override these defaults using environment variables:
MODEL: Set a custom model name for any provider
BASE_URL: Set a custom API endpoint URL (if the provider supports it)
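The defaults and overrides above amount to a lookup with a fallback. The following sketch shows that logic; the model names are taken from the list above, while `DEFAULT_MODELS` and `resolve_llm_settings` are hypothetical names and the treatment of an empty variable as "unset" is an assumption.

```python
import os

DEFAULT_MODELS = {
    "glhf": "deepseek-ai/DeepSeek-V3",
    "ollama": "qwen2.5:32b-instruct-q4_K_M",
    "groq": "deepseek-r1-distill-llama-70b",
    "openai": "gpt-4o-mini",
    "openrouter": "deepseek/deepseek-chat",
    "github": "gpt-4o-mini",
    "deepseek": "deepseek-chat",
    "gemini": "gemini-2.0-flash-exp",
}


def resolve_llm_settings(provider, env=None):
    """Apply MODEL and BASE_URL overrides on top of the provider default."""
    env = os.environ if env is None else env
    return {
        "provider": provider,
        # Assumption: an empty MODEL is treated the same as an unset one.
        "model": env.get("MODEL") or DEFAULT_MODELS[provider],
        # None means "use the provider's default endpoint".
        "base_url": env.get("BASE_URL") or None,
    }
```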
Vision Support
The server supports vision capabilities through the USE_VISION environment variable:
Set USE_VISION=true to enable vision capabilities for browser operations
Default is false to optimize performance when vision is not needed
Useful for tasks that require visual understanding of webpage content
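Reading the flag could look like the sketch below. The accepted spelling is an assumption: only the string "true" (case-insensitive) is treated as enabling vision, and anything else, including an unset variable, falls back to the documented default of false. The function name `use_vision` is hypothetical.

```python
import os


def use_vision(env=None):
    """Return True only when USE_VISION is set to "true"."""
    env = os.environ if env is None else env
    return env.get("USE_VISION", "false").strip().lower() == "true"
```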
Xvfb Support
The server automatically detects if Xvfb is installed and:
Uses xvfb-run when available, enabling better browser automation without bot detection
Falls back to direct execution when Xvfb is not installed
Sets RUNNING_UNDER_XVFB environment variable accordingly
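The detection and fallback described above can be sketched in a few lines. This is an illustration in Python rather than the server's own code: `build_command` is a hypothetical helper, `browser_script.py` in the usage line is a placeholder script name, and the "true"/"false" values for RUNNING_UNDER_XVFB are assumed.

```python
import os
import shutil


def build_command(script_args, which=shutil.which):
    """Wrap the command in xvfb-run when it is on PATH, else run directly."""
    env = dict(os.environ)
    if which("xvfb-run"):
        # Assumption: the flag is communicated as the string "true"/"false".
        env["RUNNING_UNDER_XVFB"] = "true"
        return ["xvfb-run", *script_args], env
    env["RUNNING_UNDER_XVFB"] = "false"
    return list(script_args), env


# Placeholder script name, for illustration only.
cmd, env = build_command(["python", "browser_script.py"])
print(cmd, env["RUNNING_UNDER_XVFB"])
```

The `which` parameter exists only so the lookup can be swapped out in tests.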
Timeout
Default timeout is 5 minutes (300000 ms). Modify the TIMEOUT constant in build/index.js to change this.
Error Handling
The server provides detailed error messages for:
Python script execution failures
Browser operation timeouts
Invalid parameters
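To make the timeout and the first two error categories concrete, here is a sketch of running a Python script with a time limit and turning failures into messages. It is not the server's implementation: `run_with_timeout` and the message wording are hypothetical, and only the 300000 ms default comes from the text above.

```python
import subprocess
import sys

TIMEOUT_MS = 300_000  # 5 minutes, matching the documented default


def run_with_timeout(cmd, timeout_ms=TIMEOUT_MS):
    """Run cmd and report success, script failure, or timeout."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_ms / 1000
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": f"Browser operation timed out after {timeout_ms} ms"}
    if result.returncode != 0:
        return {"ok": False, "error": f"Python script failed: {result.stderr.strip()}"}
    return {"ok": True, "output": result.stdout}
```

For example, `run_with_timeout([sys.executable, "-c", "raise SystemExit(2)"])` returns a result with `ok` set to False and a script-failure message.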
Debugging
Use the MCP Inspector for debugging:
npm run inspector
License
MIT