# Browser Use Server

[![smithery badge](https://smithery.ai/badge/@ztobs/cline-browser-use-mcp)](https://smithery.ai/server/@ztobs/cline-browser-use-mcp)

A Model Context Protocol (MCP) server for browser automation using Python scripts, intended for use with Cline.

<a href="https://glama.ai/mcp/servers/0aqrsbhx3z"><img width="380" height="200" src="https://glama.ai/mcp/servers/0aqrsbhx3z/badge" alt="Browser Use Server MCP server" /></a>

## Features

### Browser Operations
- `screenshot`: Capture a screenshot of a webpage (full page or viewport)
- `get_html`: Retrieve the HTML content of a webpage
- `execute_js`: Execute JavaScript on a webpage
- `get_console_logs`: Get console logs from a webpage

All operations support custom interaction steps (e.g., clicking elements, scrolling) after page load.

## Prerequisites

1. (Optional but recommended) Install Xvfb for headless browser automation:
```bash
# Ubuntu/Debian
sudo apt-get install xvfb

# CentOS/RHEL
sudo yum install xorg-x11-server-Xvfb

# Arch Linux
sudo pacman -S xorg-server-xvfb
```
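Xvfb (X virtual framebuffer) provides a virtual display, so the browser can run in regular (non-headless) mode on a machine without a screen. Sites are generally less likely to flag such a browser as a bot than a headless one. See the [Xvfb manual page](https://www.x.org/releases/X11R7.6/doc/man/man1/Xvfb.1.xhtml) for details.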

2. Install Miniconda or Anaconda
3. Create a Conda environment:
```bash
conda create -n browser-use python=3.11
conda activate browser-use
pip install browser-use
```

4. Set up LLM configuration:

The server supports multiple LLM providers. You can use any of the following API keys:
```bash
# Required: Set at least one of these API keys
export GLHF_API_KEY=your_api_key
export GROQ_API_KEY=your_api_key
export OPENAI_API_KEY=your_api_key
export OPENROUTER_API_KEY=your_api_key
export GITHUB_API_KEY=your_api_key
export DEEPSEEK_API_KEY=your_api_key
export GEMINI_API_KEY=your_api_key
export OLLAMA_API_KEY=your_api_key

# Optional: Override default configuration
export MODEL=your_preferred_model  # Override the default model
export BASE_URL=your_custom_url    # Override the default API endpoint
export USE_VISION=false  # Enable/disable vision capabilities (default: false)
```

The server will automatically use the first available API key it finds. You can optionally customize the model and base URL for any provider using the environment variables.
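The selection rule described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the lookup order is assumed to match the order in which the variables are listed above, and `first_available_key` is a hypothetical helper name.

```python
import os

# Assumed lookup order: the order in which the variables appear in this README.
# The server's real precedence may differ.
API_KEY_VARS = [
    "GLHF_API_KEY",
    "GROQ_API_KEY",
    "OPENAI_API_KEY",
    "OPENROUTER_API_KEY",
    "GITHUB_API_KEY",
    "DEEPSEEK_API_KEY",
    "GEMINI_API_KEY",
    "OLLAMA_API_KEY",
]


def first_available_key(env=None):
    """Return (variable_name, value) for the first non-empty API key, or None."""
    env = os.environ if env is None else env
    for name in API_KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None


if __name__ == "__main__":
    found = first_available_key()
    if found is None:
        print("No API key set; export at least one of:", ", ".join(API_KEY_VARS))
    else:
        print("Would use", found[0])
```

Running the script before starting the server is a quick way to confirm that at least one key is visible in the current shell.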

## Installation

### Installing via Smithery

To install Browser Use Server for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@ztobs/cline-browser-use-mcp):

```bash
npx -y @smithery/cli install @ztobs/cline-browser-use-mcp --client claude
```

### Installing Manually

1. Clone this repository
2. Install dependencies:
```bash
npm install
```

3. Build the server:
```bash
npm run build
```

## MCP Configuration

Add the following entry to the `mcpServers` object in your Cline MCP settings. Set at least one API key; `MODEL`, `BASE_URL`, and `USE_VISION` are optional overrides. The settings file is plain JSON, so do not add comments to it.

```json
"browser-use": {
  "command": "node",
  "args": [
    "/home/YOUR_HOME/Documents/Cline/MCP/browser-use-server/build/index.js"
  ],
  "env": {
    "GLHF_API_KEY": "your_api_key",
    "GROQ_API_KEY": "your_api_key",
    "OPENAI_API_KEY": "your_api_key",
    "OPENROUTER_API_KEY": "your_api_key",
    "GITHUB_API_KEY": "your_api_key",
    "DEEPSEEK_API_KEY": "your_api_key",
    "GEMINI_API_KEY": "your_api_key",
    "OLLAMA_API_KEY": "your_api_key",
    "MODEL": "your_preferred_model",
    "BASE_URL": "your_custom_url",
    "USE_VISION": "false"
  },
  "disabled": false,
  "autoApprove": []
}
```

Replace:
- `YOUR_HOME` with your actual home directory name
- `your_api_key` with your actual API keys (remove the entries for providers you do not use, since the server picks the first key it finds)
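Because the settings file must be valid JSON, it can be convenient to generate the entry rather than edit it by hand. The sketch below builds the same `browser-use` entry from the variables already exported in your shell and leaves out any that are unset. The helper name `build_entry` is hypothetical, and the path is the placeholder from the configuration above.

```python
import json
import os

KEY_VARS = [
    "GLHF_API_KEY", "GROQ_API_KEY", "OPENAI_API_KEY", "OPENROUTER_API_KEY",
    "GITHUB_API_KEY", "DEEPSEEK_API_KEY", "GEMINI_API_KEY", "OLLAMA_API_KEY",
]
OPTIONAL_VARS = ["MODEL", "BASE_URL", "USE_VISION"]


def build_entry(index_js_path, env=None):
    """Return the `browser-use` settings entry, keeping only variables that are set."""
    env = os.environ if env is None else env
    server_env = {
        name: env[name] for name in KEY_VARS + OPTIONAL_VARS if env.get(name)
    }
    return {
        "browser-use": {
            "command": "node",
            "args": [index_js_path],
            "env": server_env,
            "disabled": False,
            "autoApprove": [],
        }
    }


if __name__ == "__main__":
    # Placeholder path; replace YOUR_HOME with your home directory name.
    path = "/home/YOUR_HOME/Documents/Cline/MCP/browser-use-server/build/index.js"
    print(json.dumps(build_entry(path), indent=2))
```

The printed object can be merged into your existing Cline MCP settings.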

## Usage

Run the server:
```bash
node build/index.js
```

The server communicates over stdio and supports the following operations:

### Screenshot
Parameters:
- `url`: The webpage URL (required)
- `full_page`: Whether to capture the full page instead of only the viewport (optional, default: false)
- `steps`: Comma-separated actions or sentences describing the steps to take after page load (optional)

### Get HTML
Parameters:
- `url`: The webpage URL (required)
- `steps`: Comma-separated actions or sentences describing the steps to take after page load (optional)

### Execute JavaScript
Parameters:
- `url`: The webpage URL (required)
- `script`: JavaScript code to execute (required)
- `steps`: Comma-separated actions or sentences describing the steps to take after page load (optional)

### Get Console Logs
Parameters:
- `url`: The webpage URL (required)
- `steps`: Comma-separated actions or sentences describing the steps to take after page load (optional)
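For reference, an MCP client invokes these operations with a JSON-RPC `tools/call` request. The sketch below builds such a request for `screenshot`. The URL and step text are made-up examples, `make_tool_call` is a hypothetical helper, and `full_page` is assumed to be a boolean. In normal use Cline constructs this message for you.

```python
import json


def make_tool_call(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as sent by MCP clients."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Steps are passed as a single comma-separated string, as described above.
steps = ", ".join(["Accept cookies if required", "Scroll to the page footer"])

request = make_tool_call(
    1,
    "screenshot",
    {"url": "https://example.com", "full_page": True, "steps": steps},
)

print(json.dumps(request, indent=2))
```

The other operations follow the same shape, with `get_html`, `execute_js`, or `get_console_logs` as the tool name and the parameters listed above as arguments.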

## Example Cline Usage

Here are some example tasks you can accomplish using the browser-use server with Cline:

### Modifying Web Page Elements during Development
To change the color of a heading on a page that requires authentication:
```
Change the colour of the headline with the text "Alle Foren im Überblick." to deep blue on https://localhost:3000/foren/ page

To check/see the page, use browser-use MCP server to:
Open https://localhost:3000/auth,
Login with ztobs:Password123,
Navigate to https://localhost:3000/foren/,
Accept cookies if required

hint: execute all browser actions in one command with multiple comma-separated steps
```

This task demonstrates:
- Multi-step browser automation using comma-separated steps
- Authentication handling
- Cookie acceptance
- DOM manipulation
- CSS styling changes

The server will execute these steps sequentially, handling any required interactions along the way.

## Configuration

### LLM Configuration
The server supports multiple LLM providers with their default configurations:

- GLHF: Uses deepseek-ai/DeepSeek-V3 model
- Ollama: Uses qwen2.5:32b-instruct-q4_K_M model with 32k context window
- Groq: Uses deepseek-r1-distill-llama-70b model
- OpenAI: Uses gpt-4o-mini model
- OpenRouter: Uses deepseek/deepseek-chat model
- GitHub: Uses gpt-4o-mini model
- DeepSeek: Uses deepseek-chat model
- Gemini: Uses gemini-2.0-flash-exp model

You can override these defaults using environment variables:
- `MODEL`: Set a custom model name for any provider
- `BASE_URL`: Set a custom API endpoint URL (if the provider supports it)
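The default-plus-override behaviour can be pictured with a small lookup. This is an illustrative sketch only: the model names are copied from the list above, while `resolve_model` and the dictionary layout are assumptions rather than the server's actual implementation.

```python
# Default model per provider, as listed in this README.
DEFAULT_MODELS = {
    "GLHF": "deepseek-ai/DeepSeek-V3",
    "Ollama": "qwen2.5:32b-instruct-q4_K_M",
    "Groq": "deepseek-r1-distill-llama-70b",
    "OpenAI": "gpt-4o-mini",
    "OpenRouter": "deepseek/deepseek-chat",
    "GitHub": "gpt-4o-mini",
    "DeepSeek": "deepseek-chat",
    "Gemini": "gemini-2.0-flash-exp",
}


def resolve_model(provider, env):
    """Return the MODEL override if set, otherwise the provider's default."""
    override = env.get("MODEL", "").strip()
    return override or DEFAULT_MODELS[provider]


print(resolve_model("Groq", {}))
print(resolve_model("Groq", {"MODEL": "my-custom-model"}))
```

Note that `MODEL` applies to whichever provider ends up selected, so an override only makes sense if that provider actually serves the named model.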

### Vision Support
The server supports vision capabilities through the `USE_VISION` environment variable:
- Set `USE_VISION=true` to enable vision capabilities for browser operations
- The default is `false`, which optimizes performance when vision is not needed
- Vision is useful for tasks that require visual understanding of webpage content

### Xvfb Support
The server automatically detects whether Xvfb is installed and:
- Uses `xvfb-run` when available, so the browser runs against a virtual display, which can reduce the chance of bot detection
- Falls back to direct execution when Xvfb is not installed
- Sets the `RUNNING_UNDER_XVFB` environment variable accordingly
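The detection logic above might look roughly like this. The real server is a Node.js program, so this Python version only illustrates the idea: `build_command` is a hypothetical name, the `"true"`/`"false"` values for `RUNNING_UNDER_XVFB` are assumed, and the `-a` flag (pick a free display number automatically) is an assumption about how `xvfb-run` would be invoked.

```python
import shutil


def build_command(script_args, which=shutil.which):
    """Prefix the command with xvfb-run when it is found on PATH.

    Returns the command list and the extra environment variables to set.
    """
    use_xvfb = which("xvfb-run") is not None
    prefix = ["xvfb-run", "-a"] if use_xvfb else []
    extra_env = {"RUNNING_UNDER_XVFB": "true" if use_xvfb else "false"}
    return prefix + list(script_args), extra_env


cmd, extra_env = build_command(["python", "script.py"])
print(cmd, extra_env)
```

Passing `which` as a parameter keeps the function easy to test on machines with or without Xvfb.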

### Timeout
The default timeout is 5 minutes (300000 ms). To change it, modify the `TIMEOUT` constant in `build/index.js`. Because that file is build output, the change is lost when you run `npm run build` again.
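As an illustration of what such a timeout does, the sketch below runs a child process and gives up after a fixed limit. It is written in Python for consistency with the other examples and is not the server's code; `run_with_timeout` and `TIMEOUT_SECONDS` are hypothetical names.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 300  # 5 minutes, the same limit as 300000 ms


def run_with_timeout(cmd, timeout=TIMEOUT_SECONDS):
    """Run cmd and return its stdout; raise TimeoutError if it runs too long."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"Browser operation timed out after {timeout} s") from exc
    return result.stdout


if __name__ == "__main__":
    # A trivial child process stands in for the browser automation script.
    print(run_with_timeout([sys.executable, "-c", "print('done')"]).strip())
```

A non-zero exit status raises `subprocess.CalledProcessError` instead, which keeps script failures distinct from timeouts.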

## Error Handling
The server provides detailed error messages for:
- Python script execution failures
- Browser operation timeouts
- Invalid parameters

## Debugging
Use the MCP Inspector for debugging:
```bash
npm run inspector
```

## Citation

```bibtex
@software{browser_use2024,
  author = {Müller, Magnus and Žunič, Gregor},
  title = {Browser Use: Enable AI to control your browser},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/browser-use/browser-use}
}
```

## License
MIT