
# MCP server w/ Browser Use

[![smithery badge](https://smithery.ai/badge/@JovaniPink/mcp-browser-use)](https://smithery.ai/server/@JovaniPink/mcp-browser-use)

> MCP server for [browser-use](https://github.com/browser-use/browser-use).

## Overview

This repository contains an MCP server for the [browser-use](https://github.com/browser-use/browser-use) library, a browser automation system that enables AI agents to interact with web browsers through natural language. The server is built on Anthropic's [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) and exposes browser-use's automation capabilities to MCP clients such as Claude Desktop.
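MCP clients talk to a server using JSON-RPC 2.0 messages, and invoking a server tool uses the `tools/call` method from the MCP specification. The minimal sketch below builds such a request in Python to show the message shape. The tool name `run_browser_agent` and its `task` argument are hypothetical placeholders, not names confirmed by this repository; use the MCP Inspector (see Debugging) to list the tools the server actually exposes.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument, for illustration only.
print(build_tool_call(1, "run_browser_agent", {"task": "Open example.com and read the page title"}))
```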

## Features

1. **Browser Control**

   - Automated browser interactions via natural language
   - Navigation, form filling, clicking, and scrolling capabilities
   - Tab management and screenshot functionality
   - Cookie and state management

2. **Agent System**

   - Custom agent implementation in `custom_agent.py`
   - Vision-based element detection
   - Structured JSON responses for actions
   - Message history management and summarization

3. **Configuration**

   - Environment-based configuration for API keys and settings
   - Chrome browser settings (debugging port, persistence)
   - Model provider selection and parameters
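The agent asks the model for structured JSON describing the next actions. The sketch below parses such a reply with the standard library and caps the number of actions per step, mirroring the `MCP_MAX_ACTIONS_PER_STEP` setting. The reply shape (an `action` list of single-key objects such as `{"click_element": {"index": 3}}`) and the action names are assumptions loosely modelled on browser-use's output format, not the exact schema used in `custom_agent.py`.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Action:
    name: str               # action name, e.g. "click_element" (assumed)
    params: dict[str, Any]  # arguments passed to that action


def parse_actions(raw: str, max_actions_per_step: int = 5) -> list[Action]:
    """Parse a model reply into at most `max_actions_per_step` actions.

    Assumes a reply shaped like {"action": [{"<name>": {...params...}}, ...]}.
    """
    data = json.loads(raw)
    actions: list[Action] = []
    for item in data.get("action", []):
        # Each entry must name exactly one action.
        if not isinstance(item, dict) or len(item) != 1:
            raise ValueError(f"expected a single-key action object, got {item!r}")
        name, params = next(iter(item.items()))
        if params is None:
            params = {}
        if not isinstance(params, dict):
            raise ValueError(f"parameters for {name!r} must be an object")
        actions.append(Action(name=name, params=params))
    # Keep only the first N actions; anything beyond the limit is not executed.
    return actions[:max_actions_per_step]


reply = '{"action": [{"go_to_url": {"url": "https://example.com"}}, {"click_element": {"index": 3}}]}'
for action in parse_actions(reply, max_actions_per_step=1):
    print(action.name, action.params)  # go_to_url {'url': 'https://example.com'}
```

Truncating rather than rejecting an over-long reply is one design choice; a stricter agent could instead ask the model to retry.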

## Dependencies

This project relies on the following Python packages:

| Package                                    | Version    | Description                                                                                                                                                                                                   |
| :----------------------------------------- | :--------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| [pydantic](https://docs.pydantic.dev/)       | >=2.10.5  | Data validation and settings management using Python type annotations. Provides runtime enforcement of types and automatic model creation. Essential for defining structured data models in the agent.        |
| [fastapi](https://fastapi.tiangolo.com/)    | >=0.115.6 | Modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints. Used to create the server that exposes the agent's functionality.                          |
| [uvicorn](https://www.uvicorn.org/)        | >=0.22.0  | ASGI web server implementation for Python. Used to serve the FastAPI application.                                                                                                                           |
| [fastmcp](https://pypi.org/project/fastmcp/)    | >=0.4.1   | A high-level Python framework for building MCP (Model Context Protocol) servers.    |
| [python-dotenv](https://pypi.org/project/python-dotenv/) | >=1.0.1   | Reads key-value pairs from a `.env` file and sets them as environment variables. Simplifies local development and configuration management.                                                                 |
| [langchain](https://www.langchain.com/)     | >=0.3.14  | Framework for developing applications with large language models (LLMs). Provides tools for chaining together different language model components and interacting with various APIs and data sources.          |
| [langchain-openai](https://api.python.langchain.com/en/latest/langchain_openai.html) | >=0.2.14 | LangChain integrations with OpenAI's models. Enables using OpenAI models (like GPT-4) within the LangChain framework. Used in this project for interacting with OpenAI's language and vision models. |
| [langchain-ollama](https://api.python.langchain.com/en/latest/langchain_ollama/chat_models/ChatOllama.html) | >=0.2.2   | LangChain integration for Ollama, enabling local execution of LLMs. |
| [openai](https://platform.openai.com/docs/api-reference)    | >=1.59.5  | Official Python client library for the OpenAI API. Used to interact directly with OpenAI's models (if needed, in addition to LangChain).                                                                    |
| [browser-use](https://github.com/browser-use/browser-use) | ==0.1.19  | A powerful browser automation system that enables AI agents to interact with web browsers through natural language. The core library that powers this project's browser automation capabilities.      |
| [instructor](https://github.com/jxnl/instructor)   | >=1.7.2   | Library for structured output prompting and validation with OpenAI models. Enables extracting structured data from model responses.                                                                       |
| [pyperclip](https://pyperclip.readthedocs.io/)   | >=1.9.0   | Cross-platform Python module for copy and paste clipboard functions.                                                                                                                                  |

## Components

### Resources

The server implements a browser automation system with:

- Integration with browser-use library for advanced browser control
- Custom browser automation capabilities
- Agent-based interaction system with vision capabilities
- Persistent state management
- Customizable model settings

### Requirements

- Linux, macOS, or Windows (not yet tested on Docker or Microsoft WSL)
- Python 3.11 or higher
- uv (fast Python package installer)
- Chrome/Chromium browser
- [Claude Desktop](https://claude.ai/download)

### Quick Start

#### Claude Desktop

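Add the server to the Claude Desktop configuration file (see **Development Configuration** below for an example entry):

- On macOS: `~/Library/Application\ Support/Claude/claude_desktop_config.json`
- On Windows: `%APPDATA%/Claude/claude_desktop_config.json`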

#### Installing via Smithery

To install Browser Use for Claude Desktop automatically via [Smithery](https://smithery.ai/server/@JovaniPink/mcp-browser-use):

```bash
npx -y @smithery/cli install @JovaniPink/mcp-browser-use --client claude
```

<details>
  <summary>Development Configuration</summary>

```json
{
  "mcpServers": {
    "mcp_server_browser_use": {
      "command": "uvx",
      "args": [
        "mcp-server-browser-use"
      ],
      "env": {
        "OPENAI_ENDPOINT": "https://api.openai.com/v1",
        "OPENAI_API_KEY": "",
        "ANTHROPIC_API_KEY": "",
        "GOOGLE_API_KEY": "",
        "AZURE_OPENAI_ENDPOINT": "",
        "AZURE_OPENAI_API_KEY": "",
        "ANONYMIZED_TELEMETRY": "false",
        "CHROME_PATH": "",
        "CHROME_USER_DATA": "",
        "CHROME_DEBUGGING_PORT": "9222",
        "CHROME_DEBUGGING_HOST": "localhost",
        "CHROME_PERSISTENT_SESSION": "false",
        "MCP_MODEL_PROVIDER": "anthropic",
        "MCP_MODEL_NAME": "claude-3-5-sonnet-20241022",
        "MCP_TEMPERATURE": "0.3",
        "MCP_MAX_STEPS": "30",
        "MCP_USE_VISION": "true",
        "MCP_MAX_ACTIONS_PER_STEP": "5",
        "MCP_TOOL_CALL_IN_CONTENT": "true"
      }
    }
  }
}
```

`ANONYMIZED_TELEMETRY` is set to `"false"` here to disable anonymized telemetry. Set `CHROME_PERSISTENT_SESSION` to `"true"` to keep the browser open between AI tasks. To use DeepSeek, add `"DEEPSEEK_ENDPOINT": "https://api.deepseek.com"` and `"DEEPSEEK_API_KEY"` to `env`.

</details>
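Because the configuration file has to be valid JSON, comments and trailing commas will keep it from parsing. A small script like the one below can check an edited file before restarting Claude Desktop. It only verifies that the file parses and that each entry under `mcpServers` has a `command` string; `check_config` is a hypothetical helper, not part of this project.

```python
import json
import sys
from pathlib import Path


def check_config(path: Path) -> list[str]:
    """Return problems found in a Claude Desktop config file (empty list if none)."""
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # // comments and trailing commas are both reported here.
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ['missing top-level "mcpServers" object']

    problems: list[str] = []
    for name, entry in servers.items():
        # Every server entry needs a command to launch, e.g. "uvx".
        if not isinstance(entry, dict) or not isinstance(entry.get("command"), str):
            problems.append(f'server "{name}" has no "command" string')
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = check_config(Path(sys.argv[1]))
    print("\n".join(issues) or "config looks OK")
    sys.exit(1 if issues else 0)
```

Save it under any name (for example `check_config.py`) and pass the config path, e.g. `python check_config.py ~/Library/Application\ Support/Claude/claude_desktop_config.json`.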

### Environment Variables

Key environment variables:

```bash
# API Keys
ANTHROPIC_API_KEY=anthropic_key

# Chrome Configuration
# Optional: Path to Chrome executable
CHROME_PATH=/path/to/chrome
# Optional: Chrome user data directory
CHROME_USER_DATA=/path/to/user/data
# Default: 9222
CHROME_DEBUGGING_PORT=9222
# Default: localhost
CHROME_DEBUGGING_HOST=localhost
# Keep browser open between tasks
CHROME_PERSISTENT_SESSION=false

# Model Settings
# Options: anthropic, openai, azure, deepseek
MCP_MODEL_PROVIDER=anthropic
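# Model name
MCP_MODEL_NAME=claude-3-5-sonnet-20241022
# Sampling temperature for the model
MCP_TEMPERATURE=0.3
# Maximum number of agent steps per task
MCP_MAX_STEPS=30
# Use screenshots (vision) when choosing actions
MCP_USE_VISION=true
# Maximum number of actions per agent step
MCP_MAX_ACTIONS_PER_STEP=5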
```

## Development

### Setup

1. Clone the repository:

```bash
git clone https://github.com/JovaniPink/mcp-browser-use.git
cd mcp-browser-use
```

2. Create and activate a virtual environment:

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```

3. Install dependencies:

```bash
uv sync
```

4. Start the server:

```bash
uv run mcp-browser-use
```

### Debugging

For debugging, use the [MCP Inspector](https://github.com/modelcontextprotocol/inspector):

```bash
npx @modelcontextprotocol/inspector uv --directory /path/to/project run mcp-server-browser-use
```

The Inspector will display a URL for the debugging interface.

## Browser Actions

The server supports various browser actions through natural language:

- Navigation: Go to URLs, back/forward, refresh
- Interaction: Click, type, scroll, hover
- Forms: Fill forms, submit, select options
- State: Get page content, take screenshots
- Tabs: Create, close, switch between tabs
- Vision: Find elements by visual appearance
- Cookies & Storage: Manage browser state

## Security

Note that some Chrome settings are configured to allow the server to control the browser. This is a security risk, so use the server with caution; it is not intended for production environments.

Security details: [SECURITY.md](./documentation/SECURITY.md)

## Contributing

We welcome contributions to this project. Please follow these steps:

1. Fork this repository.
2. Create your feature branch: `git checkout -b my-new-feature`.
3. Commit your changes: `git commit -m 'Add some feature'`.
4. Push to the branch: `git push origin my-new-feature`.
5. Submit a pull request.

For major changes, open an issue first to discuss what you would like to change. Please update tests as appropriate to reflect any changes made.