# Session-Based Browser-Use FastMCP Server
[CI](https://github.com/Euraxluo/browser-mcp/actions/workflows/ci.yml) | [Coverage](https://codecov.io/gh/Euraxluo/browser-mcp)
## English
A Model Context Protocol (MCP) server, built on the FastMCP framework, that exposes browser automation as MCP tools. It provides per-session browser instances with TTL-based cleanup, PDF generation, file downloads, cookie management, and configurable browser options. **All browser operations are implemented via [browser-use](https://github.com/archipelago-technology/browser-use).**
### 🎯 Key Features
- **Session-Based Management**: Each MCP session gets its own isolated browser instance automatically
- **Advanced Browser Control**: Full browser automation with Playwright backend (via browser-use)
- **PDF Generation**: Convert web pages to PDF with custom formatting options
- **File Operations**: Download and upload files, and list or read each session's temporary files
- **Cookie Management**: Set, get, and manage browser cookies for authentication
- **Screenshot Capture**: Take full-page, viewport, or element screenshots
- **Tab Management**: Create, switch, and close browser tabs
- **Content Extraction**: Extract and search page content
- **Session Expiry**: Idle sessions are cleaned up automatically after a configurable TTL
- **Multi-Instance Support**: Run multiple isolated browser sessions
- **Configurable Security**: Browser security settings such as sandboxing and web security can be changed per session through `set_browser_config`
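The session model behind these features can be pictured with a small, self-contained sketch. `TtlSessionRegistry` is a hypothetical stand-in for the real `SessionBrowserManager`: it gives each session id its own instance (a plain `object()` here instead of a browser), refreshes the TTL on every access, and drops idle sessions when `cleanup_expired()` runs. Evicting the least recently used session when the instance cap is reached is an assumption, not necessarily the server's policy.

```python
import time
from typing import Callable, Dict, List, Tuple


class TtlSessionRegistry:
    """Hypothetical stand-in for SessionBrowserManager: one instance per session id."""

    def __init__(self, factory: Callable[[], object], max_instances: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self._factory = factory
        self._max = max_instances
        self._ttl = ttl
        self._clock = clock
        # session_id -> (instance, timestamp of last use)
        self._sessions: Dict[str, Tuple[object, float]] = {}

    def get_or_create(self, session_id: str) -> object:
        now = self._clock()
        if session_id in self._sessions:
            instance, _ = self._sessions[session_id]
        else:
            if len(self._sessions) >= self._max:
                # Assumption: evict the least recently used session to make room
                # (a real manager would also close that browser here)
                oldest = min(self._sessions, key=lambda sid: self._sessions[sid][1])
                del self._sessions[oldest]
            instance = self._factory()
        # Every access refreshes the session's TTL
        self._sessions[session_id] = (instance, now)
        return instance

    def cleanup_expired(self) -> List[str]:
        """Drop sessions idle longer than the TTL; a periodic task would call this."""
        now = self._clock()
        expired = [sid for sid, (_, last) in self._sessions.items() if now - last > self._ttl]
        for sid in expired:
            del self._sessions[sid]
        return expired

    def __len__(self) -> int:
        return len(self._sessions)


# Usage with a controllable clock instead of real time
fake_now = [0.0]
registry = TtlSessionRegistry(object, max_instances=2, ttl=300, clock=lambda: fake_now[0])
first = registry.get_or_create("client-a")
assert registry.get_or_create("client-a") is first  # same session, same instance
fake_now[0] = 400.0
registry.get_or_create("client-b")
print(registry.cleanup_expired())  # ['client-a']
```

Injecting the clock keeps the expiry logic testable without sleeping.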
### 🚀 Quick Start
1. **Install Dependencies**:
Using uv (recommended):
```bash
uv sync --all-extras
```
2. **Install the Browser**:
```bash
uv run playwright install --with-deps chromium
```
3. **Start the Server**:
Using uv (recommended):
```bash
uv run main.py
```
4. **Basic Usage (Direct SessionBrowserManager)**:
```python
# Direct usage without the MCP protocol (for testing/development)
import asyncio

from browser_fastmcp_server import BrowserConfig, SessionBrowserManager


async def main():
    # Create the session manager and start its background TTL cleanup task
    manager = SessionBrowserManager(max_instances=5, default_ttl=300)
    await manager.start_cleanup_task()
    session_id = "test_session_123"
    try:
        # Create (or reuse) the browser instance bound to this session
        instance = await manager.get_or_create_session_instance(
            session_id,
            BrowserConfig(headless=True),
        )
        # Navigate to a website
        browser_session = instance.browser_session
        await browser_session.navigate("https://example.com")
        # List the page's interactive elements
        state_summary = await browser_session.get_state_summary(cache_clickable_elements_hashes=True)
        print(f"Interactive elements: {len(state_summary.selector_map)}")
        # Take a full-page screenshot (Playwright also writes it to `path`)
        page = await browser_session.get_current_page()
        screenshot_bytes = await page.screenshot(path="example.png", full_page=True)
        print(f"Screenshot size: {len(screenshot_bytes)} bytes")
    finally:
        # Always close the session and stop the manager, even if a step failed
        await manager.close_session(session_id)
        await manager.shutdown()


if __name__ == "__main__":
    asyncio.run(main())
```
### 🛠️ Run Tests
Install the test dependencies, then run the test suite:
```bash
uv run python -m pytest test_browser_workflow_test.py test_browser_fastmcp_client.py test_browser_test.py -v
```
### 🛠️ Core Tools (API)
#### Session Management
- `create_chrome_instance(headless, viewport_width, viewport_height)` → Create a new browser session, returns `session_id`
- `close_instance(session_id)` → Close a specific session
- `get_instance_info(session_id)` → Get info for a session
- `check_browser_health(session_id)` → Check the health status of a browser session and provide recovery suggestions
- `get_browser_status()` → List all sessions
- `close_all_instances()` → Close all sessions
#### Browser Configuration
- `set_browser_config(session_id, headless, no_sandbox, user_agent, viewport_width, viewport_height, disable_web_security)` → Update a session's browser config, restarting the browser if a changed setting requires it
- `get_browser_config(session_id)` → Get current config
#### Navigation & Page Control
- `navigate_to(session_id, url, new_tab=False)` → Navigate to a URL, optionally in a new tab
- `navigate_back(session_id)` / `navigate_forward(session_id)` → History navigation
- `refresh_page(session_id)` → Refresh the current page
- `get_page_state(session_id)` → List interactive elements with indices
#### Tab Management
- `get_tabs_info(session_id)` → List all open tabs
- `switch_tab(session_id, page_id)` → Switch to the tab identified by `page_id`
- `close_tab(session_id, page_id)` → Close the tab identified by `page_id`
#### Element Interaction
- `click_element(session_id, index)` → Click element by index
- `click_element_by_xpath(session_id, xpath)` → Click element by XPath
- `input_text(session_id, index, text)` → Type into form fields
- `set_element_value(session_id, index, value)` → Set input/select value directly
- `get_element_info(session_id, index=None, xpath=None)` → Get element info (by index or xpath)
- `send_keys(session_id, keys)` → Send keystrokes or keyboard shortcuts
- `upload_file(session_id, index, file_path)` → Upload files to forms
- `get_dropdown_options(session_id, index)` → List the options of a `<select>` element
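The index-based tools expect an index taken from `get_page_state`. The sketch below picks an index by visible text, assuming the page state lists one element per line in the `[index]<tag ...>text</tag>` style browser-use uses for its clickable-element listing. `find_element_index` is hypothetical, so check it against real `get_page_state` output before relying on it.

```python
import re
from typing import Optional

# Assumed line format: "[12]<button>Submit</button>" (optionally indented, with attributes)
_ELEMENT_LINE = re.compile(r"^\s*\[(\d+)\]<(\w+)[^>]*>(.*?)(?:</\w+>)?\s*$")


def find_element_index(page_state: str, text: str, tag: Optional[str] = None) -> Optional[int]:
    """Return the index of the first element whose visible text contains `text`."""
    for line in page_state.splitlines():
        match = _ELEMENT_LINE.match(line)
        if not match:
            continue
        index, element_tag, label = match.groups()
        # Optionally restrict the search to one tag name, e.g. "button"
        if tag is not None and element_tag.lower() != tag.lower():
            continue
        if text.lower() in label.lower():
            return int(index)
    return None


state = """[0]<a href="/">Home</a>
[1]<input type="text" placeholder="Search" />
[2]<button>Search</button>"""
print(find_element_index(state, "search", tag="button"))  # 2
```

The returned index can then be passed as `index` to `click_element` or `input_text`. Indices can change after navigation, so re-read the page state before each interaction.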
#### Media & Files
- `take_screenshot(session_id, target=None, width=None, height=None, full_page=True, quality=90, format="png")` → Capture screenshots
- `generate_pdf(session_id, url=None, html_content=None, output_filename=None, ...)` → Save page as PDF
- `download_file(session_id, url, output_filename=None, timeout=30)` → Download files from URLs
- `download_image(session_id, image_url, output_filename=None, timeout=30)` → Download an image from a URL
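If a screenshot reaches your client as base64 data (as MCP image content does), a helper like the hypothetical `save_base64_image` below can write it to disk with an extension that matches the actual bytes. That `take_screenshot` returns base64 is an assumption; the PNG and JPEG signatures checked here are the standard file magic numbers.

```python
import base64
import tempfile
from pathlib import Path

# Leading bytes that identify each image format
_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
}


def save_base64_image(data: str, stem: Path) -> Path:
    """Decode a base64 image payload and write it with a matching file extension."""
    raw = base64.b64decode(data, validate=True)
    for signature, suffix in _SIGNATURES.items():
        if raw.startswith(signature):
            path = stem.with_suffix(suffix)
            path.write_bytes(raw)
            return path
    raise ValueError("payload is neither PNG nor JPEG")


# Usage with a fake PNG payload
fake_png = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode()
with tempfile.TemporaryDirectory() as tmp:
    print(save_base64_image(fake_png, Path(tmp) / "page").name)  # page.png
```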
#### Cookie & Session Management
- `set_cookie(session_id, name, value, domain, path, http_only, secure, same_site, expires, max_age)` → Set browser cookies
- `get_cookies(session_id, domain=None)` → Retrieve current cookies
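Cookie arguments are easy to get subtly wrong. The hypothetical `build_set_cookie_args` helper below assembles `set_cookie` arguments using the parameter names listed above and rejects `same_site="None"` without `secure=True`, a combination modern browsers refuse to store. The accepted `same_site` spellings (`Strict`, `Lax`, `None`, as in Playwright) are an assumption about what the tool expects.

```python
from typing import Any, Dict, Optional

_SAME_SITE_VALUES = {"Strict", "Lax", "None"}


def build_set_cookie_args(session_id: str, name: str, value: str, domain: str,
                          path: str = "/", http_only: bool = True, secure: bool = True,
                          same_site: str = "Lax", max_age: Optional[int] = None) -> Dict[str, Any]:
    """Assemble arguments for the set_cookie tool, rejecting combinations browsers drop."""
    if same_site not in _SAME_SITE_VALUES:
        raise ValueError(f"same_site must be one of {sorted(_SAME_SITE_VALUES)}")
    if same_site == "None" and not secure:
        # Browsers reject SameSite=None cookies that are not also Secure
        raise ValueError("same_site='None' requires secure=True")
    args: Dict[str, Any] = {
        "session_id": session_id, "name": name, "value": value, "domain": domain,
        "path": path, "http_only": http_only, "secure": secure, "same_site": same_site,
    }
    # Only send max_age when the caller wants a persistent cookie
    if max_age is not None:
        args["max_age"] = max_age
    return args


args = build_set_cookie_args("abc123", "auth_token", "s3cr3t", ".example.com", max_age=3600)
# then, e.g.: await client.call_tool("set_cookie", args)
```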
#### Utilities
- `scroll_page(session_id, direction="down")` → Scroll up/down
- `extract_content(session_id, query)` → Extract text content
- `wait(seconds)` → Pause execution
- `browser_tips()` → Get automation best practices
- `search_bing(session_id, query)` → Search Bing for `query`
### 📚 Resources (REST-style)
- `browser://status` → Manager and sessions status
- `browser://instances` → All sessions info
- `browser://instance/{id}/page` → Session page info
- `browser://instance/{id}/tabs` → Session tabs
- `browser://instance/{id}/screenshots` → Session screenshots
- `browser://instance/{id}/status` → Session status (detailed)
- `browser://instance/{id}/files` → Session temp files
- `browser://instance/{id}/cookies` → Session cookies
- `browser://instance/{id}/file/{relative_path}` → Read a file in session temp
- `browser://help` → Help text describing the available tools and resources
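Per-session resource URIs are plain string templates. A hypothetical helper for building them is sketched below; whether the server expects percent-encoded segments in `{relative_path}` is an assumption worth checking.

```python
from typing import Optional
from urllib.parse import quote

_INSTANCE_RESOURCES = {"page", "tabs", "screenshots", "status", "files", "cookies"}


def instance_resource_uri(session_id: str, kind: str, relative_path: Optional[str] = None) -> str:
    """Build a browser://instance/... URI for one session."""
    base = f"browser://instance/{quote(session_id, safe='')}"
    if kind == "file":
        if not relative_path:
            raise ValueError("kind='file' needs a relative_path")
        # Keep '/' so nested temp files stay addressable; encode everything else
        return f"{base}/file/{quote(relative_path.lstrip('/'), safe='/')}"
    if kind not in _INSTANCE_RESOURCES:
        raise ValueError(f"unknown resource kind: {kind}")
    return f"{base}/{kind}"


print(instance_resource_uri("abc", "tabs"))  # browser://instance/abc/tabs
print(instance_resource_uri("abc", "file", "downloads/report 1.pdf"))
```

With a FastMCP client, the resulting URI can be passed to `client.read_resource(uri)`.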
### 🔧 Configuration
Configure the server using environment variables:
```bash
# Maximum number of concurrent browser instances
BROWSER_MAXIMUM_INSTANCES=10
# Session TTL in seconds (default: 30 minutes)
BROWSER_INSTANCE_TTL=1800
# Command execution timeout in seconds
BROWSER_EXECUTE_TIMEOUT=30
# Interval between expired-session cleanup runs, in seconds
BROWSER_CLEANUP_INTERVAL=60
```
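The variables above could be read at startup along these lines. `ServerSettings` and `load_settings` are hypothetical names, and the fallbacks for everything except the TTL (documented as 30 minutes) are simply the example values shown above, not confirmed server defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    max_instances: int
    instance_ttl: int
    execute_timeout: int
    cleanup_interval: int


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the BROWSER_* variables, falling back to the example values."""
    def read_positive_int(name: str, default: int) -> int:
        raw = env.get(name)
        value = default if raw is None or raw.strip() == "" else int(raw)
        if value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value}")
        return value

    return ServerSettings(
        max_instances=read_positive_int("BROWSER_MAXIMUM_INSTANCES", 10),
        instance_ttl=read_positive_int("BROWSER_INSTANCE_TTL", 1800),
        execute_timeout=read_positive_int("BROWSER_EXECUTE_TIMEOUT", 30),
        cleanup_interval=read_positive_int("BROWSER_CLEANUP_INTERVAL", 60),
    )


print(load_settings({"BROWSER_INSTANCE_TTL": "600"}))
```

If cleanup runs on a fixed interval, an idle session can outlive its TTL by up to one cleanup interval: with the values above, it would be removed between 1800 and roughly 1860 seconds after its last use.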
### 📝 Prompts
Built-in prompts for common automation scenarios:
- `web_testing(url, test_scenario)` → Web testing workflows
- `data_extraction(url, data_type)` → Data extraction strategies
- `form_filling(url, form_data)` → Automated form filling (returns a multi-message conversation)
- `automation_troubleshooting()` → Debugging help
### 🔌 MCP Integration
#### Using with Claude Desktop
1. **Add to Claude Desktop Configuration**:
Edit your Claude Desktop configuration file (usually at `~/Library/Application Support/Claude/claude_desktop_config.json` on macOS):
```json
{
"mcpServers": {
"browser-mcp": {
"command": "uv",
"args": ["run", "fastmcp", "run", "/path/to/browser-mcp/browser_fastmcp_server.py"],
"env": {
"BROWSER_MAXIMUM_INSTANCES": "5",
"BROWSER_INSTANCE_TTL": "1800"
}
}
}
}
```
2. **Restart Claude Desktop** to load the MCP server
3. **Start Using**: The browser automation tools will now be available in your Claude conversations
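If `claude_desktop_config.json` already lists other servers, merge the entry instead of overwriting the file. The sketch below does that with the standard library; `add_browser_mcp` is a hypothetical helper, and `/path/to/browser-mcp` must still be replaced with your checkout path.

```python
import json
from pathlib import Path
from typing import Dict

# macOS location mentioned above; adjust for other platforms
DEFAULT_CONFIG = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def add_browser_mcp(server_script: str, env: Dict[str, str],
                    config_path: Path = DEFAULT_CONFIG) -> dict:
    """Insert or update the browser-mcp entry while keeping other configured servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["browser-mcp"] = {
        "command": "uv",
        "args": ["run", "fastmcp", "run", server_script],
        "env": env,
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

For example: `add_browser_mcp("/path/to/browser-mcp/browser_fastmcp_server.py", {"BROWSER_MAXIMUM_INSTANCES": "5", "BROWSER_INSTANCE_TTL": "1800"})`, followed by a Claude Desktop restart.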
#### Using with MCP Client (Two Ways)
**Method 1: Network-based MCP Client (via HTTP/SSE)**
```python
import asyncio
import json

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main():
    # Connect to the running server over SSE (default: port 8000)
    async with sse_client("http://localhost:8000/sse") as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            # Perform the MCP initialization handshake
            await session.initialize()
            # Start a browser; assumes the tool returns a JSON object,
            # delivered as the first text content item of the result
            result = await session.call_tool("create_chrome_instance", {"headless": True})
            session_id = json.loads(result.content[0].text)["session_id"]
            # Navigate to a website
            await session.call_tool("navigate_to", {"session_id": session_id, "url": "https://example.com"})
            # Take a screenshot
            await session.call_tool("take_screenshot", {"session_id": session_id})
            # Close the browser session
            await session.call_tool("close_instance", {"session_id": session_id})


if __name__ == "__main__":
    asyncio.run(main())
```
**Method 2: Direct Client (No Network)**
```python
import asyncio

from fastmcp import Client

from browser_fastmcp_server import mcp as browsers_mcp


async def main():
    # In-memory client connected directly to the server object (no network)
    client = Client(browsers_mcp)
    async with client:
        # Start a browser and read the new session id from the result
        result = await client.call_tool("create_chrome_instance", {"headless": True})
        session_id = result.data.session_id
        # Navigate to a website
        await client.call_tool("navigate_to", {"session_id": session_id, "url": "https://example.com"})
        # Take a screenshot
        await client.call_tool("take_screenshot", {"session_id": session_id})
        # Close the browser session
        await client.call_tool("close_instance", {"session_id": session_id})


if __name__ == "__main__":
    asyncio.run(main())
```
### 🔒 Authentication
For server deployments requiring authentication, modify `main.py` to set an AuthProvider before startup:
**Basic Authentication:**
```python
from fastmcp.auth import BasicAuth
# Add this before mcp.run()
mcp.auth = BasicAuth(username="admin", password="password")
```
**JWT Authentication (Recommended for Production):**
For more advanced authentication, we recommend using [fastmcp-authentication](https://github.com/Euraxluo/fastmcp-authentication):
```python
from fastmcp_authentication import BearerAuthProvider
JWKS_URI = "http://localhost:8080/.well-known/jwks.json"
auth = BearerAuthProvider(
    jwks_uri=JWKS_URI,
    issuer="http://localhost:8080",
    audience="localhost:8080",
    algorithm="RS256"
)
mcp.auth = auth
```
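When a client is rejected, it can help to inspect the token's claims. The sketch below decodes a JWT payload and compares `iss`, `aud`, and `exp` with the `issuer` and `audience` values configured above. It deliberately does **not** verify the RS256 signature, which is the provider's job against the JWKS endpoint, so treat it only as a debugging aid; `check_claims` is a hypothetical helper.

```python
import base64
import json
import time
from typing import Any, Dict, List, Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are base64url without padding; restore padding before decoding
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_claims(token: str, issuer: str, audience: str, now: Optional[float] = None) -> List[str]:
    """List claim problems in a JWT WITHOUT verifying its signature (debugging aid only)."""
    try:
        _header, payload_b64, _signature = token.split(".")
    except ValueError:
        return ["token does not have three dot-separated parts"]
    claims: Dict[str, Any] = json.loads(_b64url_decode(payload_b64))
    problems = []
    if claims.get("iss") != issuer:
        problems.append(f"iss is {claims.get('iss')!r}, expected {issuer!r}")
    # "aud" may be a single string or a list of strings
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        problems.append(f"aud {aud!r} does not include {audience!r}")
    exp = claims.get("exp")
    if exp is None or exp <= (time.time() if now is None else now):
        problems.append("token is missing exp or has expired")
    return problems
```

An empty list from `check_claims(token, "http://localhost:8080", "localhost:8080")` means the claims match; a remaining rejection then points at the signature or key setup.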
### 💡 Use Cases
- **Web Testing**: Automated functional, security, and performance testing
- **Data Scraping**: Extract structured data from websites
- **Form Automation**: Fill and submit web forms programmatically
- **Content Monitoring**: Track changes in web content
- **Screenshot Documentation**: Capture visual evidence for reports
- **PDF Generation**: Convert web pages to PDF documents
- **Session Management**: Handle authenticated workflows
### 🔒 Security Features
- Session isolation between MCP clients
- Cookies can be set with the HttpOnly and Secure flags
- Configurable browser security settings (CORS, sandbox, etc.)
- Automatic cleanup of temporary files
- TTL-based session expiration
### 🐳 Docker Usage
Build the image:
```bash
docker build -t browser-mcp .
```
Run the server (default: port 8000, SSE transport):
```bash
docker run -p 8000:8000 browser-mcp
```
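You can override startup parameters via environment variables. Inside a container, set `MCP_HOST=0.0.0.0`: a server bound to `127.0.0.1` listens only on the container's own loopback interface and cannot be reached through the published port.
```bash
docker run -e MCP_PORT=9000 -e MCP_TRANSPORT=http -e MCP_HOST=0.0.0.0 -p 9000:9000 browser-mcp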
```
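To script a readiness check after `docker run`, you can wait until the published port accepts TCP connections. This only shows that something is listening, not that it speaks MCP; `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Refused or timed out: the server is not up yet, retry shortly
            time.sleep(0.5)
    return False
```

For example, `wait_for_port("127.0.0.1", 8000)` returns `True` once the default container is accepting connections.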