Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@atlas-browser-mcp Go to news.ycombinator.com and tell me the top 3 headlines"
That's it! The server will respond to your query, and you can continue using it as needed.
atlas-browser-mcp
Visual web browsing for AI agents via Model Context Protocol (MCP).
Features
Visual-First: Navigate the web through screenshots, not DOM parsing
Set-of-Mark: Interactive elements labeled with clickable [0],[1],[2]... markers
Humanized: Bezier curve mouse movements, natural typing rhythms
CAPTCHA-Ready: Multi-click support for image selection challenges
Anti-Detection: Built-in measures to avoid bot detection
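To illustrate what "Bezier curve mouse movements" can mean in practice, here is a minimal sketch that generates a curved cursor path between two points. It is not taken from the atlas-browser-mcp source; the function name `bezier_path` and the `steps` and `jitter` parameters are assumptions made for this example.

```python
import math
import random


def bezier_path(start, end, steps=20, jitter=0.25, rng=None):
    """Return steps + 1 points along a cubic Bezier curve from start to end.

    Hypothetical helper: the two control points sit about 1/3 and 2/3 of the
    way along the straight line and are pushed off it by a random offset
    scaled by the travel distance, so each path bends slightly differently.
    """
    rng = rng or random.Random()
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0
    dist = math.hypot(dx, dy)

    def control(t):
        # Point on the straight line at fraction t, plus a random offset.
        return (
            x0 + dx * t + rng.uniform(-jitter, jitter) * dist,
            y0 + dy * t + rng.uniform(-jitter, jitter) * dist,
        )

    (x1, y1), (x2, y2) = control(1 / 3), control(2 / 3)

    points = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Standard cubic Bezier formula, evaluated per coordinate.
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points
```

A caller would move the cursor through the returned points in order, with short pauses between them, instead of jumping straight to the target.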
Quick Start
Installation
pip install atlas-browser-mcp
playwright install chromium
Use with Claude Desktop
Add to your Claude Desktop config (claude_desktop_config.json):
{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}
Then ask Claude:
"Navigate to https://news.ycombinator.com and tell me the top 3 stories"
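If you prefer not to edit claude_desktop_config.json by hand, the entry shown above could be merged in with a short script. This is a sketch only: `add_mcp_server` is a hypothetical helper, and the location of the config file varies by platform, so the path is left for the caller to supply.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name="browser", command="atlas-browser-mcp"):
    """Add or update one entry under "mcpServers", keeping all other settings."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Treat a missing or empty file as an empty configuration.
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command}
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Existing servers and unrelated keys in the file are preserved; only the named entry is written or overwritten. Restart Claude Desktop afterwards so it picks up the change.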
Available Tools
Tool | Description |
navigate | Go to URL, returns labeled screenshot |
| Capture current page with labels |
click | Click element by label ID |
multi_click | Click multiple elements (for CAPTCHA) |
type | Type text, optionally press Enter |
| Scroll page up or down |
Usage Examples
Basic Navigation
User: Go to google.com
AI: [calls navigate(url="https://google.com")]
AI: I see the Google homepage. The search box is labeled [3].
User: Search for "MCP protocol"
AI: [calls click(label_id=3)]
AI: [calls type(text="MCP protocol", submit=true)]
AI: Here are the search results...
CAPTCHA Handling
User: Select all images with traffic lights
AI: [Looking at the CAPTCHA grid]
AI: I can see traffic lights in images [2], [5], and [8].
AI: [calls multi_click(label_ids=[2, 5, 8])]
Configuration
Headless Mode
For servers without a display:
from atlas_browser_mcp.browser import VisualBrowser
browser = VisualBrowser(
    headless=True,  # No visible browser window
    humanize=False  # Faster, less human-like
)
Custom Viewport
from atlas_browser_mcp.browser import VisualBrowser
browser = VisualBrowser()
browser.VIEWPORT = {"width": 1920, "height": 1080}
How It Works
Navigate: Browser loads the page
Inject SoM: JavaScript labels all interactive elements
Screenshot: Capture the labeled page
AI Sees: The screenshot shows [0],[1],[2]... on buttons, links, inputs
AI Acts: "Click [5]" → Browser clicks the element at that position
Repeat: New screenshot with updated labels
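The label-and-click steps above can be sketched in plain Python. This is an illustration of the idea only, not the library's implementation (which injects JavaScript into the page): the `Element` class, `assign_labels`, and `click_point` are hypothetical names, and numbering elements in reading order is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Bounding box of an interactive element, in page pixels."""
    tag: str
    x: float
    y: float
    width: float
    height: float


def assign_labels(elements):
    """Map label IDs 0..n-1 to visible elements, top-to-bottom then left-to-right."""
    visible = [e for e in elements if e.width > 0 and e.height > 0]
    ordered = sorted(visible, key=lambda e: (e.y, e.x))
    return dict(enumerate(ordered))


def click_point(labels, label_id):
    """Resolve a label ID such as [5] to the center of that element's box."""
    e = labels[label_id]
    return (e.x + e.width / 2, e.y + e.height / 2)
```

Because labels are reassigned after every action, an ID is only meaningful for the most recent screenshot, which is why the loop ends by capturing a new one.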
┌─────────────────────────────────────┐
│ [0] Logo   [1] Search   [2] Menu    │
│                                     │
│ [3] Article Title                   │
│ [4] Read More                       │
│                                     │
│ [5] Subscribe   [6] Share           │
└─────────────────────────────────────┘
Integration
With Cline (VS Code)
{
  "mcpServers": {
    "browser": {
      "command": "atlas-browser-mcp"
    }
  }
}
Programmatic Use
from atlas_browser_mcp.browser import VisualBrowser
browser = VisualBrowser()
# Navigate
result = browser.execute("navigate", url="https://example.com")
print(f"Page title: {result.data['title']}")
print(f"Found {result.data['element_count']} interactive elements")
# Click element [0]
result = browser.execute("click", label_id=0)
# Type in focused field
result = browser.execute("type", text="Hello world", submit=True)
# Cleanup
browser.execute("close")
Requirements
Python 3.10+
Playwright with Chromium
Troubleshooting
"Playwright not installed"
pip install playwright
playwright install chromium
"Browser closed unexpectedly"
Try running with headless=False to see what's happening:
browser = VisualBrowser(headless=False)
Elements not being detected
Some dynamic pages need more wait time. The browser waits 1.5s after navigation, but complex SPAs may need longer.
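One way to give slow pages more time is to poll until elements appear instead of relying on a single fixed delay. The helper below is a generic sketch; `wait_for_elements` and its `count_elements` callback are hypothetical names, and how you obtain the count from the browser is left to the caller.

```python
import time


def wait_for_elements(count_elements, timeout=10.0, interval=0.5):
    """Call count_elements() repeatedly until it returns more than 0.

    Returns the last count seen, which is 0 if the timeout elapsed first.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = count_elements()
        if count > 0 or time.monotonic() >= deadline:
            return count
        time.sleep(interval)
```

The callback might re-capture the page and read `result.data['element_count']`. The README shows that field on the result of `navigate`; whether other calls return it is an assumption to verify against the library.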
License
MIT License - see LICENSE
Credits
Built for Atlas, an autonomous AI agent.
Inspired by:
anthropic/mcp - Model Context Protocol
AskUI - Visual testing approach
Set-of-Mark prompting - Visual grounding technique