unbrowser
The unbrowser server provides a lightweight, agent-friendly web browser interface for LLMs — fetching, parsing, and interacting with web pages without full Chrome overhead, returning structured outputs optimized for agent consumption.
Navigation & Page Reading
Fetch URLs with Chrome-fingerprinted HTTP, parse HTML, and return a structured BlockMap summary (~500 tokens) with auto-extracted embedded JSON and tool recommendations
Retrieve raw HTML, main content text (stripped of nav/header/footer), or clean whitespace-collapsed text from selectors
DOM Querying & Search
Query elements via CSS selectors (with stable refs for interaction), find elements by visible text, diagnose selector misses, and retrieve surrounding context
Structured Data Extraction
Auto-strategy extraction: JSON-LD → __NEXT_DATA__ → Nuxt → JSON-in-script → OpenGraph → microdata → text fallback
Extract HTML tables into {headers, rows}, pull repeated card/listing patterns into structured arrays, and parse captured network/API responses into semantic objects
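The auto-strategy order works as a priority chain: try the most structured source first, and fall back only when it yields nothing. Below is a minimal sketch of that idea using Python's standard-library html.parser. It covers only JSON-LD, __NEXT_DATA__, OpenGraph, and a plain-text fallback (Nuxt, JSON-in-script, and microdata are left out). The names extract and _Collector are hypothetical, and this is not unbrowser's extractor.

```python
import json
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Hypothetical helper: gathers <script> bodies, og:* meta tags, and other text."""

    def __init__(self):
        super().__init__()
        self.scripts = []          # (attrs dict, script body) pairs
        self.og = {}               # "title" -> content for <meta property="og:title">
        self.text = []             # text found outside <script> elements
        self._script_attrs = None  # attrs of the <script> currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "script":
            self._script_attrs, self._buf = a, []
        elif tag == "meta" and (a.get("property") or "").startswith("og:"):
            self.og[a["property"][3:]] = a.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "script" and self._script_attrs is not None:
            self.scripts.append((self._script_attrs, "".join(self._buf)))
            self._script_attrs = None

    def handle_data(self, data):
        if self._script_attrs is not None:
            self._buf.append(data)
        elif data.strip():
            self.text.append(data.strip())


def extract(html):
    """Return (strategy_name, data) from the first strategy that finds anything."""
    c = _Collector()
    c.feed(html)
    c.close()

    def json_ld():
        blocks = []
        for attrs, body in c.scripts:
            if attrs.get("type") == "application/ld+json":
                try:
                    blocks.append(json.loads(body))
                except ValueError:  # malformed block: skip it, keep the others
                    pass
        return blocks or None

    def next_data():
        for attrs, body in c.scripts:
            if attrs.get("id") == "__NEXT_DATA__":
                try:
                    return json.loads(body)
                except ValueError:
                    return None
        return None

    def opengraph():
        return dict(c.og) or None

    def text_fallback():
        return " ".join(c.text) or None

    chain = [("json_ld", json_ld), ("next_data", next_data),
             ("opengraph", opengraph), ("text", text_fallback)]
    for name, strategy in chain:
        data = strategy()
        if data:
            return name, data
    return "none", None
```

Ordering the chain from most to least structured means a page that ships JSON-LD never falls through to scraping visible text.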
Page Modeling & Discovery
Model a page into semantic task-discoverable JSON (search forms, nav links, article/product cards, tables) with goal-based scoring
Discover navigation/search routes and merge DOM routes, inferred URLs, and network JSON into a ranked discovery graph
Interaction
Click elements (auto-follows <a href> links), type into inputs, submit forms (GET or POST), and classify interaction effects (navigated, dom_changed, no_effect, etc.)
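One way to picture effect classification is to compare a snapshot taken before an action with one taken after it. The sketch below is hypothetical. The snapshot shape ({"url", "html"}) and classify_effect are illustrative names, and the sketch produces only the three labels named above. unbrowser's real classifier reports further cases ("etc.") using signals this README doesn't spell out.

```python
import hashlib


def _fingerprint(html: str) -> str:
    # Hash the serialized DOM so two large snapshots compare cheaply.
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def classify_effect(before: dict, after: dict) -> str:
    """Label an interaction from hypothetical {"url": ..., "html": ...} snapshots."""
    if after["url"] != before["url"]:
        return "navigated"      # e.g. a click that followed an <a href>
    if _fingerprint(after["html"]) != _fingerprint(before["html"]):
        return "dom_changed"    # same page, mutated DOM
    return "no_effect"
```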
JavaScript Execution
Run arbitrary JS in an embedded QuickJS runtime and drain the event loop (microtasks, setTimeout/setInterval)
Cookie & Session Management
Set, get, and clear cookies for persistent sessions or replaying clearance cookies to bypass bot detection
Network Capture
Capture and inspect fetch/XHR responses (JSON, GraphQL, NDJSON) from navigations, ranked by content value
Observability & Diagnostics
Challenge detection for bot-walled pages, SPA detection heuristics, tool_recommendations to guide agent next steps, and report_outcome for binding task success/failure to prior navigations for policy learning
Allows accessing Cloudflare-protected websites via cookie handoff (a clearance cookie obtained in real Chrome and replayed here), with automatic detection of the challenge provider.
Enables scraping and interacting with GitHub repository browsing pages with full selector support.
Provides limited rendering support for Next.js SSR pages and light hydration, allowing navigation and query of static content.
Offers excellent navigation and query capabilities on Wikipedia pages, producing a compact BlockMap summary.
Supports navigating Zillow pages with cookie handoff to bypass bot walls.
unbrowser
Web access for LLM agents. One static binary. No Chrome.
unbrowser is the lightweight open-source browser tier from Unchained: cheap, stateful web access for agents when curl/WebFetch is too dumb and full Chrome is too heavy. When a page needs real Chrome, cookies, extensions, or human-in-the-loop auth, escalate to unchainedsky-cli or Unchained.
Install
Python (recommended) — wheel ships the native binary. Requires Python 3.10+:
pipx install pyunbrowser # cleanest on macOS Homebrew / modern Linux (handles PEP 668)
pip install pyunbrowser # in a venv on Python 3.10+
macOS gotcha: the system /usr/bin/python3 is 3.9, and the wheel will reject it with "requires Python >=3.10". Use Homebrew's python3.13 or pipx (which manages its own Python). If pip install fails with PEP 668 ("externally-managed-environment"), the remedy is the same: pipx install pyunbrowser.
# note: pip name is pyunbrowser, import is unbrowser
# (PyPI's name moderation blocks 'unbrowser'; the py- prefix is the standard workaround)
from unbrowser import Client
with Client() as ub:
    r = ub.navigate("https://news.ycombinator.com")
Cargo — binary only, no Python wrapper:
cargo install unbrowser
unbrowser --mcp
MCP — add the binary to Claude Code, Claude Desktop, Cursor, Cline, or any MCP host:
{
"mcpServers": {
"unchained": {
"command": "unbrowser",
"args": ["--mcp"]
}
}
}
The unchained key is only the client-side alias. Use unbrowser if you want exact naming, or keep unchained as the breadcrumb to the full Unchained browser-agent stack.
Pre-built tarball — for systems without Python or Rust:
# macOS Apple Silicon
curl -L https://github.com/protostatis/unbrowser/releases/latest/download/unbrowser-aarch64-apple-darwin.tar.gz | tar xz
# macOS Intel
curl -L https://github.com/protostatis/unbrowser/releases/latest/download/unbrowser-x86_64-apple-darwin.tar.gz | tar xz
# Linux x86_64 (glibc 2.31+ / Ubuntu 20.04+)
curl -L https://github.com/protostatis/unbrowser/releases/latest/download/unbrowser-x86_64-unknown-linux-gnu.tar.gz | tar xz
From source:
cargo build --release # binary at ./target/release/unbrowser
Session CLI
For shell-only agents, use a persistent session instead of heredoc JSON-RPC:
unbrowser session start --id demo
unbrowser exec demo navigate https://news.ycombinator.com
unbrowser exec demo query '.titleline > a'
unbrowser exec --pretty demo blockmap
unbrowser session stop demo
Bare RPC (low-level escape hatch)
echo '{"id":1,"method":"navigate","params":{"url":"https://news.ycombinator.com"}}' | unbrowser
That's the install. Runs anywhere a static binary runs: laptop, Lambda, Cloudflare Workers, edge, embedded.
Open source under Apache 2.0. When the cheap path can't handle a page (heavy SPAs, behavioral bot challenges), escalate to a real browser via unchainedsky-cli (drives your local Chrome via CDP) or the Unchained desktop app.
By the numbers
 | This binary | Headless Chrome (Playwright/Puppeteer) |
Binary size | ~10MB | 250MB+ Chrome download |
RAM / session | ~50MB | 200–500MB |
Cold start | ~100ms | ~1s |
Tokens / page (LLM) | ~500 (BlockMap inline) | tens of thousands (raw HTML, parsed by you) |
Install steps | | install Chrome + Node + Playwright + system deps |
Lambda / Workers / edge | ✅ | ❌ Chrome too big |
100K pages/day cost | $0 (your infra) | $$$ Chrome fleet or hosted API |
4–10× lower memory, 25× smaller binary, 10× faster cold start, ~70× lower per-page token cost. That's the tradeoff this product makes: defer JS rendering (Phase 4/5) and leave pixel rendering out of scope, in exchange for a footprint that fits in places Chrome doesn't.
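The multipliers follow directly from the comparison table. Here is a quick check in Python. The 35,000-token figure for headless Chrome is an assumption standing in for the table's "tens of thousands". Every other number is quoted from the table.

```python
# Figures from the "By the numbers" table. CHROME_TOKENS is an assumed
# stand-in for "tens of thousands"; everything else is quoted directly.
CHROME_RAM_MB = (200, 500)
UNBROWSER_RAM_MB = 50
CHROME_BINARY_MB, UNBROWSER_BINARY_MB = 250, 10
CHROME_COLD_MS, UNBROWSER_COLD_MS = 1000, 100
CHROME_TOKENS, UNBROWSER_TOKENS = 35_000, 500

ratios = {
    "memory": tuple(mb / UNBROWSER_RAM_MB for mb in CHROME_RAM_MB),
    "binary": CHROME_BINARY_MB / UNBROWSER_BINARY_MB,
    "cold_start": CHROME_COLD_MS / UNBROWSER_COLD_MS,
    "tokens": CHROME_TOKENS / UNBROWSER_TOKENS,
}
print(ratios)
# {'memory': (4.0, 10.0), 'binary': 25.0, 'cold_start': 10.0, 'tokens': 70.0}
```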
Agent-friendly by design
This isn't a Chrome wrapper that an agent uses through a Puppeteer-shaped abstraction. It's a browser whose every output is shaped for LLM consumption:
navigate returns a BlockMap: ~500 tokens of structured page summary (landmarks, headings, interactives, density signals) right in the response. No follow-up call is needed to know what's on the page.
Stable element refs (e:142): query, click, type, and submit using opaque handles. The LLM never has to scrape the DOM itself.
challenge field on every blocked navigate: provider, confidence, and the exact clearance cookie name. The agent reacts intelligently instead of guessing.
density.likely_js_filled heuristic: distinguishes a real SSR page from an SSR shell with JS-filled cells (the CNBC trap). The agent bails before burning round-trips on a page it can't read.
MCP-native: unbrowser --mcp exposes the RPC tool surface to any MCP host (Claude Code, Claude Desktop, Cursor, Cline). 4 lines of config, zero glue code.
Real Chrome fingerprint (Chrome 134 JA4 + Akamai H2 hash), so sites don't block you for being a script.
For pages that do need real Chrome (heavy SPAs, JS-challenge bot walls), the binary detects them and accepts cookies via cookies_set — so you solve once in Chrome and replay forever here.
Quick demo — Hacker News top 3
from unbrowser import Client
with Client() as ub:
    ub.navigate("https://news.ycombinator.com")
    for s in ub.query(".titleline > a")[:3]:
        print(s["text"], s["attrs"]["href"])
5 lines, no headless browser install. Output is structured JSON, not 35KB of HTML. The Client wrapper handles the subprocess lifecycle (an atexit reaper cleans up the child process so it isn't left orphaned), JSON-RPC framing, and surfaces real exceptions instead of silent result lookups.
The same demo without the wrapper — useful for languages other than Python or multi-step sessions. The protocol is JSON-RPC over stdin/stdout, one JSON object per line:
import subprocess, json
p = subprocess.Popen(["./target/release/unbrowser"],
                     stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True, bufsize=1)
i = 0
def call(method, **params):
    global i; i += 1
    p.stdin.write(json.dumps({"id": i, "method": method, "params": params}) + "\n")
    p.stdin.flush()
    return json.loads(p.stdout.readline())["result"]
call("navigate", url="https://news.ycombinator.com")
for s in call("query", selector=".titleline > a")[:3]:
    print(s["text"], s["attrs"]["href"])
That's the entire protocol surface. Same shape from any language with subprocess + JSON.
One-shot CLI
For shell-friendly calls, use the convenience subcommand:
unbrowser navigate https://news.ycombinator.com --json
That prints one JSON result and exits, from any install path (PyPI wheel, Cargo, or release tarball). Use JSON-RPC only when you need a persistent session. Run unbrowser --help for the native CLI surface.
A/B runtime shims
For corpus tests against JS-heavy pages, compare the default stable shims with the opt-in enhanced browser-environment shims:
unbrowser navigate https://example.com --exec-scripts --json
unbrowser navigate https://example.com --exec-scripts --json --shims enhanced
# or for JSON-RPC / MCP sessions:
UNBROWSER_SHIMS=enhanced unbrowser
enhanced adds content-positive layout/media/scroll/IndexedDB guesses on top of the stable runtime. It is intentionally opt-in so A/B runs can measure whether more page state materializes without changing the baseline.
Script evaluation is still bounded by UNBROWSER_SCRIPT_EVAL_BUDGET_MS (default 5000); navigate results report scripts.budget_exhausted and scripts.budget_skipped when the budget stops further script execution. The outer RPC watchdog (UNBROWSER_TIMEOUT_MS, default 30000) still wins if it is lower than the script budget.
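Read together with the watchdog clamp described under "Honest limits" (1s..10min), the two limits combine roughly as the smaller of the script budget and the clamped watchdog. The sketch below shows that reading. It is not unbrowser's code. It assumes both variables hold integer milliseconds, and it ignores the fact that the watchdog also covers post-script settle and timer work.

```python
import os


def effective_script_budget_ms(env=None) -> int:
    """Approximate wall-clock cap on script evaluation for one navigate."""
    env = os.environ if env is None else env
    script_budget = int(env.get("UNBROWSER_SCRIPT_EVAL_BUDGET_MS", 5000))
    watchdog = int(env.get("UNBROWSER_TIMEOUT_MS", 30000))
    watchdog = max(1_000, min(watchdog, 600_000))  # documented clamp: 1s..10min
    # The outer watchdog wins whenever it is lower than the script budget.
    return min(script_budget, watchdog)
```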
For a JSONL corpus sweep:
python3 scripts/shim_ab.py --url https://nextjs.org/docs --url https://www.npmjs.com/package/playwright
SPA tier: what works, what doesn't
Empirical, not aspirational. Latest matrix: 28/30 on tested categories.
Page tier | Coverage | What to expect |
Static + SSR (Wikipedia, MDN, news, docs, GitHub repo browsing, search engines, archive.org) | ✅ excellent | sub-second navigate; full BlockMap; all selectors work; ~hundreds of tokens vs ~tens of KB raw |
SSR + light hydration (Next.js docs, marketing pages, react.dev's static content) | ✅ usable | reads SSR'd content fine; hydration adds nothing but doesn't break either |
Bot-walled with cookie handoff (Zillow, Cloudflare-protected sites) | ✅ via | solve once in Chrome, replay forever; |
Module-loader SPAs (Ember, AMD apps like crates.io) | ⚠️ partial with | bundles fetch + execute, modules register, but framework auto-mount needs case-by-case shimming |
Heavy React/Vue bundles (react.dev runtime, large dashboard apps) | ⚠️ bounded — won't hang, won't render | with |
Apps requiring Workers / Canvas / IndexedDB / WebGL | ❌ out of scope by design | use the cookie-handoff path with real Chrome via |
Hardest-tier anti-bot (PerimeterX with behavioral, Kasada, Akamai BMP advanced) | ❌ even cookie handoff is fragile | real Chrome via CDP is the right tier |
Vs the alternatives:
 | This | curl | Playwright / headless Chrome |
Static / SSR pages | ✅ | ✅ but token-heavy | overkill |
SPA-shell sites | ⚠️ partial via | ❌ | ✅ |
Bot-walled (with cookie handoff) | ✅ | ❌ | ✅ |
Run in Lambda / Workers / edge | ✅ | ✅ | ❌ Chrome too big |
Per-page cost at 100K/day | ~free | ~free | $$$ |
LLM-shaped output | ✅ BlockMap inline | DIY parse | DIY parse |
Verified against (working)
Concrete sites tested with measured times. Cold-start to extracted-result.
Category | Sites | Time |
Reference / docs | Wikipedia, MDN, docs.rs, PyPI, react.dev (SSR portion) | 0.9 – 5.8s |
News | Hacker News, BBC, TechCrunch, arXiv listings | 1 – 1.6s |
Search | Google | 0.2 – 1.8s |
Dev | GitHub repo pages, npm, StackOverflow, HuggingFace model cards | 0.7 – 2.4s |
Crypto / finance | CoinGecko, Yahoo Finance (post-redirect-fix) | 3.5 – 6.9s |
Social | Lobsters, old.reddit.com | 0.9 – 1.4s |
Govt / institutional | arXiv, archive.org, gov.uk | 0.6 – 1.0s |
Interaction primitives | type, click + auto-follow, cookies_set/get/replay, eval, query_text | 0.3 – 1.3s |
Surprises: all four major search engines work cleanly. CoinGecko's heavy dashboard SSRs enough that quotes come through. HuggingFace model cards expose model name in <h1>.
Bot-detection diagnostics
Every blocked navigate returns a challenge field naming the vendor (perimeterx_block, cloudflare_turnstile, aws_waf, datadome, akamai_bmp, imperva, arkose_labs, recaptcha, press_hold, yahoo_sad_panda, interstitial, generic_human_verification, unknown_block) plus the expected clearance cookie name. Agents react with cookie handoff via cookies_set instead of guessing.
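On the agent side, reacting to the challenge field can be a small dispatch. In this minimal sketch, the key names provider and cookie inside challenge are assumptions. The README says the field carries the provider, a confidence, and the clearance cookie name, but it does not give their exact keys. Sending every perimeterx_block or akamai_bmp result straight to Chrome is a conservative policy choice drawn from the SPA-tier table, not a rule the binary enforces.

```python
# Providers this sketch routes straight to real Chrome, because the tier table
# calls cookie handoff fragile for behavioral PerimeterX and advanced Akamai BMP.
CHROME_ONLY = {"perimeterx_block", "akamai_bmp"}


def plan_for(navigate_result: dict) -> tuple:
    """Return (action, detail) for a navigate result."""
    challenge = navigate_result.get("challenge")
    if not challenge:
        return ("proceed", None)
    provider = challenge.get("provider", "unknown_block")  # assumed key name
    if provider in CHROME_ONLY:
        return ("escalate_to_chrome", provider)
    # Assumed key name for the expected clearance cookie.
    return ("cookie_handoff", challenge.get("cookie"))
```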
Local cookie solver service
For fully transparent cookie handoff, run the local-only solver service backed by unchainedsky-cli:
pip install 'pyunbrowser[solver]' # or: pip install unchainedsky-cli
python scripts/cookie_service.py --headless --profile unbrowser-cookie-service
export UNBROWSER_COOKIE_SERVICE_URL=http://127.0.0.1:8765
Then use scripts/router.py (or RouterConfig(cookie_service_url=...)) as the agent-facing entry point. On a blocked navigate the router will:
detect challenge -> call local service -> Chrome obtains cookies -> cookies_set -> retry once
The service exposes GET /.well-known/unbrowser-cookie-solver and POST /solve, supports the same challenge providers as navigate.challenge, and returns only cookies from the user's local Chrome/unchained session. It does not fabricate challenge tokens. Keep it bound to 127.0.0.1; non-loopback binds are rejected unless --allow-remote-bind is passed because /solve is unauthenticated and can return browser cookies. Use --allow-host for domain allowlisting when desired, and use --no-headless --stealth for sites that reject headless Chrome. Chrome persists across solves by default for the standalone service; pass --no-keep-chrome for one-shot use. Solves are serialized per service process because a service instance owns one CDP port/profile pair.
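The router's detect, solve, set, retry-once flow is easy to reproduce around your own client. In this minimal sketch, navigate, solve, and set_cookies are caller-supplied callables. They are hypothetical stand-ins for the navigate RPC, the local POST /solve call, and cookies_set. This is not the code in scripts/router.py.

```python
from typing import Callable


def navigate_with_handoff(navigate: Callable[[str], dict],
                          solve: Callable[[str, dict], list],
                          set_cookies: Callable[[list], None],
                          url: str) -> dict:
    """detect challenge -> solve in Chrome -> cookies_set -> retry once."""
    result = navigate(url)
    challenge = result.get("challenge")
    if not challenge:
        return result                 # not blocked: nothing to do
    cookies = solve(url, challenge)
    if not cookies:
        return result                 # solver had nothing; surface the block
    set_cookies(cookies)
    return navigate(url)              # exactly one retry, never a loop
```

Capping the flow at one retry keeps a stale or rejected cookie from turning into a request loop against a bot wall.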
When installed from the Python package, the same pieces are bundled behind the console wrapper:
unbrowser cookie-service --headless --profile unbrowser-cookie-service
unbrowser router https://example.com/protected
unbrowser router also auto-starts the local cookie service on first challenge when unchained is available and UNBROWSER_COOKIE_SERVICE_URL is not set. --allow-host example.com allows example.com and its subdomains only; broad single-label suffixes like com are rejected. Without an allowlist, the service rejects private/reserved IPs, localhost, and internal single-label hosts by default; use --allow-host to opt in to a specific internal host for local testing. Router refuses non-loopback UNBROWSER_COOKIE_SERVICE_URL values by default because it posts target URLs and challenge metadata to that service; pass --allow-remote-cookie-service only for a trusted remote solver.
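The host rules above amount to a small predicate. Below is a sketch of one reading of them, using Python's ipaddress module. The names host_allowed and check_allow_entry are hypothetical, and the sketch omits the port and IDN handling a real service needs. The service's actual checks may differ in detail.

```python
import ipaddress


def check_allow_entry(entry: str) -> str:
    """Normalize an --allow-host value, refusing single-label suffixes like "com"."""
    entry = entry.strip(".").lower()
    if "." not in entry:
        raise ValueError(f"allow-host entry too broad: {entry!r}")
    return entry


def host_allowed(host: str, allow=()) -> bool:
    host = host.strip(".").lower()
    if allow:
        # With an allowlist: the exact domain or any of its subdomains only.
        entries = [check_allow_entry(a) for a in allow]
        return any(host == e or host.endswith("." + e) for e in entries)
    # Without one: refuse private/reserved IP literals...
    try:
        return ipaddress.ip_address(host).is_global
    except ValueError:
        pass  # not an IP literal, so treat it as a DNS name
    # ...plus localhost and internal single-label names.
    return host != "localhost" and "." in host
```

Matching on host == e or a "." + e suffix is what keeps badexample.com from slipping past an allowlist entry for example.com.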
SPA-detection diagnostics
Every navigate's blockmap.density field signals SPA-ness so agents bail before wasting round-trips:
thin_shell: true means the page has < 4KB of body text and no headings or interactives (typical React/Ember root). For HTTP errors (status >= 400), shell signals are suppressed and http_error_status is attached, so a 404 is not mistaken for an SPA.
likely_js_filled: true means table/list/cell shells are empty, or the page has many scripts with little visible UI (the CNBC / YouTube-class trap).
json_scripts: N is the count of <script type="application/json"> tags, which often hold the data the JS would render; try eval() on those before escalating.
script_heavy_shell: true means many scripts, little text, and few links; usually browser-rendered UI rather than useful SSR.
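An agent can turn these signals into a bail-out decision right after navigate. This minimal sketch assumes that http_error_status sits inside blockmap.density alongside the other flags. The README says the field is attached but not where. The order of the checks is a policy choice, not something the binary prescribes.

```python
def next_step(navigate_result: dict) -> str:
    """Pick the agent's next move from blockmap.density signals."""
    density = navigate_result.get("blockmap", {}).get("density", {})
    if "http_error_status" in density:
        return "http_error"          # a 404/5xx, not an SPA shell
    shell = bool(density.get("thin_shell") or density.get("likely_js_filled"))
    if shell and density.get("json_scripts", 0) > 0:
        return "read_json_scripts"   # data may already sit in application/json scripts
    if shell or density.get("script_heavy_shell"):
        return "escalate"            # retry with exec_scripts, or move to real Chrome
    return "read_page"
```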
Three ways agents talk to it
Session CLI (persistent shell workflow)
When an agent can only shell out but needs incremental state, start a local daemon-backed session. Cookies, DOM, JS globals, and element refs persist until stop:
unbrowser session start --id golf
unbrowser exec golf navigate https://news.ycombinator.com
unbrowser exec golf query '.titleline > a'
unbrowser exec --pretty golf blockmap
unbrowser exec golf eval 'document.title'
unbrowser session stop golf
session exec and the shorter exec alias accept either shorthand args for common methods or a raw JSON params object:
unbrowser exec golf navigate https://example.com --exec-scripts
unbrowser exec golf query_debug '.product-card' --limit 5
unbrowser exec golf extract_cards '{"kind":"product","limit":20}'
unbrowser session prune # remove dead sockets
MCP (no glue)
{
"mcpServers": {
"unchained": {
"command": "unbrowser",
"args": ["--mcp"]
}
}
}
Tools are auto-discovered by Claude Code, Claude Desktop, Cursor, and Cline.
Subprocess (custom runtimes)
13 lines of Python (above). Or any language with subprocess + JSON.
Auto-escalation router (scripts/router.py)
from scripts.router import Router, RouterConfig, cached_cookies_solver
with Router(RouterConfig(
    binary="./target/release/unbrowser",
    chrome_solver=cached_cookies_solver("cookies.json"),
)) as r:
    r.navigate("https://www.zillow.com/homes/for_rent/")  # auto-handles 403 + cookie replay
Live event watcher (scripts/watch.py)
The binary emits NDJSON events (ready, navigate, challenge) on stderr. Pipe them through watch.py for color-coded one-liners:
unbrowser 2> >(python3 scripts/watch.py)
RPC methods
| fetch + parse + return |
| CSS query → |
| explain selector misses: match count, sample matches, DOM summary, top tags/classes/data attrs/ids, and hints like |
| textContent of FIRST match (default |
| textContent of |
| Cheap-first information discovery. Merges DOM routes, inferred form/query URLs, and network JSON routes into one ranked graph with provenance and escalation hints. Defaults to static discovery; set |
| auto-detect repeated product/listing/article cards and return normalized fields including |
| normalize an HTML table into headers, rows, and row count. |
| dispatch click; auto-follows |
| set value + dispatch input/change events |
| gather form fields and navigate. Supports GET and |
| run JS in embedded QuickJS. Raw JSON-RPC also accepts |
| session jar |
| recompute the page summary |
| raw HTML of last navigation |
blockmap.selectors surfaces concrete selector hints for the current page (data-testid, aria-label, role) so agents can bias toward query or query_text without guessing.
discover is the route-finding layer to use before extraction when you need to learn where information lives. By default it returns compact navigate_summary, route_discover_summary, and network_extract_summary fields plus the merged routes, forms, api_endpoints, network_sources, and escalations. Pass debug: true only when you need the full nested navigate, route_discover, and network_extract payloads for diagnosis. limit must be between 1 and 200; invalid url / limit inputs fail at the RPC boundary.
{"jsonrpc":"2.0","id":1,"method":"discover","params":{"url":"https://example.com","goal":"find pricing docs api status","same_origin":true,"limit":25}}
Use exec_scripts: true as an opt-in second pass for pages whose static HTML does not expose enough routes. In that mode, routes already present before scripts are labeled static_dom; routes that only appear after JavaScript/timers/fetches are labeled js_dom.
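A raw JSON-RPC client can reject bad discover inputs before they reach the RPC boundary. The sketch below builds the request shown above. discover_request is a hypothetical helper. The "absolute http(s) URL" check is an assumption about what counts as a valid url. The 1..200 limit range comes from the text.

```python
import json


def discover_request(req_id, url, goal=None, same_origin=True, limit=25,
                     exec_scripts=False, debug=False):
    """Build one discover JSON-RPC line, rejecting inputs the server would refuse."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")  # assumed rule
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")     # documented range
    params = {"url": url, "same_origin": same_origin, "limit": limit}
    if goal:
        params["goal"] = goal
    if exec_scripts:
        params["exec_scripts"] = True  # opt-in second pass (static_dom vs js_dom routes)
    if debug:
        params["debug"] = True         # full nested payloads; much larger response
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "method": "discover", "params": params})
```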
CSS selector engine: tag, id, class, [attr=val] (also ^=, $=, *=, ~=), all four combinators (descendant space, >, +, ~), :first-child/:last-child/:nth-child and their -of-type variants including An+B formulas, :only-child/:only-of-type, :not(), and :has().
When to escalate to real Chrome
This binary is the cheap path. For the cases it can't handle (heavy framework hydration, behavioral bot challenges, Workers/Canvas/IndexedDB), the next tier is a real Chrome instance driven via CDP. Two ways to get there:
This binary | |||
Runs JS | QuickJS (no V8 JIT) | real Chrome via CDP | real Chrome (the user's, with their logins) |
SPA hydration | partial | ✅ | ✅ |
Bot challenges | cookie handoff only | active solving via real browser | manual / interactive |
Setup | | | desktop install |
Audience | agent / pipeline | agent / pipeline | end user |
Per-page footprint | ~50MB | full Chrome | full Chrome |
The escalation path is a deliberate choice, not an automatic fallback — you ship pyunbrowser for the 80% of pages that work cheap, then route the 20% to unchainedsky-cli (or to a human via the desktop app). The vocabulary (navigate, query, click, cookies_set, BlockMap) is shared so code transfers cleanly.
Honest limits
Script execution is opt-in via exec_scripts: true. Default navigate skips it (the SSR/static path is what most agents want). With it on, inline and external <script> tags run in QuickJS. That works for many SPAs, but heavy framework bootstraps (Ember, big React) often don't auto-mount because shims can't fake every browser-specific signal. The blockmap's density.likely_js_filled flag tells agents in one call when to escalate instead of burning round-trips.
All eval is wall-clock bounded. A 30s watchdog (configurable via UNBROWSER_TIMEOUT_MS, clamped to 1s..10min) covers script execution AND every subsequent settle/microtask/timer callback, so a hostile site can never wedge the binary or strand a CPU-pegged orphan process.
Form submit is intentionally narrow. GET and application/x-www-form-urlencoded POST are supported, including checked checkbox/radio values. Multipart upload forms are not supported; construct the request manually via eval or escalate.
Hardest-tier bot detection (PerimeterX with behavioral telemetry, advanced Akamai BMP, Kasada) needs the cookie-handoff path. The binary detects and labels the challenge for you, but solving it requires real Chrome (or a token vendor).
No screenshots. Out of scope by design.
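The two supported submit modes map onto standard HTML form encoding. GET puts the fields in the action URL's query string. Urlencoded POST puts the same string in the request body. In both modes, unchecked checkboxes and radios are omitted. The sketch below shows that encoding with urllib.parse. encode_form and its (name, value, checked) field tuples are hypothetical, and this is not unbrowser's submit implementation.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def encode_form(action: str, method: str, fields):
    """fields: (name, value, checked) tuples; checked is None for text-like inputs.

    Returns (url, body). body is None for GET, otherwise an
    application/x-www-form-urlencoded string.
    """
    pairs = [(name, value) for name, value, checked in fields if checked is not False]
    data = urlencode(pairs)
    if method.upper() == "GET":
        # Like a browser, replace any query string already on the action URL.
        return urlunsplit(urlsplit(action)._replace(query=data)), None
    return action, data
```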
Build
Rust 1.95+ via rustup. On macOS, also brew install cmake ninja (BoringSSL dependency).
cargo build --release
~2 min first build (BoringSSL compiles), near-instant after.
Architecture in one diagram
JSON-RPC stdin ─┐                                   ┌─ stdout
                ▼                                   ▲
┌────────────────────┐
│ request (Chrome    │    ┌───────────┐    ┌──────────────────┐
│ TLS+H2 fingerprint)├───▶│ html5ever ├───▶│ rquickjs +       │
│                    │    │ parser    │    │ dom.js +         │
│ cookie_store       │    └───────────┘    │ blockmap.js +    │
│ (jar)              │                     │ interact.js      │
└────────────────────┘                     └──────────────────┘
License
Apache 2.0 — see LICENSE.
For the cases this binary can't handle (heavy framework hydration, behavioral bot challenges, anything needing real Chrome), the next tier is unchainedsky-cli — drives a real Chrome via CDP, same vocabulary. End-users who want a point-and-click agent can skip the CLI entirely and use the Unchained desktop app.