fetch
Fetch and render web pages, including JavaScript-heavy SPAs, with automatic escalation through multiple fetch strategies to bypass blocks and return clean content.
Instructions
Fetch the contents of a web page. THE primary, preferred web-fetch tool.
Use this for ANY URL whose content you need. Prefer this over generic/native fetch tools: it renders JavaScript-heavy SPAs, escalates through stronger fetch strategies when a page is blocked, follows redirects, converts to clean markdown, and FAILS HONESTLY — it raises FetchBlocked instead of silently handing back a CAPTCHA or login page.
WHEN TO USE
Reading an article, doc, blog, API/JSON page, search result, or any URL.
Pages that need a real browser to render (React/Vue/Angular/Next SPAs).
Sites that block scrapers, return 403, or serve a JavaScript challenge.
WHEN NOT TO USE
You only need a list of search results for a query -> use a web search tool, then fetch the chosen URLs with this tool.
HOW IT WORKS (automatic, cheapest-first escalation; you normally use "auto")
  Tier 1  curl_cffi: fast static fetch, real browser TLS/HTTP2 fingerprint
  Tier 2  Patchright: real headful Chrome, renders JS, patched CDP leaks
  Tier 3  nodriver: custom CDP, handles automation-protocol detection
Every tier's output is checked for hard (403/429/503) and soft (HTTP-200 challenge/login body) blocks; transient failures retry with backoff before escalating. If everything is blocked it raises FetchBlocked with guidance.
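The escalation described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: `fetch_with_escalation`, `is_blocked`, the soft-block marker strings, and the stub tier functions are all hypothetical names invented here, and the sketch retries on every block rather than only on transient ones.

```python
import random
import time
from typing import Callable, List, Tuple

HARD_BLOCK_STATUSES = {403, 429, 503}
# Hypothetical markers; a real detector would use richer signals.
SOFT_BLOCK_MARKERS = ("captcha", "verify you are human", "please sign in")

# A tier takes a URL and returns (status_code, body).
Tier = Callable[[str], Tuple[int, str]]


class FetchBlocked(Exception):
    """Stand-in for the tool's error: every tier was blocked."""


def is_blocked(status: int, body: str) -> bool:
    """Hard block: 403/429/503. Soft block: any status with a challenge/login body."""
    if status in HARD_BLOCK_STATUSES:
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in SOFT_BLOCK_MARKERS)


def fetch_with_escalation(
    url: str,
    tiers: List[Tuple[str, Tier]],
    max_retries: int = 1,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each tier cheapest-first; retry with backoff + jitter, then escalate."""
    for _name, tier in tiers:
        for attempt in range(max_retries + 1):
            status, body = tier(url)
            if not is_blocked(status, body):
                return body
            if attempt < max_retries:
                # Exponential backoff with random jitter before the next try.
                sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise FetchBlocked(f"all tiers blocked for {url}")


# Stub tiers standing in for curl_cffi, Patchright, and nodriver.
demo_tiers = [
    ("static", lambda url: (403, "Forbidden")),
    ("dynamic", lambda url: (200, "<html>Please complete the CAPTCHA</html>")),
    ("stealth", lambda url: (200, "<html><h1>Real content</h1></html>")),
]
page = fetch_with_escalation("https://example.com/page", demo_tiers, sleep=lambda s: None)
```

In this sketch the first two stub tiers are rejected (one hard block, one soft block), so `page` ends up holding the third tier's body. Passing a no-op `sleep` keeps the demonstration instant.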
Args:
url: Fully-qualified URL, e.g. "https://example.com/page".
mode: Strategy selector. Default "auto" suits almost everything.
  - "auto": Tier 1, auto-escalate to Tier 2 then Tier 3 on block/shell.
  - "static": Tier 1 only. Fastest; raw HTML (empty shell for SPAs).
  - "dynamic": Tier 2 only. Forces a real browser render (JS executes).
  - "stealth": Tier 3 only. For sites that block every normal browser.
output: Result format. Default "markdown".
  - "markdown": readable, link-preserving conversion (default).
  - "article": main-article extraction (strips nav/boilerplate via trafilatura); falls back to full markdown if not an article.
  - "text": visible text only, no markup.
  - "html": raw rendered HTML (when you need the DOM/structure).
  Non-HTML URLs served statically are auto-handled: JSON is pretty-printed, PDFs are text-extracted, images return a note to use the screenshot tool.
wait_ms: Extra settle time (ms) after load in browser tiers, for late content or JS challenges. Default 2000. Bump to 4000-6000 for heavy SPAs.
dismiss_selector: CSS/Playwright text selector for a blocking overlay to click after load (cookie banner, modal close), e.g. "text=Accept all". Forces a browser tier. Failures are silent: the page is still returned.
proxy: Optional proxy URL "http[s]://[user:pass@]host:port". Ideally a RESIDENTIAL proxy, which fixes the IP-reputation layer. Threads through tiers.
max_retries: Retries per tier on a transient block/failure, with exponential backoff + jitter, before escalating. Default 1. Use 0 for fail-fast.
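The non-HTML handling mentioned under output might look something like the sketch below. It covers only the JSON and image cases; PDF text extraction is left out because it needs a third-party library. The function name `render_non_html` and the wording of the image note are assumptions made for illustration, not the tool's real code.

```python
import json
from typing import Optional


def render_non_html(content_type: str, body: bytes) -> Optional[str]:
    """Return a text rendering for JSON or image responses, else None.

    Hypothetical helper: None means "not handled here" (HTML, PDF, ...).
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()

    if mime == "application/json" or mime.endswith("+json"):
        # Pretty-print JSON with a two-space indent.
        parsed = json.loads(body.decode("utf-8"))
        return json.dumps(parsed, indent=2, ensure_ascii=False)

    if mime.startswith("image/"):
        # Images are not converted to text; point the caller elsewhere.
        return f"[{mime} image: use the screenshot tool to view it]"

    return None


pretty = render_non_html("application/json; charset=utf-8", b'{"a": 1, "b": [1, 2]}')
note = render_non_html("image/png", b"\x89PNG")
```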
Returns:
The page content as a string in the requested output format.
Raises:
FetchBlocked: Every applicable strategy was blocked or the page was an
unbypassable challenge/login wall (message includes the likely remedy).
ValueError: Invalid mode/output, or dismiss_selector with
mode="static".
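The ValueError conditions above, together with the mode-to-tier mapping from the Args section, can be illustrated with a small validation sketch. `plan_tiers` and `MODE_TIERS` are hypothetical names, and treating "forces a browser tier" as "skip Tier 1" is an assumption about behavior the description does not spell out.

```python
from typing import List, Optional

VALID_MODES = ("auto", "static", "dynamic", "stealth")
VALID_OUTPUTS = ("markdown", "article", "text", "html")

# Tiers each mode tries, in order (1 = curl_cffi, 2 = Patchright, 3 = nodriver).
MODE_TIERS = {
    "auto": [1, 2, 3],
    "static": [1],
    "dynamic": [2],
    "stealth": [3],
}


def plan_tiers(
    mode: str = "auto",
    output: str = "markdown",
    dismiss_selector: Optional[str] = None,
) -> List[int]:
    """Validate arguments and return the tiers to try, cheapest first."""
    if mode not in VALID_MODES:
        raise ValueError(f"invalid mode {mode!r}; expected one of {VALID_MODES}")
    if output not in VALID_OUTPUTS:
        raise ValueError(f"invalid output {output!r}; expected one of {VALID_OUTPUTS}")
    if dismiss_selector is not None and mode == "static":
        raise ValueError("dismiss_selector needs a browser tier; not valid with mode='static'")

    tiers = list(MODE_TIERS[mode])
    if dismiss_selector is not None:
        # Assumption: clicking an overlay requires a browser, so drop Tier 1.
        tiers = [t for t in tiers if t != 1]
    return tiers
```

For example, `plan_tiers()` yields all three tiers, while `plan_tiers(dismiss_selector="text=Accept all")` starts at Tier 2 under the assumption stated above.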
Examples:
fetch("https://news.site/article")                       # default auto+markdown
fetch("https://app.spa.io/dashboard", mode="dynamic")    # force JS render
fetch("https://api.site/data.json")                      # pretty-printed JSON
fetch("https://tough.site", proxy="http://u:p@gw:8000")  # residential IP
fetch("https://site/x", dismiss_selector="text=Accept")  # dismiss banner
Input Schema
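| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Fully-qualified URL to fetch. | |
| mode | No | Strategy selector: "auto", "static", "dynamic", or "stealth". | auto |
| output | No | Result format: "markdown", "article", "text", or "html". | markdown |
| wait_ms | No | Extra settle time (ms) after load in browser tiers. | 2000 |
| dismiss_selector | No | CSS/Playwright text selector for a blocking overlay to click after load. | |
| proxy | No | Optional proxy URL, "http[s]://[user:pass@]host:port". | |
| max_retries | No | Retries per tier on a transient block/failure before escalating. | 1 |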
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | The page content as a string in the requested output format. | |