scrape_url
Extract content from web pages using JavaScript rendering, antibot bypass, and captcha solving to access protected sites and SPAs.
Instructions
Scrape content from web pages, including JavaScript-heavy and bot-protected sites, with advanced antibot bypass.
Features:
- TLS fingerprint impersonation (Chrome, Firefox, and Safari profiles)
- JavaScript rendering for single-page applications (React, Vue, Angular)
- Captcha solving (reCAPTCHA, hCaptcha, Cloudflare Turnstile)
- Residential and mobile proxy support
- Automatic retries with smart detection
Use cases:
- Extract text or HTML from websites
- Scrape JavaScript-rendered content
- Access pages behind Cloudflare or other antibot protections
- Get data from pages protected by captchas
Token costs:
- Base request: 1 token
- Antibot bypass: +2 tokens
- JS rendering: +5 tokens
- Residential proxy: +3 tokens
- Captcha solving: +10 tokens
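Because each option is listed as a surcharge on the 1-token base, the cost of a request can be estimated by summing the enabled options. The sketch below uses a hypothetical `estimate_tokens` helper (not part of the tool) and assumes the costs are strictly additive. It also assumes "Antibot bypass" corresponds to the `use_undetected` parameter; these docs do not state that mapping explicitly.

```python
# Hypothetical helper, not part of scrape_url: estimates token cost from the listed prices.
# Assumption: costs are additive, and "Antibot bypass" maps to use_undetected.
BASE_COST = 1
ADDON_COSTS = {
    "use_undetected": 2,   # Antibot bypass (assumed mapping)
    "use_js_render": 5,    # JS rendering
    "use_residential": 3,  # Residential proxy
    "solve_captcha": 10,   # Captcha solving
}


def estimate_tokens(**flags: bool) -> int:
    """Return the base cost plus the cost of every enabled option."""
    unknown = set(flags) - set(ADDON_COSTS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    return BASE_COST + sum(
        cost for name, cost in ADDON_COSTS.items() if flags.get(name, False)
    )


# JS rendering through a residential proxy: 1 + 5 + 3
print(estimate_tokens(use_js_render=True, use_residential=True))  # 9
```

Under these assumptions, enabling every option costs 1 + 2 + 5 + 3 + 10 = 21 tokens per request.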
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Target URL to scrape | |
| use_js_render | No | Enable JavaScript rendering with Playwright. Use for SPAs such as React or Vue sites. | false |
| use_residential | No | Use a residential proxy instead of a datacenter proxy. Better for protected sites. | false |
| use_undetected | No | Use Undetected Chrome for maximum antibot bypass (Cloudflare, PerimeterX). | false |
| solve_captcha | No | Automatically detect and solve captchas. | false |
| timeout | No | Timeout in seconds (5-300). | 60 |
| js_wait_for | No | Wait strategy for JS rendering: 'networkidle', 'load', 'domcontentloaded', or 'selector:' followed by a CSS selector (e.g. 'selector:.css-selector'). | networkidle |
| session_id | No | Sticky session ID: all requests with the same ID use the same proxy IP. Useful for authentication flows. | |
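The schema's constraints can be checked on the client before the tool is called. The following is a minimal sketch of a hypothetical `build_scrape_args` helper (not part of the tool) that assembles the argument object with the documented defaults and rejects out-of-range timeouts or unsupported `js_wait_for` values. How the resulting dict is sent to `scrape_url` depends on your client and is not shown.

```python
from typing import Any, Optional

# Hypothetical helper, not part of scrape_url: builds an argument dict using the
# defaults from the input schema and enforces the constraints it documents.
WAIT_STRATEGIES = {"networkidle", "load", "domcontentloaded"}
SELECTOR_PREFIX = "selector:"


def build_scrape_args(
    url: str,
    *,
    use_js_render: bool = False,
    use_residential: bool = False,
    use_undetected: bool = False,
    solve_captcha: bool = False,
    timeout: int = 60,
    js_wait_for: str = "networkidle",
    session_id: Optional[str] = None,
) -> dict[str, Any]:
    if not url:
        raise ValueError("url is required")
    # The schema allows timeouts from 5 to 300 seconds.
    if not 5 <= timeout <= 300:
        raise ValueError("timeout must be between 5 and 300 seconds")
    # Accept the three named strategies, or 'selector:' plus a non-empty CSS selector.
    is_selector = js_wait_for.startswith(SELECTOR_PREFIX) and len(js_wait_for) > len(SELECTOR_PREFIX)
    if js_wait_for not in WAIT_STRATEGIES and not is_selector:
        raise ValueError(f"unsupported js_wait_for value: {js_wait_for!r}")

    args: dict[str, Any] = {
        "url": url,
        "use_js_render": use_js_render,
        "use_residential": use_residential,
        "use_undetected": use_undetected,
        "solve_captcha": solve_captcha,
        "timeout": timeout,
        "js_wait_for": js_wait_for,
    }
    # session_id has no documented default, so it is only included when set.
    if session_id is not None:
        args["session_id"] = session_id
    return args


# Example: render an SPA, wait for a specific element, and pin the proxy IP.
args = build_scrape_args(
    "https://example.com/app",
    use_js_render=True,
    js_wait_for="selector:#content",
    session_id="login-flow-1",
)
print(args)
```

Omitting `session_id` when it is unset, rather than sending an empty value, is a conservative choice because the schema lists no default for it.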