web_scrape
Extract structured data from websites by scraping URLs with customizable HTTP methods, headers, proxies, JavaScript rendering, and anti-bot protection.
Instructions
Scrape a URL with full control over the request. Call the scraping_instruction_enhanced tool before using this tool. For a quick fetch, prefer web_get_page.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The target URL to scrape. | |
| method | No | The HTTP method to use for the request. | GET |
| body | No | Request body for POST/PUT/PATCH requests. | |
| headers | No | HTTP headers to send. | |
| country | No | The country to use for the proxy. Supports ISO 3166-1 alpha-2 country codes. | |
| proxy_pool | No | The proxy pool to use. Supports public_datacenter_pool and public_residential_pool. | public_datacenter_pool |
| render_js | No | Enable JavaScript rendering with a headless browser. | |
| rendering_wait | No | Wait for this number of milliseconds before returning the response. | |
| asp | No | Enable Anti Scraping Protection. | |
| cache | No | Enable caching of the response. | |
| cache_ttl | No | Cache TTL in seconds when cache is true. | |
| cache_clear | No | If true, bypass & clear cache for this URL. | |
| retry | No | If false, disable automatic retry on transient errors. | |
| wait_for_selector | No | (Prefer rendering_wait). Wait for this CSS selector to appear in the page when rendering JS. | |
| lang | No | Languages to use for the request (Accept-Language header). Leave empty for auto-detection and alignment with the proxy location. | |
| cookies | No | Cookies to send with the request. | |
| format | No | The desired output format for the content. Supports clean_html, markdown, text, and json. | markdown |
| format_options | No | Additional options (only available for the markdown and text formats). | |
| js | No | JavaScript to execute on the page. | |
| js_scenario | No | A sequence of browser actions (JS Scenario) to run on the page, validated against the Scrapfly API schema. | |
| screenshots | No | Screenshots with target (fullpage, selector). Example: [{ 'name': 'my_screenshot', 'target': 'fullpage' }, { 'name': 'my_screenshot2', 'target': 'selector', 'css_selector': '#price' }] | |
| screenshot_flags | No | Screenshot flags to use for the screenshot. | |
| timeout | No | Server-side timeout in milliseconds. (Prefer rendering_wait + timeout) | |
| extraction_prompt | No | AI prompt that adds an LLM-assisted data extraction step. Use it only if the current LLM cannot extract the data itself; avoid it if the LLM can reason over and process the data directly. | |
| extraction_model | No | The extraction model to use for the offloaded extraction. Exclusive with extraction_template and extraction_prompt. | |
| pow | Yes | See the scraping_instruction_enhanced tool for instructions. | |
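
As a rough illustration of how the parameters in the table combine, the sketch below assembles an arguments object for web_scrape in Python. The helper name build_web_scrape_args is hypothetical, the value types (booleans, integers, a list of objects for screenshots) are assumptions inferred from the descriptions, and the pow value is a placeholder: its real value and format come from the scraping_instruction_enhanced tool and are not documented here.

```python
import json

# Parameters the schema table marks as required.
REQUIRED = ("url", "pow")


def build_web_scrape_args(url, pow_value, **options):
    """Hypothetical helper: assemble a web_scrape arguments dict.

    Options left as None are dropped so the documented defaults
    (method GET, public_datacenter_pool, markdown) still apply.
    """
    args = {"url": url, "pow": pow_value}
    args.update({k: v for k, v in options.items() if v is not None})

    missing = [name for name in REQUIRED if not args.get(name)]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))

    # The table says cache_ttl applies "when cache is true".
    if "cache_ttl" in args and not args.get("cache"):
        raise ValueError("cache_ttl only applies when cache is true")
    return args


args = build_web_scrape_args(
    "https://example.com/products",
    "<value obtained via scraping_instruction_enhanced>",  # placeholder, not a real token
    render_js=True,            # assumed boolean
    rendering_wait=2000,       # milliseconds, per the table
    proxy_pool="public_residential_pool",
    format="markdown",
    screenshots=[{"name": "my_screenshot", "target": "fullpage"}],
)
print(json.dumps(args, indent=2))
```

Omitting unset options rather than sending empty values keeps the request small and leaves default handling to the server; how the resulting object is sent depends on the MCP client in use.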