webscraping_ai_text
Extract text content from any webpage as plain text, XML, or JSON, with options for returning links, executing JavaScript, and configuring proxies.
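The tool is invoked through the Model Context Protocol's `tools/call` method, with the fields from the input schema passed as `arguments`. The sketch below only serializes such a request. The helper name `build_text_request`, the request id, and the example values (including the `#content` selector) are illustrative assumptions, and sending the message over a transport (stdio, HTTP) is left to your MCP client.

```python
import json


def build_text_request(url: str, request_id: int = 1, **options) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for webscraping_ai_text.

    Hypothetical helper: only `url` is required; other input-schema fields
    (text_format, return_links, js, proxy, ...) are passed as keyword
    arguments. Options set to None are dropped so server defaults apply.
    """
    arguments = {"url": url}
    arguments.update({k: v for k, v in options.items() if v is not None})
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "webscraping_ai_text", "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Render the page with JavaScript and wait for an example selector.
    print(build_text_request(
        "https://example.com",
        text_format="json",
        js=True,
        wait_for="#content",
        proxy=None,  # omitted from the request, so the default proxy type applies
    ))
```

Dropping `None` values keeps the request minimal, so anything you do not set falls back to the defaults listed in the table.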
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the target page. | |
| text_format | No | Format of the text response: plain text, XML, or JSON. | json |
| return_links | No | Return links from the page body text. | |
| timeout | No | Maximum web page retrieval time in ms (maximum 30000). | 20000 |
| js | No | Execute on-page JavaScript using a headless browser. | false |
| js_timeout | No | Maximum JavaScript rendering time in ms. | 3000 |
| wait_for | No | CSS selector to wait for before returning the page content. | |
| proxy | No | Type of proxy: datacenter, residential, or stealth. Use residential if the site restricts datacenter traffic, or stealth for the most heavily protected sites with advanced anti-bot detection. Residential and stealth requests cost more than datacenter requests; see the pricing page. | datacenter |
| country | No | Country of the proxy to use. | US |
| custom_proxy | No | Your own proxy URL in "http://user:password@host:port" format. | |
| device | No | Type of device emulation. | |
| error_on_404 | No | Return an error if the target page responds with HTTP status 404. | false |
| error_on_redirect | No | Return an error if the target page redirects. | false |
| js_script | No | Custom JavaScript code to execute on the target page. | |
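Some constraints in the table can be checked before a request is sent: `url` is required, `timeout` may not exceed 30000 ms, `proxy` takes one of three values, and `custom_proxy` follows the `"http://user:password@host:port"` format. The sketch below is a hypothetical client-side pre-check, not part of the tool. The function name and messages are assumptions, and it cannot tell you whether the server rejects or clamps out-of-range values. It also accepts only the `http` scheme because that is the format stated in the table.

```python
from urllib.parse import urlsplit

PROXY_TYPES = {"datacenter", "residential", "stealth"}
MAX_TIMEOUT_MS = 30000


def check_text_arguments(args: dict) -> list[str]:
    """Return problems found in webscraping_ai_text arguments.

    Hypothetical pre-check covering only constraints stated in the input
    schema table. An empty list means nothing was detected, not that the
    server will accept the request.
    """
    problems = []
    if not args.get("url"):
        problems.append("url is required")

    timeout = args.get("timeout")
    if timeout is not None and (
        not isinstance(timeout, int) or not 0 < timeout <= MAX_TIMEOUT_MS
    ):
        problems.append(f"timeout must be an integer from 1 to {MAX_TIMEOUT_MS} ms")

    proxy = args.get("proxy")
    if proxy is not None and proxy not in PROXY_TYPES:
        problems.append(f"proxy must be one of {sorted(PROXY_TYPES)}")

    custom = args.get("custom_proxy")
    if custom is not None:
        parts = urlsplit(custom)
        try:
            port = parts.port  # raises ValueError for a non-numeric port
        except ValueError:
            port = None
        # Expected shape: http://user:password@host:port
        if not (parts.scheme == "http" and parts.username and parts.password
                and parts.hostname and port):
            problems.append('custom_proxy must look like "http://user:password@host:port"')

    return problems
```

For example, `check_text_arguments({"url": "https://example.com", "timeout": 45000})` reports one problem for the timeout.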