scrape_as_html
Extract HTML content from webpages by bypassing bot detection and CAPTCHA protections for data collection and analysis.
Instructions
Scrape a single webpage URL with advanced options for content extraction and get back the results in HTML. This tool can unlock any webpage even if it uses bot detection or CAPTCHA.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the webpage to scrape; must be a valid URL string. | |
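The tool takes a single required `url` argument, which the server validates as a URL string. As a rough illustration of what a caller could check before invoking the tool, here is a minimal sketch that uses only the built-in `URL` parser. The helper name `validateScrapeInput` and its return shape are hypothetical, and the check only approximates the server-side validation; it is not a copy of it.

```javascript
// Hypothetical helper (not part of server.js): approximates the server's
// URL-string validation using only the built-in WHATWG URL parser.
function validateScrapeInput(input) {
    if (typeof input !== 'object' || input === null)
        return {ok: false, error: 'input must be an object'};
    if (typeof input.url !== 'string')
        return {ok: false, error: 'url is required and must be a string'};
    try {
        // The URL constructor throws a TypeError for strings it cannot parse.
        new URL(input.url);
    } catch (e) {
        return {ok: false, error: 'url must be a valid URL'};
    }
    return {ok: true, value: {url: input.url}};
}
```

A call such as `validateScrapeInput({url: 'https://example.com/'})` yields an object with `ok: true`, while a missing or unparseable `url` yields `ok: false` with an error message.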
Implementation Reference
- server.js:191-204 (handler): The handler function, wrapped by tool_fn, sends an axios POST request to the Bright Data API (/request endpoint) using the unlocker zone to scrape the provided URL, and returns the raw HTML response.

  ```js
  execute: tool_fn('scrape_as_html', async({url})=>{
      let response = await axios({
          url: 'https://api.brightdata.com/request',
          method: 'POST',
          data: {
              url,
              zone: unlocker_zone,
              format: 'raw',
          },
          headers: api_headers(),
          responseType: 'text',
      });
      return response.data;
  }),
  ```
- server.js:190-190 (schema): Zod schema for the tool input, requiring a single 'url' parameter validated as a URL string.

  ```js
  parameters: z.object({url: z.string().url()}),
  ```
- server.js:184-205 (registration): Registration of the 'scrape_as_html' tool using server.addTool (via the addTool helper), defining the name, description, input schema, and execute handler.

  ```js
  addTool({
      name: 'scrape_as_html',
      description: 'Scrape a single webpage URL with advanced options for '
          +'content extraction and get back the results in HTML. '
          +'This tool can unlock any webpage even if it uses bot detection or '
          +'CAPTCHA.',
      parameters: z.object({url: z.string().url()}),
      execute: tool_fn('scrape_as_html', async({url})=>{
          let response = await axios({
              url: 'https://api.brightdata.com/request',
              method: 'POST',
              data: {
                  url,
                  zone: unlocker_zone,
                  format: 'raw',
              },
              headers: api_headers(),
              responseType: 'text',
          });
          return response.data;
      }),
  });
  ```
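To make the request shape in the handler easier to see in isolation, the following is a minimal standalone sketch that issues the same POST with the global `fetch` (available in Node 18 and later) instead of axios. The endpoint and the `url`, `zone`, and `format: 'raw'` body fields come from the handler. Everything else is an assumption: the helper names `buildScrapeRequest` and `scrapeAsHtml` are hypothetical, and because the contents of `api_headers()` are not shown in the reference, a Bearer-token `Authorization` header is assumed here.

```javascript
// Endpoint taken from the handler in the reference above.
const BRIGHTDATA_REQUEST_URL = 'https://api.brightdata.com/request';

// Hypothetical helper: builds the fetch() options for a scrape request.
// ASSUMPTION: api_headers() is not shown in the source, so a Bearer-token
// Authorization header is assumed; adjust to match the real helper.
function buildScrapeRequest(url, zone, apiToken) {
    return {
        method: 'POST',
        headers: {
            'Authorization': `Bearer ${apiToken}`,
            'Content-Type': 'application/json',
        },
        // Same body fields as the handler: target URL, zone, raw format.
        body: JSON.stringify({url, zone, format: 'raw'}),
    };
}

// Hypothetical wrapper: sends the request and returns the body as text.
async function scrapeAsHtml(url, zone, apiToken) {
    const response = await fetch(BRIGHTDATA_REQUEST_URL,
        buildScrapeRequest(url, zone, apiToken));
    // fetch() does not reject on HTTP error statuses, so check explicitly.
    if (!response.ok)
        throw new Error(`Bright Data request failed: HTTP ${response.status}`);
    return response.text();
}
```

Keeping the request construction in its own function means the payload can be inspected or unit-tested without making a network call; `scrapeAsHtml` is the only part that touches the network.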