scrape_as_markdown
Extract webpage content in Markdown format while bypassing bot detection and CAPTCHA protection for reliable data collection.
Instructions
Scrape a single webpage URL with advanced options for content extraction and get the results back as Markdown. This tool can unlock a webpage even if it uses bot detection or CAPTCHA.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the webpage to scrape; must be a valid URL string | |
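
As a rough illustration of the input schema, the sketch below builds the MCP `tools/call` request a client might send for this tool. The helper name `buildScrapeRequest` is hypothetical, and the built-in `URL` class is used only as a stand-in for the server-side Zod check, so the two may not accept exactly the same strings.

```javascript
// Hypothetical helper (not part of the server): builds an MCP `tools/call`
// request for scrape_as_markdown with its single required argument.
function buildScrapeRequest(url, id = 1) {
    // Stand-in for the server's Zod validation: new URL() throws a TypeError
    // when the string is not an absolute URL.
    new URL(url);
    return {
        jsonrpc: '2.0',
        id,
        method: 'tools/call',
        params: {
            name: 'scrape_as_markdown',
            arguments: {url},
        },
    };
}

const request = buildScrapeRequest('https://example.com/');
console.log(JSON.stringify(request, null, 2));
```

Validating on the client is optional, since the server rejects invalid input anyway; it only avoids a round trip for obviously malformed URLs.
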
Implementation Reference
- server.js:161-183 (registration): Registration of the 'scrape_as_markdown' tool using server.addTool(), including name, description, Zod schema for parameters, and execute handler wrapped by tool_fn.

  ```javascript
  addTool({
      name: 'scrape_as_markdown',
      description: 'Scrape a single webpage URL with advanced options for '
          +'content extraction and get back the results in MarkDown language. '
          +'This tool can unlock any webpage even if it uses bot detection or '
          +'CAPTCHA.',
      parameters: z.object({url: z.string().url()}),
      execute: tool_fn('scrape_as_markdown', async({url})=>{
          let response = await axios({
              url: 'https://api.brightdata.com/request',
              method: 'POST',
              data: {
                  url,
                  zone: unlocker_zone,
                  format: 'raw',
                  data_format: 'markdown',
              },
              headers: api_headers(),
              responseType: 'text',
          });
          return response.data;
      }),
  });
  ```
- server.js:168-182 (handler): The core handler function for 'scrape_as_markdown', which sends a POST request to the Bright Data API to scrape the given URL and returns the Markdown content.

  ```javascript
  execute: tool_fn('scrape_as_markdown', async({url})=>{
      let response = await axios({
          url: 'https://api.brightdata.com/request',
          method: 'POST',
          data: {
              url,
              zone: unlocker_zone,
              format: 'raw',
              data_format: 'markdown',
          },
          headers: api_headers(),
          responseType: 'text',
      });
      return response.data;
  }),
  ```
- server.js:167 (schema): Zod schema defining the input parameter 'url' as a required URL string.

  ```javascript
  parameters: z.object({url: z.string().url()}),
  ```
- server.js:752-777 (helper): Wrapper function 'tool_fn' used in execute for all tools, providing rate limiting, stats tracking, logging, and error handling.

  ```javascript
  function tool_fn(name, fn){
      return async(data, ctx)=>{
          check_rate_limit();
          debug_stats.tool_calls[name] = debug_stats.tool_calls[name]||0;
          debug_stats.tool_calls[name]++;
          debug_stats.session_calls++;
          let ts = Date.now();
          console.error(`[%s] executing %s`, name, JSON.stringify(data));
          try { return await fn(data, ctx); }
          catch(e){
              if (e.response)
              {
                  console.error(`[%s] error %s %s: %s`, name, e.response.status,
                      e.response.statusText, e.response.data);
                  let message = e.response.data;
                  if (message?.length)
                      throw new Error(`HTTP ${e.response.status}: ${message}`);
              }
              else
                  console.error(`[%s] error %s`, name, e.stack);
              throw e;
          } finally {
              let dur = Date.now()-ts;
              console.error(`[%s] tool finished in %sms`, name, dur);
          }
      };
  }
  ```
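
To make the wrapper's error handling easier to follow, here is a simplified, self-contained sketch of the same pattern. It is not the server's code: `wrapTool`, `toolCalls`, and both stub handlers are hypothetical names introduced for illustration, and rate limiting and logging are left out. It assumes errors carry an axios-style `response` object, as the excerpt above suggests.

```javascript
// Simplified, hypothetical version of the wrapper pattern. It keeps the
// per-tool call counter and the HTTP error translation only.
const toolCalls = {};

function wrapTool(name, fn) {
    return async (data, ctx) => {
        toolCalls[name] = (toolCalls[name] || 0) + 1;
        try {
            return await fn(data, ctx);
        } catch (e) {
            // axios-style errors expose the HTTP response on e.response
            const message = e.response?.data;
            if (message?.length)
                throw new Error(`HTTP ${e.response.status}: ${message}`);
            // No response body to report: rethrow the original error
            throw e;
        }
    };
}

// Stub handler standing in for the real API call; it fails like a 403.
const failing = wrapTool('scrape_as_markdown', async () => {
    const err = new Error('Request failed');
    err.response = {status: 403, statusText: 'Forbidden', data: 'zone not allowed'};
    throw err;
});

// Stub handler that succeeds and returns a Markdown string.
const passing = wrapTool('scrape_as_markdown', async ({url}) => `# Page at ${url}`);

async function demo() {
    const ok = await passing({url: 'https://example.com/'});
    let errorMessage = null;
    try {
        await failing({url: 'https://example.com/'});
    } catch (e) {
        errorMessage = e.message;
    }
    return {ok, errorMessage};
}

demo().then(result => console.log(result));
```

In this sketch the failing call surfaces as an `Error` whose message is `HTTP 403: zone not allowed`, which mirrors how the wrapper turns an HTTP error body into a readable message for the caller.
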