Skip to main content
Glama

scrape_as_markdown

Extract webpage content in Markdown format while bypassing bot detection and CAPTCHA protection for reliable data collection.

Instructions

Scrape a single webpage URL with advanced options for content extraction and get back the results in MarkDown language. This tool can unlock any webpage even if it uses bot detection or CAPTCHA.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYes

Implementation Reference

  • server.js:161-183 (registration)
    Registration of the 'scrape_as_markdown' tool using server.addTool(), including name, description, Zod schema for parameters, and execute handler wrapped by tool_fn.
    addTool({ name: 'scrape_as_markdown', description: 'Scrape a single webpage URL with advanced options for ' +'content extraction and get back the results in MarkDown language. ' +'This tool can unlock any webpage even if it uses bot detection or ' +'CAPTCHA.', parameters: z.object({url: z.string().url()}), execute: tool_fn('scrape_as_markdown', async({url})=>{ let response = await axios({ url: 'https://api.brightdata.com/request', method: 'POST', data: { url, zone: unlocker_zone, format: 'raw', data_format: 'markdown', }, headers: api_headers(), responseType: 'text', }); return response.data; }), });
  • The core handler function for 'scrape_as_markdown' that sends a POST request to BrightData API to scrape the given URL and return markdown content.
    execute: tool_fn('scrape_as_markdown', async({url})=>{ let response = await axios({ url: 'https://api.brightdata.com/request', method: 'POST', data: { url, zone: unlocker_zone, format: 'raw', data_format: 'markdown', }, headers: api_headers(), responseType: 'text', }); return response.data; }),
  • Zod schema defining the input parameter 'url' as a required URL string.
    parameters: z.object({url: z.string().url()}),
  • Wrapper function 'tool_fn' used in execute for all tools, providing rate limiting, stats tracking, logging, and error handling.
    function tool_fn(name, fn){ return async(data, ctx)=>{ check_rate_limit(); debug_stats.tool_calls[name] = debug_stats.tool_calls[name]||0; debug_stats.tool_calls[name]++; debug_stats.session_calls++; let ts = Date.now(); console.error(`[%s] executing %s`, name, JSON.stringify(data)); try { return await fn(data, ctx); } catch(e){ if (e.response) { console.error(`[%s] error %s %s: %s`, name, e.response.status, e.response.statusText, e.response.data); let message = e.response.data; if (message?.length) throw new Error(`HTTP ${e.response.status}: ${message}`); } else console.error(`[%s] error %s`, name, e.stack); throw e; } finally { let dur = Date.now()-ts; console.error(`[%s] tool finished in %sms`, name, dur); } };

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dsouza-anush/brightdata-mcp-heroku'

If you have feedback or need assistance with the MCP directory API, please join our Discord server