webscraping_ai_html
Extract HTML content from any web page with support for JavaScript rendering, proxy selection, and custom timeouts. Use it for web scraping and data extraction.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the target page. | |
| return_script_result | No | Return the result of the custom JavaScript code execution (see `js_script`). | |
| format | No | Response format: `json` wraps the HTML as `{"html": ...}`; `text` returns the raw HTML. | text |
| timeout | No | Maximum web page retrieval time in ms (maximum 30000). | 20000 |
| js | No | Execute on-page JavaScript using a headless browser. | false |
| js_timeout | No | Maximum JavaScript rendering time in ms. | 3000 |
| wait_for | No | CSS selector to wait for before returning the page content. | |
| proxy | No | Type of proxy: `datacenter`, `residential`, or `stealth`. Use `residential` if the site restricts datacenter traffic, or `stealth` for the most heavily protected sites with advanced anti-bot detection. Residential and stealth requests cost more than datacenter requests; see the pricing page. | datacenter |
| country | No | Country of the proxy to use. | US |
| custom_proxy | No | Your own proxy URL in `http://user:password@host:port` format. | |
| device | No | Type of device emulation. | |
| error_on_404 | No | Return an error if the target page responds with HTTP status 404. | false |
| error_on_redirect | No | Return an error if the target page redirects. | false |
| js_script | No | Custom JavaScript code to execute on the target page. | |
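As a sketch, here is an arguments object for a JavaScript-rendered fetch through a residential proxy, together with a hypothetical `checkHtmlArgs` helper that enforces only the limits stated in the table (the two formats, the three proxy types, and the 30000 ms timeout ceiling). The helper, the example URL, and the selector are assumptions for illustration; the server's real validation lives in its zod schema and `commonOptionsSchema`, whose full rules are not shown here.

```javascript
// Example arguments for a JavaScript-rendered fetch through a residential
// proxy. URL and selector are illustrative placeholders.
const exampleArgs = {
  url: 'https://example.com/products',
  js: true,
  js_timeout: 3000,
  wait_for: '.product-list',
  proxy: 'residential',
  country: 'US',
  format: 'json'
};

// Hypothetical pre-flight check (not part of the MCP server). It applies
// only the limits stated in the table: formats, proxy types, timeout cap.
const FORMATS = ['json', 'text'];
const PROXY_TYPES = ['datacenter', 'residential', 'stealth'];
const MAX_TIMEOUT_MS = 30000;

function checkHtmlArgs(args) {
  const problems = [];
  if (typeof args.url !== 'string' || args.url === '') {
    problems.push('url is required');
  }
  if (args.format !== undefined && !FORMATS.includes(args.format)) {
    problems.push('format must be json or text');
  }
  if (args.proxy !== undefined && !PROXY_TYPES.includes(args.proxy)) {
    problems.push('proxy must be datacenter, residential, or stealth');
  }
  if (
    args.timeout !== undefined &&
    (typeof args.timeout !== 'number' || args.timeout <= 0 || args.timeout > MAX_TIMEOUT_MS)
  ) {
    problems.push(`timeout must be a positive number no greater than ${MAX_TIMEOUT_MS} ms`);
  }
  return problems; // an empty array means no documented limit was violated
}

console.log(checkHtmlArgs(exampleArgs)); // → []
console.log(checkHtmlArgs({ url: 'https://example.com', timeout: 45000 }));
```

Returning a list of problems rather than throwing on the first one lets a caller report every out-of-range argument at once.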
Implementation Reference
- `src/index.js:265-283` (handler): Handler function for the `webscraping_ai_html` tool. It calls `client.html()` with the URL and options, then returns the result as either JSON (`{html: result}`) or plain text based on the `format` parameter. On error, it parses the error message and returns a sanitized error response.

```javascript
server.tool(
  'webscraping_ai_html',
  {
    url: z.string().describe('URL of the target page.'),
    return_script_result: z.boolean().optional().describe('Return result of the custom JavaScript code execution.'),
    format: z.enum(['json', 'text']).optional().describe('Response format (json or text).'),
    ...commonOptionsSchema
  },
  async ({ url, return_script_result, format, ...options }) => {
    try {
      const result = await client.html(url, { ...options, return_script_result });
      const content = format === 'json' ? JSON.stringify({ html: result }) : result;
      return createSanitizedResponse(content, url);
    } catch (error) {
      const errorObj = JSON.parse(error.message);
      return createSanitizedResponse(JSON.stringify(errorObj), url, true);
    }
  }
);
```

- `src/index.js:267-272` (schema): Input schema for the `webscraping_ai_html` tool, defining `url` (required string), `return_script_result` (optional boolean), `format` (optional enum `'json'` | `'text'`), plus the common options spread from `commonOptionsSchema`.

```javascript
{
  url: z.string().describe('URL of the target page.'),
  return_script_result: z.boolean().optional().describe('Return result of the custom JavaScript code execution.'),
  format: z.enum(['json', 'text']).optional().describe('Response format (json or text).'),
  ...commonOptionsSchema
},
```

- `src/index.js:265-283` (registration): Registration of the `webscraping_ai_html` tool on the MCP server via the `server.tool()` call. This covers the same lines as the handler entry, so the code is identical to the handler block above.

- `src/index.js:90-95` (helper): The `client.html()` helper method on the `WebScrapingAIClient` class, which makes an API request to the `/html` endpoint, passing the URL and options along with the API key.

```javascript
async html(url, options = {}) {
  return this.request('/html', { url, ...options });
}
```

- `src/index.js:51-72` (helper): The base `request()` helper used by `client.html()` to perform the actual HTTP GET request with queuing, API key injection, and error handling.

```javascript
async request(endpoint, params) {
  try {
    return await this.queue.add(async () => {
      const response = await this.client.get(endpoint, {
        params: {
          ...params,
          api_key: this.apiKey,
          from_mcp_server: true
        }
      });
      return response.data;
    });
  } catch (error) {
    const errorResponse = {
      message: 'API Error',
      status_code: error.response?.status,
      status_message: error.response?.statusText,
      body: error.response?.data
    };
    throw new Error(JSON.stringify(errorResponse));
  }
}
```
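The request and response flow above can be exercised without a network call. The sketch below uses hypothetical helper names (`buildHtmlQuery`, `formatHtmlResult`, `wrapApiError`, `toErrorContent`) to reproduce the parameter merging in `html()`/`request()`, the `format` handling in the handler, and the error round trip between `request()` and the handler's `catch` block. It assumes an axios-style client (inferred from the `get(endpoint, { params })` and `error.response` shapes, not confirmed by the source), stops before `createSanitizedResponse()`, whose behavior is not shown here, and its `URLSearchParams` percent-encoding may differ slightly from the real client's serializer.

```javascript
// Sketch (no network). Helper names are hypothetical; only the object
// shapes and the format/error logic come from src/index.js.

// html() sends { url, ...options }; request() then appends api_key and
// from_mcp_server. Undefined values (e.g. an omitted return_script_result)
// are skipped, assuming the client omits undefined params as axios does.
function buildHtmlQuery(url, options, apiKey) {
  const params = { url, ...options, api_key: apiKey, from_mcp_server: true };
  const query = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) {
      query.append(key, String(value));
    }
  }
  return `/html?${query.toString()}`;
}

// Handler: any format other than 'json' (including undefined) returns the raw HTML.
function formatHtmlResult(result, format) {
  return format === 'json' ? JSON.stringify({ html: result }) : result;
}

// request() catch block: wrap an axios-style error in an Error whose
// message is a JSON string.
function wrapApiError(error) {
  const errorResponse = {
    message: 'API Error',
    status_code: error.response?.status,
    status_message: error.response?.statusText,
    body: error.response?.data
  };
  return new Error(JSON.stringify(errorResponse));
}

// Handler catch block: parse the JSON message and re-serialize it.
function toErrorContent(error) {
  return JSON.stringify(JSON.parse(error.message));
}

console.log(buildHtmlQuery('https://example.com', { js: true, return_script_result: undefined }, 'YOUR_API_KEY'));
// → /html?url=https%3A%2F%2Fexample.com&js=true&api_key=YOUR_API_KEY&from_mcp_server=true
console.log(formatHtmlResult('<p>hi</p>', 'json'));
// → {"html":"<p>hi</p>"}
console.log(toErrorContent(wrapApiError({ response: { status: 403, statusText: 'Forbidden', data: 'blocked' } })));
// → {"message":"API Error","status_code":403,"status_message":"Forbidden","body":"blocked"}
console.log(toErrorContent(wrapApiError(new Error('socket hang up'))));
// → {"message":"API Error"}
```

The last call shows a consequence of this design: when a request fails without an HTTP response, the error content carries only `"message": "API Error"`, because `JSON.stringify` drops the undefined status and body fields.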