webscraping_ai_html

Extract HTML content from any web page with support for JavaScript rendering, proxy selection, and custom timeouts. Use it for web scraping and data extraction.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `url` | Yes | URL of the target page. | |
| `return_script_result` | No | Return the result of the custom JavaScript code execution (see `js_script`). | |
| `format` | No | Response format: `json` wraps the HTML as `{"html": ...}`, `text` returns it unchanged. | `text` |
| `timeout` | No | Maximum web page retrieval time in ms (maximum 30000). | `20000` |
| `js` | No | Execute on-page JavaScript using a headless browser. | `false` |
| `js_timeout` | No | Maximum JavaScript rendering time in ms. | `3000` |
| `wait_for` | No | CSS selector to wait for before returning the page content. | |
| `proxy` | No | Type of proxy: `datacenter`, `residential`, or `stealth`. Use `residential` if the site restricts datacenter traffic, or `stealth` for the most heavily protected sites with advanced anti-bot detection. Residential and stealth requests cost more than datacenter requests; see the pricing page. | `datacenter` |
| `country` | No | Country of the proxy to use. | `US` |
| `custom_proxy` | No | Your own proxy URL in `http://user:password@host:port` format. | |
| `device` | No | Type of device emulation. | |
| `error_on_404` | No | Return an error if the target page responds with HTTP status 404. | `false` |
| `error_on_redirect` | No | Return an error if the target page redirects. | `false` |
| `js_script` | No | Custom JavaScript code to execute on the target page. | |
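
The constraints stated in the table can be checked on the client side before calling the tool. The following is a minimal sketch, not part of the server: `validateHtmlArgs` is a hypothetical helper, and it encodes only the rules written in the table (required `url`, `format` of `json` or `text`, `timeout` of at most 30000 ms, the three proxy types, and the `custom_proxy` URL shape). The regular expression for `custom_proxy` is an assumption and ignores URL escaping rules.

```javascript
// Hypothetical client-side pre-check; every rule comes from the parameter table.
const PROXY_TYPES = ['datacenter', 'residential', 'stealth'];
const FORMATS = ['json', 'text'];
// Rough shape check for "http://user:password@host:port" (assumption: no escaping rules).
const CUSTOM_PROXY_PATTERN = /^http:\/\/[^:@\/\s]+:[^@\/\s]+@[^:\/\s]+:\d+$/;

function validateHtmlArgs(args) {
  const errors = [];
  if (typeof args.url !== 'string' || args.url.length === 0) {
    errors.push('url is required');
  }
  if (args.format !== undefined && !FORMATS.includes(args.format)) {
    errors.push('format must be "json" or "text"');
  }
  if (args.timeout !== undefined &&
      (!Number.isInteger(args.timeout) || args.timeout <= 0 || args.timeout > 30000)) {
    errors.push('timeout must be a positive integer of at most 30000 ms');
  }
  if (args.proxy !== undefined && !PROXY_TYPES.includes(args.proxy)) {
    errors.push(`proxy must be one of: ${PROXY_TYPES.join(', ')}`);
  }
  if (args.custom_proxy !== undefined && !CUSTOM_PROXY_PATTERN.test(args.custom_proxy)) {
    errors.push('custom_proxy must look like http://user:password@host:port');
  }
  return errors;
}

// Example: JavaScript rendering through a residential proxy, JSON output.
const args = {
  url: 'https://example.com',
  js: true,
  wait_for: '#content',
  proxy: 'residential',
  timeout: 25000,
  format: 'json'
};
console.log(validateHtmlArgs(args)); // []
```

An empty array means none of the documented rules is violated; the server and the upstream API may still reject values this sketch does not know about (for example an unsupported `country` or `device`).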

Implementation Reference

src/index.js:265-283 (handler)

Handler function for the 'webscraping_ai_html' tool. It calls client.html() with the URL, return_script_result, and the common options, then returns the result wrapped as JSON ({"html": result}) when format is 'json', or as plain text otherwise. On error, it parses the JSON-encoded error message produced by request() and returns it through createSanitizedResponse() with the error flag set.

```
server.tool(
  'webscraping_ai_html',
  // Input schema: tool-specific fields plus the shared options in commonOptionsSchema
  {
    url: z.string().describe('URL of the target page.'),
    return_script_result: z.boolean().optional().describe('Return result of the custom JavaScript code execution.'),
    format: z.enum(['json', 'text']).optional().describe('Response format (json or text).'),
    ...commonOptionsSchema
  },
  async ({ url, return_script_result, format, ...options }) => {
    try {
      // Forward every option except `format`, which only controls the response shape
      const result = await client.html(url, { ...options, return_script_result });
      // 'json' wraps the HTML as {"html": ...}; any other value returns it unchanged
      const content = format === 'json' ? JSON.stringify({ html: result }) : result;
      return createSanitizedResponse(content, url);
    } catch (error) {
      // request() throws an Error whose message is a JSON string
      const errorObj = JSON.parse(error.message);
      return createSanitizedResponse(JSON.stringify(errorObj), url, true);
    }
  }
);
```
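
Per the ternary in the handler, any `format` other than `'json'`, including an omitted one, returns the raw result. The sketch below lifts that branch into a hypothetical `buildContent` function so its output is visible; `createSanitizedResponse` is left out because its implementation is not shown here.

```javascript
// Isolated copy of the handler's format branch (hypothetical helper name).
function buildContent(result, format) {
  return format === 'json' ? JSON.stringify({ html: result }) : result;
}

const html = '<h1 class="title">Hi</h1>';
console.log(buildContent(html, 'json')); // {"html":"<h1 class=\"title\">Hi</h1>"}
console.log(buildContent(html));         // <h1 class="title">Hi</h1>

// In JSON mode the caller must parse the string to recover the original HTML.
console.log(JSON.parse(buildContent(html, 'json')).html === html); // true
```

The JSON form escapes quotes and newlines in the HTML, so it is convenient for clients that expect structured output, while the text form avoids a parse step.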

src/index.js:267-272 (schema)

Input schema for the 'webscraping_ai_html' tool, defining parameters: url (required string), return_script_result (optional boolean), format (optional enum 'json'|'text'), plus common options spread from commonOptionsSchema.

```
{
  url: z.string().describe('URL of the target page.'),
  return_script_result: z.boolean().optional().describe('Return result of the custom JavaScript code execution.'),
  format: z.enum(['json', 'text']).optional().describe('Response format (json or text).'),
  ...commonOptionsSchema
},
```
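
Because the schema spreads `commonOptionsSchema` into the same object as `url`, `return_script_result`, and `format`, the handler separates them with one destructuring pattern: `format` is consumed locally, while `return_script_result` and every common option are forwarded to `client.html()`. The sketch below reproduces only that split, using a hypothetical `splitArgs` function, so the forwarded object can be inspected.

```javascript
// Same destructuring as the handler's parameter list (hypothetical helper name).
function splitArgs({ url, return_script_result, format, ...options }) {
  return { url, format, forwarded: { ...options, return_script_result } };
}

const { url, format, forwarded } = splitArgs({
  url: 'https://example.com',
  format: 'text',
  js: true,
  proxy: 'stealth'
});
console.log(url);       // https://example.com
console.log(format);    // text
console.log(forwarded); // { js: true, proxy: 'stealth', return_script_result: undefined }
```

Note that `return_script_result` is always present as a key in the forwarded options, even when the caller omits it; its value is then `undefined`.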

src/index.js:265-283 (registration)

Registration of the 'webscraping_ai_html' tool on the MCP server via the server.tool() call. This is the same code (src/index.js:265-283) listed under the handler entry above.

src/index.js:90-95 (helper)

The client.html() method on the WebScrapingAIClient class. It targets the '/html' endpoint by passing url and the options to request(), which adds the API key.
```
async html(url, options = {}) {
  return this.request('/html', {
    url,
    ...options
  });
}
```
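
Taken together, `html()` and `request()` (shown below) build the query parameters with two object spreads: `html()` puts `url` first and then the caller's options, and `request()` appends `api_key` and `from_mcp_server` last. Later keys win in a spread, so tool arguments cannot override the API key or the MCP flag, whereas an `options.url` would replace the positional `url` (the handler avoids this by destructuring `url` out first). The sketch below reproduces the merge with a hypothetical `buildParams` function and serializes it with `URLSearchParams` for inspection; skipping `undefined` values is an assumption, and the real HTTP client may encode the query differently in detail.

```javascript
// Hypothetical reconstruction of the params object sent to GET /html.
function buildParams(url, options, apiKey) {
  const fromHtml = { url, ...options };                             // client.html()
  return { ...fromHtml, api_key: apiKey, from_mcp_server: true };   // request()
}

// Serialize for inspection; undefined values are skipped here (assumption).
function toQueryString(params) {
  const search = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) search.append(key, String(value));
  }
  return search.toString();
}

const params = buildParams('https://example.com/a?b=1', { js: true, api_key: 'spoofed' }, 'KEY');
console.log(params.api_key); // KEY
console.log(toQueryString(params));
// url=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1&js=true&api_key=KEY&from_mcp_server=true
```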

src/index.js:51-72 (helper)

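The base request() helper used by client.html(). It runs the HTTP GET request through a queue, adds api_key and from_mcp_server: true to the query parameters, and, on failure, throws an Error whose message is a JSON string containing message, status_code, status_message, and body.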

```
async request(endpoint, params) {
  try {
    return await this.queue.add(async () => {
      const response = await this.client.get(endpoint, {
        params: {
          ...params,
          api_key: this.apiKey,
          from_mcp_server: true
        }
      });
      return response.data;
    });
  } catch (error) {
    const errorResponse = {
      message: 'API Error',
      status_code: error.response?.status,
      status_message: error.response?.statusText,
      body: error.response?.data
    };
    throw new Error(JSON.stringify(errorResponse));
  }
}
```
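
The catch block in request() turns any failure into an Error whose message is a JSON string, which is what lets the handler call JSON.parse(error.message). The sketch below mimics that contract with two hypothetical functions: `toApiError` copies the catch-block logic, and `parseApiError` adds a fallback for messages that are not JSON, a defensive choice that is not in the original handler. The fake error objects only imitate the `error.response` fields the original reads (`status`, `statusText`, `data`).

```javascript
// Copy of the request() catch-block logic (hypothetical function name).
function toApiError(error) {
  const errorResponse = {
    message: 'API Error',
    status_code: error.response?.status,
    status_message: error.response?.statusText,
    body: error.response?.data
  };
  return new Error(JSON.stringify(errorResponse));
}

// Handler-side parsing, with a fallback (not in the original) for non-JSON messages.
function parseApiError(error) {
  try {
    return JSON.parse(error.message);
  } catch {
    return { message: error.message };
  }
}

const httpFailure = { response: { status: 403, statusText: 'Forbidden', data: 'blocked' } };
console.log(parseApiError(toApiError(httpFailure)));
// { message: 'API Error', status_code: 403, status_message: 'Forbidden', body: 'blocked' }

// A failure without a response (e.g. a network error) keeps only "message".
console.log(parseApiError(toApiError(new Error('ECONNRESET'))));
// { message: 'API Error' }
```

The second case shows a consequence of the original design: JSON.stringify drops `undefined` fields and the underlying error text is not copied, so errors that carry no HTTP response reach the client as a bare "API Error".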
