webscraping_ai_fields
Extract structured data fields from webpages using custom extraction instructions. Supports JavaScript rendering, proxy selection, and configurable timeouts.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the target page. | |
| fields | Yes | Dictionary of field names with instructions for extraction. | |
| timeout | No | Maximum web page retrieval time in ms (maximum 30000). | 20000 |
| js | No | Execute on-page JavaScript using a headless browser. | false |
| js_timeout | No | Maximum JavaScript rendering time in ms. | 3000 |
| wait_for | No | CSS selector to wait for before returning the page content. | |
| proxy | No | Type of proxy: datacenter or residential. | datacenter |
| country | No | Country of the proxy to use. | US |
| custom_proxy | No | Your own proxy URL in "http://user:password@host:port" format. | |
| device | No | Type of device emulation. | |
| error_on_404 | No | Return an error when the target page responds with HTTP status 404. | false |
| error_on_redirect | No | Return an error when the target page redirects. | false |
| js_script | No | Custom JavaScript code to execute on the target page. | |
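
As an illustration of the schema above, the following sketch shows what a set of arguments for this tool might look like, together with a small pre-flight check. The URL, the field instructions, and the `checkFieldsArgs` helper are all hypothetical; the check only mirrors what the table documents (two required parameters, string-valued `fields`, and the 30000 ms timeout ceiling) and is not part of the server.

```javascript
// Hypothetical arguments for a webscraping_ai_fields call; the URL and the
// field instructions are made up for illustration.
const exampleArgs = {
  url: 'https://example.com/products/42',
  fields: {
    title: 'The product name shown in the main heading',
    price: 'The current price, including the currency symbol',
  },
  js: true,        // render the page in a headless browser
  timeout: 20000,  // ms, within the documented 30000 maximum
  proxy: 'datacenter',
};

// Illustrative pre-flight check (an assumption, not server code): returns a
// list of human-readable problems, empty when the arguments look valid.
function checkFieldsArgs(args) {
  const problems = [];
  if (typeof args.url !== 'string' || args.url.length === 0) {
    problems.push('url must be a non-empty string');
  }
  const fields = args.fields;
  const isPlainObject =
    fields !== null && typeof fields === 'object' && !Array.isArray(fields);
  if (!isPlainObject) {
    problems.push('fields must be an object of field name -> instruction');
  } else if (!Object.values(fields).every((v) => typeof v === 'string')) {
    problems.push('every fields value must be a string instruction');
  }
  if (args.timeout !== undefined && args.timeout > 30000) {
    problems.push('timeout must not exceed 30000 ms');
  }
  return problems;
}

console.log(checkFieldsArgs(exampleArgs)); // prints []
```

Each key in `fields` names an output field, and each value is the natural-language instruction describing what to extract for it.
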
Implementation Reference
- src/index.js:248-263 (registration) Registration of the 'webscraping_ai_fields' tool via server.tool(); defines the name, schema, and handler.

      server.tool(
        'webscraping_ai_fields',
        {
          url: z.string().describe('URL of the target page.'),
          fields: z.record(z.string()).describe('Dictionary of field names with instructions for extraction.'),
          ...commonOptionsSchema
        },
        async ({ url, fields, ...options }) => {
          try {
            const result = await client.fields(url, fields, options);
            return createSanitizedResponse(JSON.stringify(result, null, 2), url);
          } catch (error) {
            return createSanitizedResponse(error.message, url, true);
          }
        }
      );

- src/index.js:249-254 (schema) Input schema for the tool: url (string), fields (record of strings mapping field name -> extraction instructions), and common options such as timeout, js, and proxy.

      'webscraping_ai_fields',
      {
        url: z.string().describe('URL of the target page.'),
        fields: z.record(z.string()).describe('Dictionary of field names with instructions for extraction.'),
        ...commonOptionsSchema
      },

- src/index.js:255-263 (handler) Handler function that calls client.fields(url, fields, options) and returns the result as a formatted JSON string.

      async ({ url, fields, ...options }) => {
        try {
          const result = await client.fields(url, fields, options);
          return createSanitizedResponse(JSON.stringify(result, null, 2), url);
        } catch (error) {
          return createSanitizedResponse(error.message, url, true);
        }
      }
      );

- src/index.js:82-88 (helper) The client.fields() method of the WebScrapingAIClient class; makes the actual API call to the /ai/fields endpoint, stringifying the fields object.

      async fields(url, fields, options = {}) {
        return this.request('/ai/fields', {
          url,
          fields: JSON.stringify(fields),
          ...options
        });
      }

- src/index.js:51-72 (helper) Generic request() method used by all client methods; handles queuing via PQueue and API key injection.

      async request(endpoint, params) {
        try {
          return await this.queue.add(async () => {
            const response = await this.client.get(endpoint, {
              params: {
                ...params,
                api_key: this.apiKey,
                from_mcp_server: true
              }
            });
            return response.data;
          });
        } catch (error) {
          const errorResponse = {
            message: 'API Error',
            status_code: error.response?.status,
            status_message: error.response?.statusText,
            body: error.response?.data
          };
          throw new Error(JSON.stringify(errorResponse));
        }
      }
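
To make the data flow through client.fields() and request() concrete, here is a minimal standalone sketch of the query parameters they assemble and of how the JSON-encoded error thrown by request() could be decoded. `buildFieldsParams` and `parseApiError` are hypothetical helpers written for this illustration, not functions in src/index.js, and the API key value is a placeholder.

```javascript
// Sketch of the query parameters that client.fields() and request() build
// together. buildFieldsParams is a hypothetical helper for illustration.
function buildFieldsParams(url, fields, options = {}, apiKey = 'YOUR_API_KEY') {
  return {
    url,
    fields: JSON.stringify(fields), // nested object becomes one string value
    ...options,                     // timeout, js, proxy, and so on
    api_key: apiKey,                // injected by request()
    from_mcp_server: true,          // injected by request()
  };
}

// request() throws an Error whose message is a JSON string. This hypothetical
// helper decodes it, falling back to the raw message if it is not JSON.
function parseApiError(error) {
  try {
    return JSON.parse(error.message);
  } catch {
    return { message: error.message };
  }
}

const params = buildFieldsParams(
  'https://example.com/products/42',
  { title: 'The product name' },
  { js: true }
);
console.log(params.fields); // prints {"title":"The product name"}

const failure = new Error(
  JSON.stringify({ message: 'API Error', status_code: 403 })
);
console.log(parseApiError(failure).status_code); // prints 403
```

Stringifying `fields` matters because the call is a GET request: the nested object has to travel as a single query-string value rather than as a structured body.
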