webscraping_ai_selected
Extract content from a web page using a CSS selector. Choose JSON or text output, enable JavaScript rendering, and select proxy type for anti-bot protection.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the target page. | |
| selector | Yes | CSS selector to extract content for. | |
| format | No | Response format (json or text). | json |
| timeout | No | Maximum web page retrieval time in ms (20000 by default, maximum is 30000). | |
| js | No | Execute on-page JavaScript using a headless browser (false by default). | |
| js_timeout | No | Maximum JavaScript rendering time in ms (3000 by default). | |
| wait_for | No | CSS selector to wait for before returning the page content. | |
| proxy | No | Type of proxy: datacenter, residential, or stealth (datacenter by default). Use residential if the site restricts datacenter traffic, or stealth for the most heavily protected sites with advanced anti-bot detection. Residential and stealth requests cost more than datacenter — see the pricing page. | datacenter |
| country | No | Country of the proxy to use (US by default). | |
| custom_proxy | No | Your own proxy URL in "http://user:password@host:port" format. | |
| device | No | Type of device emulation. | |
| error_on_404 | No | Return error on 404 HTTP status on the target page (false by default). | |
| error_on_redirect | No | Return error on redirect on the target page (false by default). | |
| js_script | No | Custom JavaScript code to execute on the target page. |
Implementation Reference
- src/index.js:311-329 (registration)Registration of the 'webscraping_ai_selected' tool via server.tool() with schema and handler
server.tool( 'webscraping_ai_selected', { url: z.string().describe('URL of the target page.'), selector: z.string().describe('CSS selector to extract content for.'), format: z.enum(['json', 'text']).optional().default('json').describe('Response format (json or text).'), ...commonOptionsSchema }, async ({ url, selector, format, ...options }) => { try { const result = await client.selected(url, selector, options); const content = format === 'json' ? JSON.stringify({ html: result }) : result; return createSanitizedResponse(content, url); } catch (error) { const errorObj = JSON.parse(error.message); return createSanitizedResponse(JSON.stringify(errorObj), url, true); } } ); - src/index.js:319-328 (handler)Handler function for 'webscraping_ai_selected' - calls client.selected(url, selector, options) and formats response
async ({ url, selector, format, ...options }) => { try { const result = await client.selected(url, selector, options); const content = format === 'json' ? JSON.stringify({ html: result }) : result; return createSanitizedResponse(content, url); } catch (error) { const errorObj = JSON.parse(error.message); return createSanitizedResponse(JSON.stringify(errorObj), url, true); } } - src/index.js:313-318 (schema)Schema (input validation) for 'webscraping_ai_selected' - url, selector, format, plus commonOptionsSchema
{ url: z.string().describe('URL of the target page.'), selector: z.string().describe('CSS selector to extract content for.'), format: z.enum(['json', 'text']).optional().default('json').describe('Response format (json or text).'), ...commonOptionsSchema }, - src/index.js:104-110 (helper)WebScrapingAIClient.selected() method - makes API request to '/selected' endpoint with url and selector
async selected(url, selector, options = {}) { return this.request('/selected', { url, selector, ...options }); } - src/index.js:216-228 (helper)Common options schema shared across all tools including webscraping_ai_selected
const commonOptionsSchema = { timeout: z.number().optional().default(DEFAULT_TIMEOUT).describe(`Maximum web page retrieval time in ms (${DEFAULT_TIMEOUT} by default, maximum is 30000).`), js: z.boolean().optional().default(DEFAULT_JS_RENDERING).describe(`Execute on-page JavaScript using a headless browser (${DEFAULT_JS_RENDERING} by default).`), js_timeout: z.number().optional().default(DEFAULT_JS_TIMEOUT).describe(`Maximum JavaScript rendering time in ms (${DEFAULT_JS_TIMEOUT} by default).`), wait_for: z.string().optional().describe('CSS selector to wait for before returning the page content.'), proxy: z.enum(['datacenter', 'residential', 'stealth']).optional().default(DEFAULT_PROXY_TYPE).describe(`Type of proxy: datacenter, residential, or stealth (${DEFAULT_PROXY_TYPE} by default). Use residential if the site restricts datacenter traffic, or stealth for the most heavily protected sites with advanced anti-bot detection. Residential and stealth requests cost more than datacenter — see the pricing page.`), country: z.enum(['us', 'gb', 'de', 'it', 'fr', 'ca', 'es', 'ru', 'jp', 'kr', 'in']).optional().describe('Country of the proxy to use (US by default).'), custom_proxy: z.string().optional().describe('Your own proxy URL in "http://user:password@host:port" format.'), device: z.enum(['desktop', 'mobile', 'tablet']).optional().describe('Type of device emulation.'), error_on_404: z.boolean().optional().describe('Return error on 404 HTTP status on the target page (false by default).'), error_on_redirect: z.boolean().optional().describe('Return error on redirect on the target page (false by default).'), js_script: z.string().optional().describe('Custom JavaScript code to execute on the target page.') };