scrape_by_selector
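Extracts the text of webpage elements that match a CSS selector. Static pages are fetched and parsed directly; the browser mode intended for dynamic content currently returns a placeholder message instead of extracted text.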
Instructions
Scrape content using CSS selector
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to scrape | |
| selector | Yes | CSS selector | |
| useBrowser | No | Use browser for dynamic content | false |
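
As a rough illustration of the schema above, the sketch below validates an untyped argument object and applies the `useBrowser` default of `false`. The names `ScrapeBySelectorArgs` and `parseScrapeBySelectorArgs` are hypothetical and do not appear in the project; this is one way a caller might check arguments before invoking the tool, not the server's own validation.

```typescript
// Hypothetical argument shape mirroring the documented input schema.
interface ScrapeBySelectorArgs {
  url: string;
  selector: string;
  useBrowser: boolean;
}

// Check untyped input against the schema: `url` and `selector` are required
// strings, `useBrowser` is an optional boolean that defaults to false.
function parseScrapeBySelectorArgs(input: Record<string, unknown>): ScrapeBySelectorArgs {
  const { url, selector, useBrowser } = input;

  if (typeof url !== 'string') {
    throw new Error("'url' is required and must be a string");
  }
  if (typeof selector !== 'string') {
    throw new Error("'selector' is required and must be a string");
  }

  let browser = false; // schema default
  if (useBrowser !== undefined) {
    if (typeof useBrowser !== 'boolean') {
      throw new Error("'useBrowser' must be a boolean when provided");
    }
    browser = useBrowser;
  }

  return { url, selector, useBrowser: browser };
}

// Example: omitting `useBrowser` falls back to the default.
const exampleArgs = parseScrapeBySelectorArgs({ url: 'https://example.com', selector: 'h1' });
console.log(exampleArgs.useBrowser); // false
```

Keeping the default in one place means a caller never has to distinguish "absent" from "false" after parsing.
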
Implementation Reference
- src/scrapers/static-scraper.ts:150-182 (handler)
  Core handler function that fetches the HTML using axios, parses it with cheerio, and extracts trimmed text from all elements matching the CSS selector.

  ```typescript
  async scrapeBySelector(config: ScrapingConfig, selector: string): Promise<string[]> {
    if (!Validators.isValidSelector(selector)) {
      throw new Error('Invalid CSS selector');
    }

    const validation = Validators.validateScrapingConfig(config);
    if (!validation.valid) {
      throw new Error(`Invalid scraping config: ${validation.errors.join(', ')}`);
    }

    try {
      const response = await axios.get(config.url, {
        headers: config.headers || {
          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        },
        timeout: config.timeout || 30000,
      });

      const $ = cheerio.load(response.data);
      const results: string[] = [];

      $(selector).each((_, element) => {
        const text = $(element).text().trim();
        if (text) {
          results.push(text);
        }
      });

      return results;
    } catch (error) {
      throw new Error(`Failed to scrape: ${error instanceof Error ? error.message : String(error)}`);
    }
  }
  ```
- src/tools/web-scraping.ts:112-134 (registration)
  Registers the 'scrape_by_selector' tool in the webScrapingTools array, including name, description, and input schema.

  ```typescript
  {
    name: 'scrape_by_selector',
    description: 'Scrape content using CSS selector',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL to scrape',
        },
        selector: {
          type: 'string',
          description: 'CSS selector',
        },
        useBrowser: {
          type: 'boolean',
          description: 'Use browser for dynamic content',
          default: false,
        },
      },
      required: ['url', 'selector'],
    },
  },
  ```
- src/tools/web-scraping.ts:329-339 (handler)
  Tool handler case in handleWebScrapingTool that dispatches to staticScraper.scrapeBySelector for non-browser scraping or returns a stub for browser mode.

  ```typescript
  case 'scrape_by_selector': {
    const selector = params.selector as string;
    if (config.useBrowser) {
      // For browser, we'd need to use page.evaluate
      await dynamicScraper.scrapeDynamicContent(config);
      // Simplified - would extract by selector in real implementation
      return { message: 'Selector extraction with browser requires page.evaluate', selector };
    } else {
      return await staticScraper.scrapeBySelector(config, selector);
    }
  }
  ```
- src/tools/web-scraping.ts:115-133 (schema)
  JSON schema defining the input parameters for the scrape_by_selector tool.

  ```typescript
  inputSchema: {
    type: 'object',
    properties: {
      url: {
        type: 'string',
        description: 'URL to scrape',
      },
      selector: {
        type: 'string',
        description: 'CSS selector',
      },
      useBrowser: {
        type: 'boolean',
        description: 'Use browser for dynamic content',
        default: false,
      },
    },
    required: ['url', 'selector'],
  },
  ```
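
The browser branch of the tool handler returns a placeholder message rather than extracted text. The sketch below shows one way that branch might collect the same trimmed, non-empty text as the static handler. It is not the project's implementation: `ElementLike`, `PageLike`, and `scrapeBySelectorInBrowser` are hypothetical names, and `PageLike` is only modeled on the `page.$$eval` method offered by Puppeteer and Playwright. The sketch also assumes the page has already navigated to the target URL.

```typescript
// Minimal stand-ins for browser types. Both are hypothetical and only
// modeled on the `page.$$eval` method found in Puppeteer and Playwright.
interface ElementLike {
  textContent: string | null;
}

interface PageLike {
  $$eval<T>(selector: string, pageFunction: (elements: ElementLike[]) => T): Promise<T>;
}

// Collect the trimmed, non-empty text of every element matching `selector`,
// mirroring the trim-and-skip-empty loop of the static cheerio handler.
async function scrapeBySelectorInBrowser(page: PageLike, selector: string): Promise<string[]> {
  return page.$$eval(selector, (elements) =>
    elements
      .map((element) => (element.textContent ?? '').trim())
      .filter((text) => text.length > 0),
  );
}
```

The callback captures nothing from the enclosing scope, which matters in a real browser driver because the function is serialized and executed inside the page.
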