scrape_dynamic_content

Extract JavaScript-rendered content from web pages by driving a headless browser, giving developers access to dynamically loaded data for analysis or integration.

Instructions

Scrape JavaScript-rendered content using browser

Input Schema

Name             Required  Description               Default
url              Yes       URL to scrape
waitForSelector  No        CSS selector to wait for
waitForTimeout   No        Timeout in milliseconds
timeout          No        Page load timeout         30000
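
For example, a client call that waits for a specific element before extracting content could pass arguments like the following (the URL and selector are placeholders, not part of this server):

    const args = {
      url: 'https://example.com/products', // placeholder target page
      waitForSelector: '.product-list',    // placeholder element that signals the page has rendered
      timeout: 30000,                      // page load timeout in milliseconds
    };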

Implementation Reference

  • Core implementation of the scrape_dynamic_content tool handler, which uses Playwright to scrape dynamic JavaScript-rendered web content and extract the title, text, HTML, links, images, and tables. Sketches of the ScrapingConfig shape and the getBrowser() helper it relies on follow this list.
    async scrapeDynamicContent(config: ScrapingConfig): Promise<ScrapedData> {
      const validation = Validators.validateScrapingConfig(config);
      if (!validation.valid) {
        throw new Error(`Invalid scraping config: ${validation.errors.join(', ')}`);
      }

      const browser = await this.getBrowser();
      const page = await browser.newPage();

      try {
        // Set headers if provided
        if (config.headers) {
          await page.setExtraHTTPHeaders(config.headers);
        }

        // Navigate to URL
        await page.goto(config.url, {
          waitUntil: 'networkidle',
          timeout: config.timeout || 30000,
        });

        // Wait for selector if specified
        if (config.waitForSelector) {
          await page.waitForSelector(config.waitForSelector, {
            timeout: config.waitForTimeout || 10000,
          });
        }

        // Wait for additional time if specified
        if (config.waitFor) {
          await page.waitForTimeout(parseInt(config.waitFor) || 1000);
        }

        // Extract content
        const title = await page.title();
        const text = await page.evaluate(() => {
          return document.body.innerText.replace(/\s+/g, ' ').trim();
        });
        const html = await page.content();

        // Extract links
        const links = await page.evaluate(() => {
          const linkElements = Array.from(document.querySelectorAll('a[href]'));
          return linkElements
            .map((el) => {
              try {
                return new URL((el as HTMLAnchorElement).href).href;
              } catch {
                return null;
              }
            })
            .filter((url): url is string => url !== null);
        });

        // Extract images
        const images = await page.evaluate(() => {
          const imgElements = Array.from(document.querySelectorAll('img[src]'));
          return imgElements
            .map((el) => {
              try {
                return new URL((el as HTMLImageElement).src).href;
              } catch {
                return null;
              }
            })
            .filter((url): url is string => url !== null);
        });

        // Extract tables
        const tables = await page.evaluate((): TableData[] => {
          const tableElements = Array.from(document.querySelectorAll('table'));
          return tableElements.map((table: Element) => {
            const tableData: TableData = {
              headers: [],
              rows: [],
            };

            // Extract caption
            const caption = table.querySelector('caption');
            if (caption) {
              tableData.caption = caption.textContent?.trim() || '';
            }

            // Extract headers
            const headerCells = table.querySelectorAll(
              'thead th, thead td, tr:first-child th, tr:first-child td'
            );
            headerCells.forEach((cell: Element) => {
              tableData.headers.push(cell.textContent?.trim() || '');
            });

            // Extract rows
            const rows = table.querySelectorAll('tbody tr, tr');
            rows.forEach((row: Element, index: number) => {
              // Skip first row if it's used as headers
              if (index === 0 && tableData.headers.length > 0) {
                return;
              }
              const rowData: string[] = [];
              row.querySelectorAll('td, th').forEach((cell: Element) => {
                rowData.push(cell.textContent?.trim() || '');
              });
              if (rowData.length > 0) {
                tableData.rows.push(rowData);
              }
            });

            return tableData;
          });
        });

        return {
          url: config.url,
          title,
          text,
          html,
          links: [...new Set(links)],
          images: [...new Set(images)],
          tables,
          scrapedAt: new Date(),
        };
      } finally {
        await page.close();
      }
    }
  • Registration of the 'scrape_dynamic_content' tool within the webScrapingTools array, including its name, description, and input schema.
    {
      name: 'scrape_dynamic_content',
      description: 'Scrape JavaScript-rendered content using browser',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL to scrape',
          },
          waitForSelector: {
            type: 'string',
            description: 'CSS selector to wait for',
          },
          waitForTimeout: {
            type: 'number',
            description: 'Timeout in milliseconds',
          },
          timeout: {
            type: 'number',
            description: 'Page load timeout',
            default: 30000,
          },
        },
        required: ['url'],
      },
    },
  • Dispatch handler in the handleWebScrapingTool function that calls the DynamicScraper instance's scrapeDynamicContent method and formats the output; a sketch of how this and the registration above wire into an MCP server follows the list.
    case 'scrape_dynamic_content': {
      const data = await dynamicScraper.scrapeDynamicContent(config);
      return Formatters.formatScrapedData(data);
    }
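
The ScrapingConfig type consumed by the handler above is not shown on this page. A plausible reconstruction, inferred purely from the fields the handler reads, is sketched below; the real definition may differ.

    // Hypothetical shape of ScrapingConfig, inferred from scrapeDynamicContent above.
    interface ScrapingConfig {
      url: string;                       // required: URL to scrape
      waitForSelector?: string;          // CSS selector to wait for
      waitForTimeout?: number;           // selector wait timeout in ms (handler falls back to 10000)
      timeout?: number;                  // page load timeout in ms (handler falls back to 30000)
      waitFor?: string;                  // extra fixed delay, parsed with parseInt (falls back to 1000)
      headers?: Record<string, string>;  // extra HTTP headers to send
    }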
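
The handler also relies on a getBrowser() method whose implementation is not shown. A minimal sketch, assuming a lazily launched and reused Chromium instance, could look like this:

    import { chromium, Browser } from 'playwright';

    // Sketch only: the actual DynamicScraper may launch or configure
    // the browser differently.
    class DynamicScraperSketch {
      private browser: Browser | null = null;

      // Launch Chromium on first use and reuse it for subsequent pages.
      private async getBrowser(): Promise<Browser> {
        if (!this.browser) {
          this.browser = await chromium.launch({ headless: true });
        }
        return this.browser;
      }

      // Dispose of the shared browser when the server shuts down.
      async close(): Promise<void> {
        await this.browser?.close();
        this.browser = null;
      }
    }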
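
Finally, the registration and dispatch snippets above plug into the standard MCP SDK request handlers. The full wiring is not shown on this page; a hedged sketch of the usual pattern, reusing the webScrapingTools array and handleWebScrapingTool dispatcher referenced above (the server name and version here are assumptions), would be:

    import { Server } from '@modelcontextprotocol/sdk/server/index.js';
    import {
      CallToolRequestSchema,
      ListToolsRequestSchema,
    } from '@modelcontextprotocol/sdk/types.js';

    // Assumed server identity; the real project may use different values.
    const server = new Server(
      { name: 'development-tools-mcp-server', version: '1.0.0' },
      { capabilities: { tools: {} } }
    );

    // Advertise the tools, including scrape_dynamic_content.
    server.setRequestHandler(ListToolsRequestSchema, async () => ({
      tools: webScrapingTools,
    }));

    // Route each tool call to the dispatcher shown above.
    server.setRequestHandler(CallToolRequestSchema, async (request) =>
      handleWebScrapingTool(request.params.name, request.params.arguments)
    );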

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/code-alchemist01/development-tools-mcp-Server'
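
The same request from TypeScript, assuming a runtime with the global fetch API (Node 18+) and that the endpoint returns JSON:

    // Equivalent of the curl example above.
    const res = await fetch(
      'https://glama.ai/api/mcp/v1/servers/code-alchemist01/development-tools-mcp-Server'
    );
    const serverInfo = await res.json(); // server metadata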

If you have feedback or need assistance with the MCP directory API, please join our Discord server.