extract_tables
Extract structured table data from web pages for analysis, supporting both static content and dynamic JavaScript-rendered pages.
Instructions
Extract table data from a web page
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to scrape | |
| useBrowser | No | Use browser for dynamic content |
Implementation Reference
- src/tools/web-scraping.ts:93-111 (registration)Registration of the 'extract_tables' tool including name, description, and input schema.{ name: 'extract_tables', description: 'Extract table data from a web page', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'URL to scrape', }, useBrowser: { type: 'boolean', description: 'Use browser for dynamic content', default: false, }, }, required: ['url'], }, },
- src/tools/web-scraping.ts:320-327 (handler)Dispatcher logic in handleWebScrapingTool for the 'extract_tables' tool, routing to static or dynamic scraper based on useBrowser flag.case 'extract_tables': { if (config.useBrowser) { const data = await dynamicScraper.scrapeDynamicContent(config); return data.tables; } else { return await staticScraper.extractTables(config); } }
- src/scrapers/static-scraper.ts:142-145 (handler)The extractTables method in StaticScraper class, which performs the table extraction by calling scrapeHTML and returning the tables.async extractTables(config: ScrapingConfig): Promise<TableData[]> { const data = await this.scrapeHTML(config); return data.tables || []; }
- src/scrapers/static-scraper.ts:60-98 (helper)Core helper logic inside scrapeHTML method for parsing HTML tables using Cheerio, extracting captions, headers, and rows into TableData format.const tables: TableData[] = []; $('table').each((_, tableElement) => { const table: TableData = { headers: [], rows: [], }; // Extract caption const caption = $(tableElement).find('caption').text().trim(); if (caption) { table.caption = caption; } // Extract headers $(tableElement) .find('thead th, thead td, tr:first-child th, tr:first-child td') .each((_, header) => { table.headers.push($(header).text().trim()); }); // Extract rows $(tableElement) .find('tbody tr, tr') .each((_, row) => { const rowData: string[] = []; $(row) .find('td, th') .each((_, cell) => { rowData.push($(cell).text().trim()); }); if (rowData.length > 0) { table.rows.push(rowData); } }); if (table.headers.length > 0 || table.rows.length > 0) { tables.push(table); } });