Skip to main content
Glama

extract_tables

Extract structured table data from web pages for analysis, supporting both static content and dynamic JavaScript-rendered pages.

Instructions

Extract table data from a web page

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesURL to scrape
useBrowserNoUse browser for dynamic content

Implementation Reference

  • Registration of the 'extract_tables' tool including name, description, and input schema.
    { name: 'extract_tables', description: 'Extract table data from a web page', inputSchema: { type: 'object', properties: { url: { type: 'string', description: 'URL to scrape', }, useBrowser: { type: 'boolean', description: 'Use browser for dynamic content', default: false, }, }, required: ['url'], }, },
  • Dispatcher logic in handleWebScrapingTool for the 'extract_tables' tool, routing to static or dynamic scraper based on useBrowser flag.
    case 'extract_tables': { if (config.useBrowser) { const data = await dynamicScraper.scrapeDynamicContent(config); return data.tables; } else { return await staticScraper.extractTables(config); } }
  • The extractTables method in StaticScraper class, which performs the table extraction by calling scrapeHTML and returning the tables.
    async extractTables(config: ScrapingConfig): Promise<TableData[]> { const data = await this.scrapeHTML(config); return data.tables || []; }
  • Core helper logic inside scrapeHTML method for parsing HTML tables using Cheerio, extracting captions, headers, and rows into TableData format.
    const tables: TableData[] = []; $('table').each((_, tableElement) => { const table: TableData = { headers: [], rows: [], }; // Extract caption const caption = $(tableElement).find('caption').text().trim(); if (caption) { table.caption = caption; } // Extract headers $(tableElement) .find('thead th, thead td, tr:first-child th, tr:first-child td') .each((_, header) => { table.headers.push($(header).text().trim()); }); // Extract rows $(tableElement) .find('tbody tr, tr') .each((_, row) => { const rowData: string[] = []; $(row) .find('td, th') .each((_, cell) => { rowData.push($(cell).text().trim()); }); if (rowData.length > 0) { table.rows.push(rowData); } }); if (table.headers.length > 0 || table.rows.length > 0) { tables.push(table); } });

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/code-alchemist01/development-tools-mcp-Server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server