Glama

render.extract_dom

Extract webpage DOM structures for security analysis and vulnerability testing by providing a URL and optional wait time.

Instructions

Extract and return the DOM structure of a webpage

Input Schema

Name      Required  Description              Default
url       Yes       URL to extract DOM from  (none)
waitTime  No        Wait time in ms          2000
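
As an illustration of the input schema, the sketch below validates the two parameters and wraps them in an MCP `tools/call` request. The `ExtractDomArgs` type and `buildExtractDomCall` helper are hypothetical names introduced here and are not part of the server. The 2000 ms default mirrors the schema definition in the Implementation Reference; the http/https check is an assumption added for illustration.

```typescript
// Hypothetical types for illustration; field names follow the input schema.
interface ExtractDomArgs {
  url: string;       // required
  waitTime?: number; // optional, in milliseconds
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Required<ExtractDomArgs> };
}

// Validate the arguments and wrap them in a JSON-RPC "tools/call" request.
function buildExtractDomCall(args: ExtractDomArgs, id = 1): ToolCallRequest {
  // Assumption: only http(s) URLs are meaningful for a browser-rendered page.
  if (!/^https?:\/\//i.test(args.url)) {
    throw new Error(`url must start with http:// or https://, got: ${args.url}`);
  }
  // Apply the schema default when waitTime is omitted.
  const waitTime = args.waitTime ?? 2000;
  if (!Number.isFinite(waitTime) || waitTime < 0) {
    throw new Error("waitTime must be a non-negative number of milliseconds");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "render.extract_dom",
      arguments: { url: args.url, waitTime },
    },
  };
}
```

Calling `buildExtractDomCall({ url: "https://example.com" })` yields a request whose `arguments.waitTime` is filled in with the default.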

Implementation Reference

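  • Handler function that obtains the shared Puppeteer browser, opens a new page, navigates to the given URL (waiting for 'networkidle2' with a 30-second timeout), pauses for waitTime milliseconds, then extracts the page HTML, title, forms with their inputs, and links. The returned HTML is truncated to 50,000 characters and the link list to the first 100 entries.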
    async ({ url, waitTime = 2000 }: any): Promise<ToolResult> => {
      let page: Page | null = null;
      try {
        const browserInstance = await getBrowser();
        page = await browserInstance.newPage();
        
        await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
        await new Promise(resolve => setTimeout(resolve, waitTime));
    
        const html = await page.content();
        const title = await page.title();
        const forms = await page.$$eval('form', (forms) =>
          forms.map((form) => ({
            action: form.action,
            method: form.method,
            inputs: Array.from(form.querySelectorAll('input')).map((input: any) => ({
              name: input.name,
              type: input.type,
              id: input.id,
            })),
          }))
        );
    
        const links = await page.$$eval('a', (links) =>
          links.map((link: any) => ({
            href: link.href,
            text: link.textContent?.trim(),
          }))
        );
    
        await page.close();
    
        return formatToolResult(true, {
          url,
          title,
          html: html.substring(0, 50000), // Limit size
          forms,
          links: links.slice(0, 100), // Limit links
          summary: {
            formsCount: forms.length,
            linksCount: links.length,
          },
        });
      } catch (error: any) {
        if (page) await page.close().catch(() => {});
        return formatToolResult(false, null, error.message);
      }
    }
  • Schema definition for the render.extract_dom tool, specifying input parameters url (required) and optional waitTime.
    {
      description: 'Extract and return the DOM structure of a webpage',
      inputSchema: {
        type: 'object',
        properties: {
          url: { type: 'string', description: 'URL to extract DOM from' },
          waitTime: { type: 'number', description: 'Wait time in ms', default: 2000 },
        },
        required: ['url'],
      },
    },
  • Tool registration call using server.tool() with name 'render.extract_dom', its schema, and handler function within registerRenderTools.
    server.tool(
      'render.extract_dom',
      {
        description: 'Extract and return the DOM structure of a webpage',
        inputSchema: {
          type: 'object',
          properties: {
            url: { type: 'string', description: 'URL to extract DOM from' },
            waitTime: { type: 'number', description: 'Wait time in ms', default: 2000 },
          },
          required: ['url'],
        },
      },
      async ({ url, waitTime = 2000 }: any): Promise<ToolResult> => {
        let page: Page | null = null;
        try {
          const browserInstance = await getBrowser();
          page = await browserInstance.newPage();
          
          await page.goto(url, { waitUntil: 'networkidle2', timeout: 30000 });
          await new Promise(resolve => setTimeout(resolve, waitTime));
    
          const html = await page.content();
          const title = await page.title();
          const forms = await page.$$eval('form', (forms) =>
            forms.map((form) => ({
              action: form.action,
              method: form.method,
              inputs: Array.from(form.querySelectorAll('input')).map((input: any) => ({
                name: input.name,
                type: input.type,
                id: input.id,
              })),
            }))
          );
    
          const links = await page.$$eval('a', (links) =>
            links.map((link: any) => ({
              href: link.href,
              text: link.textContent?.trim(),
            }))
          );
    
          await page.close();
    
          return formatToolResult(true, {
            url,
            title,
            html: html.substring(0, 50000), // Limit size
            forms,
            links: links.slice(0, 100), // Limit links
            summary: {
              formsCount: forms.length,
              linksCount: links.length,
            },
          });
        } catch (error: any) {
          if (page) await page.close().catch(() => {});
          return formatToolResult(false, null, error.message);
        }
      }
    );
  • Helper function to lazily initialize and return a shared Puppeteer browser instance used by all render tools.
    async function getBrowser(): Promise<Browser> {
      if (!browser) {
        browser = await puppeteer.launch({
          headless: true,
          args: ['--no-sandbox', '--disable-setuid-sandbox'],
        });
      }
      return browser;
    }
  • src/index.ts:42-42 (registration)
    Invocation of registerRenderTools which registers all render tools including render.extract_dom.
    registerRenderTools(server);
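
The size limits applied by the handler can be sketched as a pure function, which makes the truncation behaviour easier to see in isolation. `shapeResult`, `FormInfo`, `LinkInfo`, and the two constants are hypothetical names introduced for this sketch; the limits (50,000 characters, 100 links) are the ones used in the handler above.

```typescript
// Hypothetical shapes matching what the handler extracts from the page.
interface LinkInfo { href: string; text?: string }
interface FormInfo {
  action: string;
  method: string;
  inputs: { name: string; type: string; id: string }[];
}

const MAX_HTML_CHARS = 50000; // same limit as html.substring(0, 50000)
const MAX_LINKS = 100;        // same limit as links.slice(0, 100)

// Build the result payload, truncating the large fields.
function shapeResult(
  url: string,
  title: string,
  html: string,
  forms: FormInfo[],
  links: LinkInfo[],
) {
  return {
    url,
    title,
    html: html.substring(0, MAX_HTML_CHARS),
    forms,
    links: links.slice(0, MAX_LINKS),
    // Counts are taken before truncation, so they report everything found.
    summary: { formsCount: forms.length, linksCount: links.length },
  };
}
```

Because `summary.linksCount` is computed from the full list, a caller can detect that links were dropped by comparing it with the length of the returned `links` array.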

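One detail of the lazy initialization in getBrowser: the `browser` variable is only assigned after `puppeteer.launch` resolves, so two calls that arrive before the first launch finishes would each start a browser. A common variant caches the in-flight promise instead of the resolved value. The sketch below is an alternative under that assumption, not the server's code, and `lazyShared` is a hypothetical helper name.

```typescript
// Wrap an async factory so it runs at most once; concurrent callers
// share the same in-flight promise instead of each starting their own.
function lazyShared<T>(factory: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;
  return () => {
    if (pending) return pending;
    const started = factory().catch((err: unknown) => {
      // Clear the cache on failure so a later call can retry.
      pending = null;
      throw err;
    });
    pending = started;
    return started;
  };
}
```

With Puppeteer this could be used as `const getBrowser = lazyShared(() => puppeteer.launch({ headless: true }))`, so every render tool awaits the same launch.
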
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/telmon95/VulneraMCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.