Glama

pilot_page_text

Extract clean text from web pages by removing scripts, styles, and non-essential elements for content analysis and data processing.

Instructions

Extract clean text from the page (strips script/style/noscript/svg).

Input Schema

Name | Required | Description | Default

No arguments

Implementation Reference

  • The handler for the 'pilot_page_text' tool. It ensures a browser is running, extracts clean text from the current page with the getCleanText helper, and returns a wrapped error if extraction fails.
    server.tool(
      'pilot_page_text',
      'Extract clean text from the page (strips script/style/noscript/svg).',
      {},
      async () => {
        await bm.ensureBrowser();
        try {
          const text = await getCleanText(bm.getPage());
          return { content: [{ type: 'text' as const, text }] };
        } catch (err) {
          return { content: [{ type: 'text' as const, text: wrapError(err) }], isError: true };
        }
      }
    );
  • Helper function used by 'pilot_page_text' to clean and extract text from the DOM.
    async function getCleanText(page: import('playwright').Page): Promise<string> {
      return await page.evaluate(() => {
        const body = document.body;
        if (!body) return '';
        // Work on a clone so the live page is left untouched.
        const clone = body.cloneNode(true) as HTMLElement;
        // Drop elements that carry no readable text.
        clone.querySelectorAll('script, style, noscript, svg').forEach(el => el.remove());
        // Trim each line and discard the empty ones.
        return clone.innerText
          .split('\n')
          .map(line => line.trim())
          .filter(line => line.length > 0)
          .join('\n');
      });
    }
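
The line clean-up at the end of the helper can be tried outside a browser. The following is a minimal sketch, not part of the Pilot source: `normalizeLines` is a hypothetical name, and it reproduces only the split/trim/filter/join chain, not the DOM cloning or element removal.

```typescript
// Hypothetical helper (not in the Pilot source): mirrors the
// split/trim/filter/join chain that getCleanText applies to innerText.
function normalizeLines(raw: string): string {
  return raw
    .split('\n')                        // one entry per line
    .map(line => line.trim())           // strip leading/trailing whitespace
    .filter(line => line.length > 0)    // drop lines that are now empty
    .join('\n');                        // reassemble with single newlines
}

// Sample input with padding, blank lines, and a whitespace-only line.
const sample = '  Title  \n\n\n   First paragraph.\n\t\n Second paragraph.  ';
console.log(normalizeLines(sample));
```

Runs of blank or whitespace-only lines collapse entirely, so the result has no empty lines between paragraphs.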

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/TacosyHorchata/Pilot'
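
The same request can be issued from TypeScript. This is a sketch under assumptions: it relies on the global `fetch` available in Node 18+ and on the endpoint returning JSON; the response shape is not described here, so it is typed as `unknown`. The names `serverUrl` and `fetchServerInfo` are illustrative only.

```typescript
// Base path taken from the curl example above.
const API_BASE = 'https://glama.ai/api/mcp/v1/servers';

// Build the URL for one server entry (illustrative helper).
function serverUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetch the entry; assumes a JSON body, whose shape is left untyped.
async function fetchServerInfo(owner: string, repo: string): Promise<unknown> {
  const res = await fetch(serverUrl(owner, repo));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```

Calling `fetchServerInfo('TacosyHorchata', 'Pilot')` requests the same URL as the curl command.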

If you have feedback or need assistance with the MCP directory API, please join our Discord server.