pilot_page_text
Extract clean text from web pages by removing script, style, noscript, and svg elements, for content analysis and data processing.
Instructions
Extract clean text from the page (strips script/style/noscript/svg).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |
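Because the tool takes no arguments, a call needs only the tool name and an empty arguments object. The sketch below assumes the server speaks the Model Context Protocol, which the `server.tool(...)` registration and the `content`/`isError` result shape suggest but this page does not state. `buildCallRequest` and `readTextResult` are hypothetical helper names used for illustration, not part of the project.

```typescript
// Result shape as returned by the handler shown in this document:
// a list of text parts plus an optional error flag.
interface TextPart { type: 'text'; text: string }
interface ToolResult { content: TextPart[]; isError?: boolean }

// Assumption: an MCP-style JSON-RPC 2.0 "tools/call" request.
// The arguments object is empty because the tool defines no inputs.
function buildCallRequest(id: number) {
  return {
    jsonrpc: '2.0' as const,
    id,
    method: 'tools/call' as const,
    params: { name: 'pilot_page_text', arguments: {} },
  };
}

// Join the text parts of a result; throw if the handler flagged an error.
function readTextResult(result: ToolResult): string {
  const text = result.content.map(part => part.text).join('\n');
  if (result.isError) throw new Error(text);
  return text;
}
```

Checking `isError` before using the text matters here because the handler reports failures as an ordinary text part rather than by throwing.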
Implementation Reference
- src/tools/page.ts:21-34 (handler) The handler for the 'pilot_page_text' tool, which extracts clean text from the page using a helper function.

  ```typescript
  server.tool(
    'pilot_page_text',
    'Extract clean text from the page (strips script/style/noscript/svg).',
    {}, // empty input schema: the tool takes no arguments
    async () => {
      await bm.ensureBrowser();
      try {
        const text = await getCleanText(bm.getPage());
        return { content: [{ type: 'text' as const, text }] };
      } catch (err) {
        // Report the failure as a tool error instead of throwing.
        return { content: [{ type: 'text' as const, text: wrapError(err) }], isError: true };
      }
    }
  );
  ```

- src/tools/page.ts:6-17 (helper) Helper function used by 'pilot_page_text' to clean and extract text from the DOM.

  ```typescript
  async function getCleanText(page: import('playwright').Page): Promise<string> {
    return await page.evaluate(() => {
      const body = document.body;
      if (!body) return '';
      // Work on a clone so the live page is not modified.
      const clone = body.cloneNode(true) as HTMLElement;
      // Drop elements that carry no readable text.
      clone.querySelectorAll('script, style, noscript, svg').forEach(el => el.remove());
      // Trim every line and discard the empty ones.
      return clone.innerText
        .split('\n')
        .map(line => line.trim())
        .filter(line => line.length > 0)
        .join('\n');
    });
  }
  ```
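The line clean-up at the end of the helper can be tried without a browser. The following is a minimal sketch of that step only. `normalizeLines` is a hypothetical name introduced here, and the DOM part (cloning the body and removing elements) is deliberately left out.

```typescript
// Same chain the helper applies to innerText:
// split into lines, trim each one, drop empty lines, rejoin.
function normalizeLines(raw: string): string {
  return raw
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.length > 0)
    .join('\n');
}

const sample = '  Title \n\n\n\tFirst paragraph.  \n   \nSecond paragraph.\n';
console.log(normalizeLines(sample));
```

Whitespace-only lines vanish because they are trimmed to empty strings before the filter runs, so blank runs between blocks collapse to a single line break.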