extract_text
Extract text content from web pages for development workflows, supporting both static and dynamic content extraction.
Instructions
Extract text content from a web page
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to scrape | |
| useBrowser | No | Use browser for dynamic content | false |
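
As a rough illustration of the schema above, the following sketch validates a raw argument object and applies the documented default for `useBrowser`. The `ExtractTextArgs` type and the `normalizeExtractTextArgs` helper are hypothetical names introduced here for illustration; they are not part of the tool's source.

```typescript
// Hypothetical type mirroring the documented input schema.
interface ExtractTextArgs {
  url: string;
  useBrowser: boolean;
}

// Hypothetical helper: checks the required 'url' field and fills in the
// documented default (false) for the optional 'useBrowser' field.
function normalizeExtractTextArgs(raw: unknown): ExtractTextArgs {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('arguments must be an object');
  }
  const fields = raw as Record<string, unknown>;

  const url = fields.url;
  if (typeof url !== 'string' || url.length === 0) {
    throw new Error("'url' is required and must be a non-empty string");
  }

  let useBrowser = false; // default taken from the schema
  const rawUseBrowser = fields.useBrowser;
  if (rawUseBrowser !== undefined) {
    if (typeof rawUseBrowser !== 'boolean') {
      throw new Error("'useBrowser' must be a boolean when provided");
    }
    useBrowser = rawUseBrowser;
  }

  return { url, useBrowser };
}

// Usage: only 'url' is supplied, so 'useBrowser' falls back to false.
const normalized = normalizeExtractTextArgs({ url: 'https://example.com' });
console.log(normalized);
```

An MCP server would normally validate arguments against the JSON schema itself; the hand-written checks here only make the two fields and the default explicit.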
Implementation Reference
- src/scrapers/static-scraper.ts:118-121 (handler): The extractText method in the StaticScraper class, which executes the core logic for the 'extract_text' tool by scraping the HTML and returning the plain text content.

  ```typescript
  async extractText(config: ScrapingConfig): Promise<string> {
    const data = await this.scrapeHTML(config);
    return data.text || '';
  }
  ```
- src/tools/web-scraping.ts:293-300 (handler): Dispatch handler in the handleWebScrapingTool function that handles the 'extract_text' tool invocation, choosing between the static and dynamic scrapers based on config.

  ```typescript
  case 'extract_text': {
    if (config.useBrowser) {
      const data = await dynamicScraper.scrapeDynamicContent(config);
      return data.text;
    } else {
      return await staticScraper.extractText(config);
    }
  }
  ```
- src/tools/web-scraping.ts:36-54 (schema): Tool registration entry defining the name, description, and input schema for the 'extract_text' tool.

  ```typescript
  {
    name: 'extract_text',
    description: 'Extract text content from a web page',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL to scrape',
        },
        useBrowser: {
          type: 'boolean',
          description: 'Use browser for dynamic content',
          default: false,
        },
      },
      required: ['url'],
    },
  },
  ```
- src/tools/web-scraping.ts:36-54 (registration): Registration of the 'extract_text' tool within the webScrapingTools export array for MCP integration.

  ```typescript
  {
    name: 'extract_text',
    description: 'Extract text content from a web page',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL to scrape',
        },
        useBrowser: {
          type: 'boolean',
          description: 'Use browser for dynamic content',
          default: false,
        },
      },
      required: ['url'],
    },
  },
  ```
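
To make the dispatch described in the references above concrete, here is a minimal, self-contained sketch of the static/dynamic branching. The `StubStaticScraper` and `StubDynamicScraper` classes, the `ScrapedData` shape, and the `extractTextDispatch` function are stand-ins assumed for illustration; they return canned strings instead of fetching pages and are not the project's real scrapers.

```typescript
// Assumed minimal shapes; the real ScrapingConfig likely has more fields.
interface ScrapingConfig {
  url: string;
  useBrowser?: boolean;
}

interface ScrapedData {
  text?: string;
}

// Stand-in for the static scraper: returns canned data, no network access.
class StubStaticScraper {
  async scrapeHTML(config: ScrapingConfig): Promise<ScrapedData> {
    return { text: `static text for ${config.url}` };
  }

  // Mirrors the documented method: scrape, then fall back to ''.
  async extractText(config: ScrapingConfig): Promise<string> {
    const data = await this.scrapeHTML(config);
    return data.text || '';
  }
}

// Stand-in for the browser-based scraper.
class StubDynamicScraper {
  async scrapeDynamicContent(config: ScrapingConfig): Promise<ScrapedData> {
    return { text: `dynamic text for ${config.url}` };
  }
}

const staticScraper = new StubStaticScraper();
const dynamicScraper = new StubDynamicScraper();

// Same branching as the documented 'extract_text' case. The '' fallback on
// the dynamic path is an addition in this sketch, not taken from the source.
async function extractTextDispatch(config: ScrapingConfig): Promise<string> {
  if (config.useBrowser) {
    const data = await dynamicScraper.scrapeDynamicContent(config);
    return data.text ?? '';
  }
  return staticScraper.extractText(config);
}

extractTextDispatch({ url: 'https://example.com', useBrowser: true })
  .then((text) => console.log(text));
```

Omitting `useBrowser`, or setting it to `false`, routes the call through the static path, which matches the schema default.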