read_webpage
Extract text content from webpages to analyze information, summarize articles, or gather data for research.
Instructions
Fetch and extract text content from a webpage
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the webpage to read | |
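
To illustrate the schema above, a caller could check its arguments before sending the request. This is a minimal sketch and not part of the server: `ReadWebpageArgs` and `parseReadWebpageArgs` are hypothetical names, and the http/https restriction is an added assumption, since the schema itself only requires `url` to be a string.

```typescript
// Hypothetical helper, not part of the server source: checks an arguments
// object against the read_webpage input schema (one required string, `url`).
interface ReadWebpageArgs {
  url: string;
}

function parseReadWebpageArgs(args: unknown): ReadWebpageArgs {
  if (typeof args !== 'object' || args === null) {
    throw new Error('arguments must be an object');
  }
  const url = (args as Record<string, unknown>).url;
  if (typeof url !== 'string') {
    throw new Error("'url' is required and must be a string");
  }
  // Assumption: only http(s) URLs make sense for fetching a webpage.
  if (!/^https?:\/\//i.test(url)) {
    throw new Error('url must start with http:// or https://');
  }
  return { url };
}

// Example: arguments as they would appear under params.arguments
// in a tools/call request for this tool.
const exampleArgs = parseReadWebpageArgs({ url: 'https://example.com/article' });
console.log(exampleArgs.url);
```

The server code shown on this page casts the arguments without checking them, so a check like this on either side would surface a missing or malformed `url` before any network request is made.
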
Implementation Reference
- src/index.ts:103-132 (handler) The main handler function for the 'read_webpage' tool. It fetches the webpage with axios, parses the HTML with cheerio, extracts the title and cleaned body text, and returns JSON-formatted content or an error.

  ```typescript
  private async handleReadWebpage(url: string) {
    try {
      const response = await axios.get(url);
      const $ = cheerio.load(response.data);

      // Remove script and style elements
      $('script, style').remove();

      const content: WebpageContent = {
        title: $('title').text().trim(),
        text: $('body').text().trim().replace(/\s+/g, ' '),
        url: url,
      };

      return {
        content: [{
          type: 'text',
          text: JSON.stringify(content, null, 2),
        }],
      };
    } catch (error: unknown) {
      return {
        content: [{
          type: 'text',
          text: `Webpage fetch error: ${error instanceof Error ? error.message : String(error)}`,
        }],
        isError: true,
      };
    }
  }
  ```

- src/index.ts:156-169 (schema) Input schema and metadata definition for the 'read_webpage' tool, specifying the required 'url' parameter.

  ```typescript
  const readToolSchema = {
    name: 'read_webpage',
    description: 'Fetch and extract text content from a webpage',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL of the webpage to read',
        },
      },
      required: ['url'],
    },
  };
  ```

- src/index.ts:189-193 (registration) Dispatch logic in the CallToolRequest handler that routes 'read_webpage' calls to the handleReadWebpage function.

  ```typescript
  // Handle read_webpage tool
  if (request.params.name === 'read_webpage') {
    const {url} = request.params.arguments as { url: string };
    return await this.handleReadWebpage(url);
  }
  ```
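
The handler returns one of two shapes. On success it returns a single text item whose `text` is the JSON-serialized page (`title`, `text`, `url`). On failure it returns a text item holding the error message, together with `isError: true`. A client could unpack either shape roughly as sketched below. The `WebpageContent` fields are inferred from what the handler sets, since the interface definition is not shown here, and `ToolResult` and `parseReadWebpageResult` are hypothetical names used only for illustration.

```typescript
// Inferred from the fields the handler assigns; the real interface is not shown.
interface WebpageContent {
  title: string;
  text: string;
  url: string;
}

// Hypothetical minimal view of the object the handler returns.
interface ToolResult {
  content: { type: string; text: string }[];
  isError?: boolean;
}

// Hypothetical client-side helper: returns the parsed page on success and
// throws with the server's message when the result is flagged as an error.
function parseReadWebpageResult(result: ToolResult): WebpageContent {
  const first = result.content[0];
  if (!first || first.type !== 'text') {
    throw new Error('expected a text content item');
  }
  if (result.isError) {
    throw new Error(first.text);
  }
  return JSON.parse(first.text) as WebpageContent;
}

// Example input shaped like the handler's success branch.
const sample: ToolResult = {
  content: [{
    type: 'text',
    text: JSON.stringify(
      { title: 'Example', text: 'Hello world', url: 'https://example.com' },
      null,
      2,
    ),
  }],
};

const page = parseReadWebpageResult(sample);
console.log(page.title); // prints "Example"
```

Checking `isError` before calling `JSON.parse` matters because the error branch carries a plain message rather than JSON, so parsing it directly would throw an unrelated syntax error.
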