read_webpage
Extract the text content of a webpage by providing its URL. Returns the page title and the whitespace-normalized body text, with script and style elements removed, as a JSON string suitable for further analysis.
Instructions
Fetch and extract text content from a webpage
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the webpage to read | |
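
The schema only requires that `url` be a string. As a minimal sketch of how a caller or server might check arguments at runtime before fetching, the type guard below accepts only absolute `http:` or `https:` URLs. The names `ReadWebpageArgs` and `isReadWebpageArgs` are hypothetical and not part of this server, and the restriction to http(s) is an assumption, not something the tool enforces.

```typescript
// Hypothetical argument type mirroring the single required `url` field.
interface ReadWebpageArgs {
  url: string;
}

// Returns true only when `args` is an object whose `url` is an absolute
// http(s) URL. The protocol restriction is an assumption for illustration.
function isReadWebpageArgs(args: unknown): args is ReadWebpageArgs {
  if (typeof args !== 'object' || args === null) return false;
  const url = (args as { url?: unknown }).url;
  if (typeof url !== 'string') return false;
  try {
    const parsed = new URL(url);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    return false; // not parseable as an absolute URL
  }
}

console.log(isReadWebpageArgs({ url: 'https://example.com' })); // true
console.log(isReadWebpageArgs({ url: 'not a url' }));           // false
```

A check of this kind would turn a malformed argument into a clear validation failure rather than an error raised later by the HTTP client.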
Implementation Reference
- src/index.ts:103-132 (handler): The handler function that executes the read_webpage tool logic. Fetches a webpage via HTTP, parses HTML with cheerio, removes script/style elements, and extracts the title and body text.

      private async handleReadWebpage(url: string) {
        try {
          const response = await axios.get(url);
          const $ = cheerio.load(response.data);

          // Remove script and style elements
          $('script, style').remove();

          const content: WebpageContent = {
            title: $('title').text().trim(),
            text: $('body').text().trim().replace(/\s+/g, ' '),
            url: url,
          };

          return {
            content: [{
              type: 'text',
              text: JSON.stringify(content, null, 2),
            }],
          };
        } catch (error: unknown) {
          return {
            content: [{
              type: 'text',
              text: `Webpage fetch error: ${error instanceof Error ? error.message : String(error)}`,
            }],
            isError: true,
          };
        }
      }

- src/index.ts:156-169 (schema): The input schema and tool definition for read_webpage. Defines the tool name as 'read_webpage', provides a description, and specifies the input schema requiring a 'url' string parameter.

      const readToolSchema = {
        name: 'read_webpage',
        description: 'Fetch and extract text content from a webpage',
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'URL of the webpage to read',
            },
          },
          required: ['url'],
        },
      };

- src/index.ts:189-193 (registration): Where the read_webpage tool handler is invoked. In the CallToolRequestSchema handler, the code checks whether the tool name is 'read_webpage', extracts the url argument, and calls handleReadWebpage.

      // Handle read_webpage tool
      if (request.params.name === 'read_webpage') {
        const {url} = request.params.arguments as { url: string };
        return await this.handleReadWebpage(url);
      }
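
The handler returns the same envelope in both cases: a single text part, plus `isError: true` on failure. The sketch below reproduces that shape and the whitespace normalization without any network or cheerio dependency, to make the output format easier to see. The helper names `normalizeText`, `toSuccess`, and `toFailure` and the `ToolResult` type are hypothetical. The `WebpageContent` fields are inferred from the handler, since its actual declaration is not shown here.

```typescript
// Payload serialized by the handler; fields inferred from the handler code.
interface WebpageContent {
  title: string;
  text: string;
  url: string;
}

// Hypothetical result type: a list of text parts plus an optional error flag.
interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

// Collapse runs of whitespace to single spaces, as the handler does for body text.
function normalizeText(raw: string): string {
  return raw.trim().replace(/\s+/g, ' ');
}

// Hypothetical helper: wrap extracted content in the success shape.
function toSuccess(content: WebpageContent): ToolResult {
  return { content: [{ type: 'text', text: JSON.stringify(content, null, 2) }] };
}

// Hypothetical helper: wrap any thrown value in the error shape.
function toFailure(error: unknown): ToolResult {
  const message = error instanceof Error ? error.message : String(error);
  return {
    content: [{ type: 'text', text: `Webpage fetch error: ${message}` }],
    isError: true,
  };
}

const ok = toSuccess({
  title: normalizeText('  Example Domain \n'),
  text: normalizeText('Example\n\n   Domain   body'),
  url: 'https://example.com',
});
const parsed = JSON.parse(ok.content[0].text) as WebpageContent;
console.log(parsed.text); // Example Domain body
console.log(toFailure(new Error('timeout')).content[0].text); // Webpage fetch error: timeout
```

A client consuming the tool would therefore check `isError` first, and only parse the text part as JSON when the flag is absent.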