read_webpage
Fetch and extract text content from any webpage URL for analysis or data processing.
Instructions
Fetch and extract text content from a webpage
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL of the webpage to read | |
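As a rough illustration of the schema above, the sketch below shows what a call's arguments might look like and how they can be checked. The `ReadWebpageArgs` type and the `exampleCall` object are illustrative names, not part of the source; the guard follows the server's `isValidWebpageArgs` but is typed with `unknown` rather than `any`.

```typescript
// Hypothetical name for the argument shape described by the input schema.
type ReadWebpageArgs = { url: string };

// Same check as the server's isValidWebpageArgs: a non-null object
// with a string `url` property. Typed with `unknown` for this sketch.
const isValidWebpageArgs = (args: unknown): args is ReadWebpageArgs =>
  typeof args === 'object' &&
  args !== null &&
  typeof (args as { url?: unknown }).url === 'string';

// Illustrative params for an MCP tools/call request (example URL is an assumption).
const exampleCall = {
  name: 'read_webpage',
  arguments: { url: 'https://example.com' },
};

const accepted = isValidWebpageArgs(exampleCall.arguments);
const rejectedMissing = isValidWebpageArgs({});
const rejectedWrongType = isValidWebpageArgs({ url: 42 });
const rejectedNull = isValidWebpageArgs(null);
```

Note that the guard only confirms `url` is a string; it does not check that the string is a well-formed URL, so a malformed value would surface later as a fetch error.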
Implementation Reference
- src/index.ts:206-254 (handler) The handler logic for the 'read_webpage' tool. Validates arguments, fetches the URL via axios, parses HTML with cheerio, removes script/style elements, extracts title and body text, and returns the content as JSON. Axios errors are returned as an error result (isError: true); any other error is rethrown.
  ```ts
  } else if (request.params.name === 'read_webpage') {
    if (!isValidWebpageArgs(request.params.arguments)) {
      throw new McpError(
        ErrorCode.InvalidParams,
        'Invalid webpage arguments'
      );
    }

    const { url } = request.params.arguments;

    try {
      const proxyConfig = createProxyConfig();
      const response = await axios.get(url, {
        proxy: proxyConfig,
      });

      const $ = cheerio.load(response.data);

      // Remove script and style elements
      $('script, style').remove();

      const content: WebpageContent = {
        title: $('title').text().trim(),
        text: $('body').text().trim().replace(/\s+/g, ' '),
        url: url,
      };

      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(content, null, 2),
          },
        ],
      };
    } catch (error) {
      if (axios.isAxiosError(error)) {
        return {
          content: [
            {
              type: 'text',
              text: `Webpage fetch error: ${error.message}`,
            },
          ],
          isError: true,
        };
      }
      throw error;
    }
  }
  ```
- src/index.ts:73-78 (schema) Input validation function (isValidWebpageArgs) that checks that the arguments are a non-null object with a string 'url' property.
  ```ts
  const isValidWebpageArgs = (
    args: any
  ): args is { url: string } =>
    typeof args === 'object' &&
    args !== null &&
    typeof args.url === 'string';
  ```
- src/index.ts:140-153 (schema) Tool registration with name 'read_webpage', description, and input JSON schema defining the required 'url' string parameter.
  ```ts
  {
    name: 'read_webpage',
    description: 'Fetch and extract text content from a webpage',
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'URL of the webpage to read',
        },
      },
      required: ['url'],
    },
  },
  ```
- src/index.ts:140-155 (registration) Registration of the 'read_webpage' tool in the ListToolsRequestSchema handler, listing it as an available tool.
  ```ts
      {
        name: 'read_webpage',
        description: 'Fetch and extract text content from a webpage',
        inputSchema: {
          type: 'object',
          properties: {
            url: {
              type: 'string',
              description: 'URL of the webpage to read',
            },
          },
          required: ['url'],
        },
      },
    ],
  }));
  ```
- src/index.ts:59-63 (helper) The WebpageContent interface used to type the returned webpage data (title, text, url).
  ```ts
  interface WebpageContent {
    title: string;
    text: string;
    url: string;
  }
  ```
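To make the handler's output shape concrete, here is a minimal sketch of the text normalization and result construction it performs, with the network fetch and HTML parsing left out. The names `ToolResult`, `normalizeText`, `toSuccessResult`, and `toErrorResult` are hypothetical helpers introduced for illustration, and the sample title, text, and error message are made up.

```typescript
// Mirrors the WebpageContent interface from the source.
interface WebpageContent {
  title: string;
  text: string;
  url: string;
}

// Hypothetical result type, reduced to the fields the handler returns.
interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}

// Trim, then collapse every whitespace run into one space,
// matching the handler's .trim().replace(/\s+/g, ' ').
function normalizeText(raw: string): string {
  return raw.trim().replace(/\s+/g, ' ');
}

// Success case: the content object is serialized as pretty-printed JSON.
function toSuccessResult(content: WebpageContent): ToolResult {
  return {
    content: [{ type: 'text', text: JSON.stringify(content, null, 2) }],
  };
}

// Failure case: the handler reports Axios errors with isError set to true.
function toErrorResult(message: string): ToolResult {
  return {
    content: [{ type: 'text', text: `Webpage fetch error: ${message}` }],
    isError: true,
  };
}

// Sample values below are assumptions, not output from a real fetch.
const success = toSuccessResult({
  title: 'Example Domain',
  text: normalizeText('  Example   Domain\n\n  More\tinformation...  '),
  url: 'https://example.com',
});
const failure = toErrorResult('Request failed with status code 404');
```

A caller would therefore parse `content[0].text` as JSON on success and treat it as a plain error message when `isError` is true.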