MCP Google Server

by adenot

read_webpage

Extract text content from any webpage by providing its URL. This tool fetches and processes web content for analysis or data collection.

Instructions

Fetch and extract text content from a webpage

Input Schema

Name | Required | Description                | Default
url  | Yes      | URL of the webpage to read | -
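
To make the schema concrete, here is a minimal sketch of the JSON-RPC request a client might send to invoke the tool. The `tools/call` method and the `name`/`arguments` parameter layout follow the MCP specification; the helper name `buildReadWebpageRequest`, the request `id`, and the example URL are hypothetical and not part of this server.

```typescript
// Shape of a JSON-RPC 2.0 request that invokes an MCP tool.
interface ToolCallRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: { url: string } };
}

// Hypothetical helper (not part of the server): builds a read_webpage call.
// The only argument the input schema requires is `url`, a string.
function buildReadWebpageRequest(id: number, url: string): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'read_webpage', arguments: { url } },
  };
}

// Example URL chosen purely for illustration.
const request = buildReadWebpageRequest(1, 'https://example.com');
console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library would normally assemble and send this message for you; the sketch only shows what ends up on the wire.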

Implementation Reference

  • The main handler logic for the 'read_webpage' tool. Validates input arguments, fetches the webpage using axios (with optional proxy), parses HTML with cheerio to extract title and cleaned body text, structures the output as WebpageContent, and handles errors.
    } else if (request.params.name === 'read_webpage') {
      if (!isValidWebpageArgs(request.params.arguments)) {
        throw new McpError(
          ErrorCode.InvalidParams,
          'Invalid webpage arguments'
        );
      }
      const { url } = request.params.arguments;

      try {
        const proxyConfig = createProxyConfig();
        const response = await axios.get(url, {
          proxy: proxyConfig,
        });
        const $ = cheerio.load(response.data);

        // Remove script and style elements
        $('script, style').remove();

        const content: WebpageContent = {
          title: $('title').text().trim(),
          text: $('body').text().trim().replace(/\s+/g, ' '),
          url: url,
        };

        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(content, null, 2),
            },
          ],
        };
      } catch (error) {
        if (axios.isAxiosError(error)) {
          return {
            content: [
              {
                type: 'text',
                text: `Webpage fetch error: ${error.message}`,
              },
            ],
            isError: true,
          };
        }
        throw error;
      }
    }
  • src/index.ts:140-153 (registration)
    Registration of the 'read_webpage' tool in the MCP server's ListTools response, defining its name, description, and input schema.
    {
      name: 'read_webpage',
      description: 'Fetch and extract text content from a webpage',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL of the webpage to read',
          },
        },
        required: ['url'],
      },
    },
  • Input validation type guard (schema) for the 'read_webpage' tool arguments, ensuring the presence of a valid 'url' string.
    const isValidWebpageArgs = (
      args: any
    ): args is { url: string } =>
      typeof args === 'object' &&
      args !== null &&
      typeof args.url === 'string';
  • TypeScript interface defining the structure of the webpage content output from the 'read_webpage' tool.
    interface WebpageContent {
      title: string;
      text: string;
      url: string;
    }
  • Helper function to create Axios proxy configuration from environment variables, used in the 'read_webpage' handler for HTTP requests.
    function createProxyConfig(): AxiosProxyConfig | false {
      // Prefer the HTTPS proxy variable; fall back to the HTTP one.
      const httpsProxy = process.env.HTTPS_PROXY || process.env.https_proxy;
      const httpProxy = process.env.HTTP_PROXY || process.env.http_proxy;
      const proxyUrl = httpsProxy || httpProxy;

      if (!proxyUrl) {
        return false;
      }

      try {
        const url = new URL(proxyUrl);
        return {
          protocol: url.protocol.replace(':', ''),
          host: url.hostname,
          // url.port is empty when the proxy URL omits it, so fall back to the scheme default.
          port: parseInt(url.port, 10) || (url.protocol === 'https:' ? 443 : 80),
          auth: url.username && url.password
            ? { username: url.username, password: url.password }
            : undefined
        };
      } catch (error) {
        console.warn(`Invalid proxy URL: ${proxyUrl}`);
        return false;
      }
    }
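
On the consuming side, the handler's return value is a single text item whose body is the JSON-serialized `WebpageContent`, or an error message with `isError: true` when the fetch fails. The sketch below shows one way a client could unpack that result. It assumes only the shapes visible in the snippets above; `ToolResult`, `parseReadWebpageResult`, and the sample data are hypothetical names introduced for illustration.

```typescript
// Mirrors the WebpageContent interface from the implementation reference.
interface WebpageContent {
  title: string;
  text: string;
  url: string;
}

// Minimal shape of the handler's return value, as seen in the handler code.
interface ToolResult {
  content: { type: string; text: string }[];
  isError?: boolean;
}

// Hypothetical client-side helper: returns the parsed page, or throws if the
// handler reported a fetch error or the payload does not match WebpageContent.
function parseReadWebpageResult(result: ToolResult): WebpageContent {
  const first = result.content[0];
  if (first === undefined) {
    throw new Error('Empty tool result');
  }
  if (result.isError) {
    // On Axios failures the handler returns "Webpage fetch error: ..." as text.
    throw new Error(first.text);
  }

  const parsed: unknown = JSON.parse(first.text);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Unexpected payload');
  }
  const { title, text, url } = parsed as Record<string, unknown>;
  if (
    typeof title !== 'string' ||
    typeof text !== 'string' ||
    typeof url !== 'string'
  ) {
    throw new Error('Payload is missing title, text, or url');
  }
  return { title, text, url };
}

// Simulated success result (sample data, not real server output).
const sample: WebpageContent = {
  title: 'Example Domain',
  text: 'Example Domain This domain is for use in illustrative examples.',
  url: 'https://example.com',
};
const okResult: ToolResult = {
  content: [{ type: 'text', text: JSON.stringify(sample, null, 2) }],
};

const page = parseReadWebpageResult(okResult);
console.log(page.title);
```

Validating the parsed fields, rather than casting the JSON directly, keeps a malformed or truncated response from propagating as a silently wrong `WebpageContent`.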

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/adenot/mcp-google-search'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.