MCP Google Server

by adenot

read_webpage

Extracts the title and body text of a webpage, given its URL. The tool fetches the page over HTTP and strips script and style elements before returning the text, for use in analysis or data collection.

Instructions

Fetch and extract text content from a webpage

Input Schema

Name    Required    Description                   Default
url     Yes         URL of the webpage to read    (none)
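
A call to this tool carries a single `url` argument. As a minimal sketch, assuming the standard MCP `tools/call` JSON-RPC request shape, a client could build the request as follows. The `buildReadWebpageRequest` helper, the `ReadWebpageRequest` type, and the example URL are illustrative names and are not part of the server.

```typescript
// Hypothetical request type (not from the server source): an MCP `tools/call`
// JSON-RPC request addressed to the read_webpage tool.
interface ReadWebpageRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: "read_webpage";
    arguments: { url: string };
  };
}

// Hypothetical helper: wraps a URL in the request envelope.
function buildReadWebpageRequest(id: number, url: string): ReadWebpageRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_webpage", arguments: { url } },
  };
}

// Example: request the text of a placeholder page.
const exampleRequest = buildReadWebpageRequest(1, "https://example.com");
console.log(JSON.stringify(exampleRequest, null, 2));
```

In practice an MCP client library would normally construct this envelope for you; the sketch only shows which fields the tool expects.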

Implementation Reference

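  • The main handler logic for the 'read_webpage' tool. It validates the input arguments, fetches the page with axios (through a proxy when one is configured in the environment), parses the HTML with cheerio, removes script and style elements, and returns the title, whitespace-normalized body text, and URL as a JSON-serialized WebpageContent. Axios errors are returned as a text result with isError set; any other error is rethrown.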
    } else if (request.params.name === 'read_webpage') {
      if (!isValidWebpageArgs(request.params.arguments)) {
        throw new McpError(
          ErrorCode.InvalidParams,
          'Invalid webpage arguments'
        );
      }
    
      const { url } = request.params.arguments;
    
      try {
        const proxyConfig = createProxyConfig();
        const response = await axios.get(url, {
          proxy: proxyConfig,
        });
        const $ = cheerio.load(response.data);
    
        // Remove script and style elements
        $('script, style').remove();
    
        const content: WebpageContent = {
          title: $('title').text().trim(),
          text: $('body').text().trim().replace(/\s+/g, ' '),
          url: url,
        };
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(content, null, 2),
            },
          ],
        };
      } catch (error) {
        if (axios.isAxiosError(error)) {
          return {
            content: [
              {
                type: 'text',
                text: `Webpage fetch error: ${error.message}`,
              },
            ],
            isError: true,
          };
        }
        throw error;
      }
    }
  • src/index.ts:140-153 (registration)
    Registration of the 'read_webpage' tool in the MCP server's ListTools response, defining its name, description, and input schema.
    {
      name: 'read_webpage',
      description: 'Fetch and extract text content from a webpage',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL of the webpage to read',
          },
        },
        required: ['url'],
      },
    },
  • Type guard for the 'read_webpage' tool arguments. It checks that the arguments are a non-null object with a string 'url' property; it does not check that the string is a well-formed URL.
    const isValidWebpageArgs = (
      args: any
    ): args is { url: string } =>
      typeof args === 'object' &&
      args !== null &&
      typeof args.url === 'string';
  • TypeScript interface defining the structure of the webpage content output from the 'read_webpage' tool.
    interface WebpageContent {
      title: string;
      text: string;
      url: string;
    }
  • Helper function to create Axios proxy configuration from environment variables, used in the 'read_webpage' handler for HTTP requests.
    function createProxyConfig(): AxiosProxyConfig | false {
      const httpsProxy = process.env.HTTPS_PROXY || process.env.https_proxy;
      const httpProxy = process.env.HTTP_PROXY || process.env.http_proxy;
    
      const proxyUrl = httpsProxy || httpProxy;
    
      if (!proxyUrl) {
        return false;
      }
    
      try {
        const url = new URL(proxyUrl);
        return {
          protocol: url.protocol.replace(':', ''),
          host: url.hostname,
          port: parseInt(url.port, 10) || (url.protocol === 'https:' ? 443 : 80),
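          // URL keeps credentials percent-encoded, so decode them before passing to axios
          auth: url.username && url.password ? {
            username: decodeURIComponent(url.username),
            password: decodeURIComponent(url.password)
          } : undefined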
        };
      } catch (error) {
        console.warn(`Invalid proxy URL: ${proxyUrl}`);
        return false;
      }
    }
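
On the calling side, the handler's result is either a single text item holding the JSON-serialized WebpageContent, or a plain error message with isError set to true. The following is a minimal sketch of how a caller might unpack that result, assuming the result shape returned by the handler above. `ToolResult` and `parseReadWebpageResult` are hypothetical names introduced for illustration.

```typescript
// Output structure, mirroring the WebpageContent interface above.
interface WebpageContent {
  title: string;
  text: string;
  url: string;
}

// Assumed result shape, mirroring what the handler returns.
interface ToolResult {
  content: { type: string; text: string }[];
  isError?: boolean;
}

// Hypothetical client-side helper (not part of the server): returns the parsed
// page content, or throws with the server's message on a reported fetch error.
function parseReadWebpageResult(result: ToolResult): WebpageContent {
  const first = result.content[0];
  if (first === undefined || first.type !== "text") {
    throw new Error("Expected a text content item");
  }
  if (result.isError) {
    // On Axios failures the handler returns a plain message, not JSON.
    throw new Error(first.text);
  }
  const parsed: unknown = JSON.parse(first.text);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Result is not a JSON object");
  }
  const { title, text, url } = parsed as Record<string, unknown>;
  if (
    typeof title !== "string" ||
    typeof text !== "string" ||
    typeof url !== "string"
  ) {
    throw new Error("Result is not a WebpageContent object");
  }
  return { title, text, url };
}

// Example with a hand-written result in the same shape as the handler's output.
const sampleResult: ToolResult = {
  content: [
    {
      type: "text",
      text: JSON.stringify({
        title: "Example Domain",
        text: "Example text",
        url: "https://example.com",
      }),
    },
  ],
};
console.log(parseReadWebpageResult(sampleResult).title); // prints "Example Domain"
```

Checking the parsed fields at runtime keeps a malformed or unexpected response from being treated as valid page content.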

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/adenot/mcp-google-search'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.