Docs Fetch MCP Server

by wolfyy970

fetch_doc_content

Retrieve web page content and explore linked pages up to a defined depth to extract comprehensive documentation insights for LLMs.

Instructions

Fetch web page content with the ability to explore linked pages up to a specified depth

Input Schema

Name  | Required | Description                                  | Default
depth | No       | Maximum depth of directory/link exploration  | 1
url   | Yes      | URL of the web page to fetch                 | (none)
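
For example, a client might pass arguments like the following (the URL is illustrative, not taken from the project):

    // Example arguments for a fetch_doc_content call
    const args = {
      url: 'https://example.com/docs/getting-started', // hypothetical documentation URL
      depth: 2, // optional, 1-5; defaults to 1
    };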

Implementation Reference

  • Dispatch and execution wrapper for the fetch_doc_content tool: it extracts the parameters, sets up a global timeout, calls the core fetchContent method, formats the result as an MCP text content block, and handles errors.
    if (request.params.name === 'fetch_doc_content') {
      const { url, depth = 1 } = request.params.arguments as { 
        url: string;
        depth?: number;
      };
      
      // Reset visited URLs for each new request
      this.visitedUrls.clear();
      
      // Setup global timeout
      let timeoutError: Error | null = null;
      const timeoutPromise = new Promise<never>((_, reject) => {
        this.globalTimeout = setTimeout(() => {
          timeoutError = new Error('Operation timed out');
          reject(timeoutError);
        }, this.timeoutDuration);
      });
      
      try {
        // Race the content fetch against our timeout
        const result = await Promise.race([
          this.fetchContent(url, depth),
          timeoutPromise
        ]);
        
        // Clear timeout if we didn't hit it
        if (this.globalTimeout) {
          clearTimeout(this.globalTimeout);
          this.globalTimeout = null;
        }
        
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      } catch (error: unknown) {
        // Clear timeout if we hit an error
        if (this.globalTimeout) {
          clearTimeout(this.globalTimeout);
          this.globalTimeout = null;
        }
        
        let errorMessage = "Unknown error occurred";
        if (error === timeoutError) {
          errorMessage = "Operation timed out before completion";
        } else if (error instanceof Error) {
          errorMessage = error.message;
        } else if (typeof error === 'string') {
          errorMessage = error;
        }
        
        return {
          content: [
            {
              type: 'text',
              text: `Error fetching content: ${errorMessage}`,
            },
          ],
          isError: true,
        };
      }
    }
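  • The wrapper above relies on a few class members that are not shown on this page. A minimal sketch of the assumed declarations (the names come from the excerpt; the 30-second budget is an assumption):
    // Fields referenced by the fetch_doc_content handler above
    private visitedUrls = new Set<string>();               // URLs fetched during the current request
    private globalTimeout: NodeJS.Timeout | null = null;   // handle for the per-request timeout
    private timeoutDuration = 30_000;                      // assumed overall time budget in milliseconds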
  • Core implementation of the tool logic: fetches page content with axios first for speed, falls back to Puppeteer for JavaScript-heavy pages, extracts clean text and same-domain links, explores child links when maxDepth is greater than 1, and tracks visited URLs to prevent cycles.
    private async fetchContent(url: string, maxDepth: number): Promise<any> {
      // Track visited URLs to avoid cycles
      this.visitedUrls.add(url);
      
      try {
        // First try a simple fetch with axios as a faster alternative
        try {
          const response = await axios.get(url, {
            timeout: 10000,
            headers: {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            }
          });
          
          // If we got HTML content, use it
          if (response.status === 200 && response.data && typeof response.data === 'string') {
            const mainContent = this.extractTextContent(response.data);
            const links = this.extractLinks(response.data, url);
            
            // For depth=1, just return the main content
            if (maxDepth <= 1) {
              return {
                rootUrl: url,
                explorationDepth: maxDepth,
                pagesExplored: 1,
                content: [{
                  url,
                  content: mainContent,
                  links: links.slice(0, 10) // Limit to top 10 links
                }]
              };
            }
            
            // For depth > 1, explore child links
            const childResults = [];
            const limit = Math.min(5, links.length); // Limit to 5 links for performance
            
            for (let i = 0; i < limit; i++) {
              const link = links[i];
              if (!this.visitedUrls.has(link.url)) {
                try {
                  const childContent = await this.fetchSimpleContent(link.url);
                  if (childContent) {
                    childResults.push({
                      url: link.url,
                      content: childContent,
                      links: [] // Don't include links for child pages
                    });
                    this.visitedUrls.add(link.url);
                  }
                } catch (e) {
                  // Ignore errors on child pages
                }
              }
            }
            
            return {
              rootUrl: url,
              explorationDepth: maxDepth,
              pagesExplored: 1 + childResults.length,
              content: [{
                url,
                content: mainContent,
                links: links.slice(0, 10)
              }, ...childResults]
            };
          }
        } catch (e) {
          // Fall back to puppeteer if axios fails
          console.error('Axios fetch failed, falling back to puppeteer:', e);
        }
        
        // Fall back to puppeteer for more complex pages
        return await this.fetchWithPuppeteer(url, maxDepth);
      } catch (error) {
        console.error('Error in content fetch:', error);
        // Return partial results if we have visited any URLs
        if (this.visitedUrls.size > 0) {
          return {
            rootUrl: url,
            explorationDepth: maxDepth,
            pagesExplored: this.visitedUrls.size,
            content: [],
            error: error instanceof Error ? error.message : String(error)
          };
        }
        throw error;
      }
    }
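  • The child-link loop above calls a fetchSimpleContent helper that is not shown on this page. A minimal sketch of its assumed behaviour: fetch a child page with axios and return its plain text, or null if the request fails or does not return HTML.
    private async fetchSimpleContent(url: string): Promise<string | null> {
      try {
        const response = await axios.get(url, {
          timeout: 10000,
          headers: {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
          }
        });
        if (response.status === 200 && typeof response.data === 'string') {
          // Reuse the same HTML-to-text conversion as the root page
          return this.extractTextContent(response.data);
        }
        return null;
      } catch {
        // Child pages are best-effort; callers treat null as "skip this link"
        return null;
      }
    }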
  • Input schema definition for the fetch_doc_content tool, registered in the ListToolsRequestSchema handler. It defines the required 'url' string and the optional 'depth' number (1-5).
    {
      name: 'fetch_doc_content',
      description: 'Fetch web page content with the ability to explore linked pages up to a specified depth',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL of the web page to fetch',
          },
          depth: {
            type: 'number',
            description: 'Maximum depth of directory/link exploration (default: 1)',
            minimum: 1,
            maximum: 5,
          },
        },
        required: ['url'],
      },
    },
  • Helper function to extract plain text content from HTML by stripping tags, scripts, styles, normalizing whitespace, and truncating long content.
    private extractTextContent(html: string): string {
      // Very basic HTML to text conversion
      let text = html
        .replace(/<head>[\s\S]*?<\/head>/i, '')
        .replace(/<script[\s\S]*?<\/script>/gi, '')
        .replace(/<style[\s\S]*?<\/style>/gi, '')
        .replace(/<[^>]*>/g, ' ')
        .replace(/\s+/g, ' ')
        .trim();
      
      // Truncate if too long
      if (text.length > 10000) {
        text = text.substring(0, 10000) + '... (content truncated)';
      }
      
      return text;
    }
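  • A quick illustration of the conversion above (the input is illustrative):
    // extractTextContent('<head><title>Docs</title></head><p>Hello <b>world</b></p>')
    //   => 'Hello world'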
  • Helper function that parses the HTML for anchor tags, resolves relative hrefs to absolute URLs, keeps only same-domain links, skips fragment, javascript:, mailto:, and tel: links, and pairs each URL with its link text.
    private extractLinks(html: string, baseUrl: string): Array<{url: string, text: string}> {
      const links: Array<{url: string, text: string}> = [];
      const linkRegex = /<a\s+(?:[^>]*?\s+)?href="([^"]*)"(?:\s+[^>]*?)?>([^<]*)<\/a>/gi;
      
      let match;
      while ((match = linkRegex.exec(html)) !== null) {
        const href = match[1].trim();
        const text = match[2].trim();
        
        // Skip empty links or special protocols
        if (!href || href.startsWith('#') || href.startsWith('javascript:') || 
            href.startsWith('mailto:') || href.startsWith('tel:')) {
          continue;
        }
        
        try {
          // Resolve relative URLs
          const fullUrl = new URL(href, baseUrl).href;
          
          // Only include links from same domain
          if (new URL(fullUrl).hostname === new URL(baseUrl).hostname) {
            links.push({
              url: fullUrl,
              text: text || fullUrl
            });
          }
        } catch (e) {
          // Skip invalid URLs
        }
      }
      
      return links;
    }
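  • fetchContent falls back to a fetchWithPuppeteer method that is not shown on this page. A minimal sketch of the assumed fallback, reusing the extraction helpers above (assumes puppeteer is imported at module scope; the navigation options are illustrative, and child-link exploration is omitted from the sketch):
    private async fetchWithPuppeteer(url: string, maxDepth: number): Promise<any> {
      // Render the page in headless Chromium so JavaScript-generated content is included
      const browser = await puppeteer.launch({ headless: true });
      try {
        const page = await browser.newPage();
        await page.goto(url, { waitUntil: 'networkidle2', timeout: 15000 });
        const html = await page.content();
        return {
          rootUrl: url,
          explorationDepth: maxDepth,
          pagesExplored: 1,
          content: [{
            url,
            content: this.extractTextContent(html),
            links: this.extractLinks(html, url).slice(0, 10)
          }]
        };
      } finally {
        await browser.close();
      }
    }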