
Docs Fetch MCP Server

by wolfyy970

fetch_doc_content

Retrieve web page content and explore linked pages up to a defined depth to extract comprehensive documentation insights for LLMs.

Instructions

Fetch web page content with the ability to explore linked pages up to a specified depth

Input Schema

Name  | Required | Description                                       | Default
----- | -------- | ------------------------------------------------- | -------
url   | Yes      | URL of the web page to fetch                      | —
depth | No       | Maximum depth of directory/link exploration (1–5) | 1
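As an illustrative sketch of how an MCP client would invoke this tool, a `tools/call` request might look like the following (the URL and depth values are examples, not part of the server's documentation):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "fetch_doc_content",
    "arguments": {
      "url": "https://example.com/docs",
      "depth": 2
    }
  }
}
```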

Implementation Reference

  • Dispatch and execution wrapper for the fetch_doc_content tool: extracts parameters, implements timeout handling, calls the core fetchContent method, formats response as MCP content block, and handles errors.
    if (request.params.name === 'fetch_doc_content') {
      const { url, depth = 1 } = request.params.arguments as { 
        url: string;
        depth?: number;
      };
      
      // Reset visited URLs for each new request
      this.visitedUrls.clear();
      
      // Setup global timeout
      let timeoutError: Error | null = null;
      const timeoutPromise = new Promise<never>((_, reject) => {
        this.globalTimeout = setTimeout(() => {
          timeoutError = new Error('Operation timed out');
          reject(timeoutError);
        }, this.timeoutDuration);
      });
      
      try {
        // Race the content fetch against our timeout
        const result = await Promise.race([
          this.fetchContent(url, depth),
          timeoutPromise
        ]);
        
        // Clear timeout if we didn't hit it
        if (this.globalTimeout) {
          clearTimeout(this.globalTimeout);
          this.globalTimeout = null;
        }
        
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      } catch (error: unknown) {
        // Clear timeout if we hit an error
        if (this.globalTimeout) {
          clearTimeout(this.globalTimeout);
          this.globalTimeout = null;
        }
        
        let errorMessage = "Unknown error occurred";
        if (error === timeoutError) {
          errorMessage = "Operation timed out before completion";
        } else if (error instanceof Error) {
          errorMessage = error.message;
        } else if (typeof error === 'string') {
          errorMessage = error;
        }
        
        return {
          content: [
            {
              type: 'text',
              text: `Error fetching content: ${errorMessage}`,
            },
          ],
          isError: true,
        };
      }
    }
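The timeout handling above follows a common pattern: start a timer, race it against the real work, and clear the timer on both the success and error paths. As an illustrative sketch (not part of the server's source), the same pattern can be condensed into a reusable helper where `finally()` covers both cleanup paths:

```typescript
// Sketch of the race-against-timeout pattern used by the handler above.
// The timer is always cleared via finally(), so neither the success path
// nor the error path needs its own clearTimeout call.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('Operation timed out')), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

If the work promise settles first, `clearTimeout` prevents the dangling timer from keeping the Node.js event loop alive.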
  • Core implementation of the tool logic: fetches page content preferring axios for speed, falls back to Puppeteer for JS-heavy pages, extracts clean text and same-domain links, supports recursive exploration up to maxDepth, tracks visited URLs to prevent cycles.
    private async fetchContent(url: string, maxDepth: number): Promise<any> {
      // Track visited URLs to avoid cycles
      this.visitedUrls.add(url);
      
      try {
        // First try a simple fetch with axios as a faster alternative
        try {
          const response = await axios.get(url, {
            timeout: 10000,
            headers: {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
            }
          });
          
          // If we got HTML content, use it
          if (response.status === 200 && response.data && typeof response.data === 'string') {
            const mainContent = this.extractTextContent(response.data);
            const links = this.extractLinks(response.data, url);
            
            // For depth=1, just return the main content
            if (maxDepth <= 1) {
              return {
                rootUrl: url,
                explorationDepth: maxDepth,
                pagesExplored: 1,
                content: [{
                  url,
                  content: mainContent,
                  links: links.slice(0, 10) // Limit to top 10 links
                }]
              };
            }
            
            // For depth > 1, explore child links
            const childResults = [];
            const limit = Math.min(5, links.length); // Limit to 5 links for performance
            
            for (let i = 0; i < limit; i++) {
              const link = links[i];
              if (!this.visitedUrls.has(link.url)) {
                try {
                  const childContent = await this.fetchSimpleContent(link.url);
                  if (childContent) {
                    childResults.push({
                      url: link.url,
                      content: childContent,
                      links: [] // Don't include links for child pages
                    });
                    this.visitedUrls.add(link.url);
                  }
                } catch (e) {
                  // Ignore errors on child pages
                }
              }
            }
            
            return {
              rootUrl: url,
              explorationDepth: maxDepth,
              pagesExplored: 1 + childResults.length,
              content: [{
                url,
                content: mainContent,
                links: links.slice(0, 10)
              }, ...childResults]
            };
          }
        } catch (e) {
          // Fall back to puppeteer if axios fails
          console.error('Axios fetch failed, falling back to puppeteer:', e);
        }
        
        // Fall back to puppeteer for more complex pages
        return await this.fetchWithPuppeteer(url, maxDepth);
      } catch (error) {
        console.error('Error in content fetch:', error);
        // Return partial results if we have visited any URLs
        if (this.visitedUrls.size > 0) {
          return {
            rootUrl: url,
            explorationDepth: maxDepth,
            pagesExplored: this.visitedUrls.size,
            content: [],
            error: error instanceof Error ? error.message : String(error)
          };
        }
        throw error;
      }
    }
  • Input schema definition for the fetch_doc_content tool, registered in ListToolsRequestSchema handler. Defines required 'url' string and optional 'depth' number (1-5).
    {
      name: 'fetch_doc_content',
      description: 'Fetch web page content with the ability to explore linked pages up to a specified depth',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'URL of the web page to fetch',
          },
          depth: {
            type: 'number',
            description: 'Maximum depth of directory/link exploration (default: 1)',
            minimum: 1,
            maximum: 5,
          },
        },
        required: ['url'],
      },
    },
  • Helper function to extract plain text content from HTML by stripping tags, scripts, styles, normalizing whitespace, and truncating long content.
    private extractTextContent(html: string): string {
      // Very basic HTML to text conversion
      let text = html
        .replace(/<head>[\s\S]*?<\/head>/i, '')
        .replace(/<script[\s\S]*?<\/script>/gi, '')
        .replace(/<style[\s\S]*?<\/style>/gi, '')
        .replace(/<[^>]*>/g, ' ')
        .replace(/\s+/g, ' ')
        .trim();
      
      // Truncate if too long
      if (text.length > 10000) {
        text = text.substring(0, 10000) + '... (content truncated)';
      }
      
      return text;
    }
  • Helper function to parse HTML for anchor tags, extract and resolve relative hrefs to absolute same-domain URLs, filter out invalid/fragment/mailto/etc links, associate with link text.
    private extractLinks(html: string, baseUrl: string): Array<{url: string, text: string}> {
      const links: Array<{url: string, text: string}> = [];
      const linkRegex = /<a\s+(?:[^>]*?\s+)?href="([^"]*)"(?:\s+[^>]*?)?>([^<]*)<\/a>/gi;
      
      let match;
      while ((match = linkRegex.exec(html)) !== null) {
        const href = match[1].trim();
        const text = match[2].trim();
        
        // Skip empty links or special protocols
        if (!href || href.startsWith('#') || href.startsWith('javascript:') || 
            href.startsWith('mailto:') || href.startsWith('tel:')) {
          continue;
        }
        
        try {
          // Resolve relative URLs
          const fullUrl = new URL(href, baseUrl).href;
          
          // Only include links from same domain
          if (new URL(fullUrl).hostname === new URL(baseUrl).hostname) {
            links.push({
              url: fullUrl,
              text: text || fullUrl
            });
          }
        } catch (e) {
          // Skip invalid URLs
        }
      }
      
      return links;
    }
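The same-domain filter above hinges on the two-argument `URL` constructor, which resolves a relative href against a base URL and throws on invalid input. A standalone sketch of that step (the function name is illustrative, not from the server's source):

```typescript
// Resolve a relative href against a base URL and keep it only if it points
// at the same hostname, mirroring the filtering logic in extractLinks above.
// Returns null for cross-domain or unparseable URLs.
function resolveSameDomain(href: string, baseUrl: string): string | null {
  try {
    const full = new URL(href, baseUrl);
    return full.hostname === new URL(baseUrl).hostname ? full.href : null;
  } catch {
    return null; // invalid href or base URL
  }
}
```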
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes the core functionality (fetching and exploring links) but lacks details on permissions, rate limits, error handling, or what the response looks like (e.g., format, size limits). This leaves significant gaps for a tool that interacts with external web resources.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that clearly states the tool's purpose and key feature (exploring linked pages). It is front-loaded with the main action and includes no redundant information, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of fetching web content with link exploration, no annotations, and no output schema, the description is incomplete. It doesn't cover behavioral aspects like authentication needs, rate limits, or response format, which are critical for effective tool use in this context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents both parameters ('url' and 'depth') with descriptions and constraints. The description adds minimal value by implying the depth parameter relates to link exploration, but it doesn't provide additional syntax or format details beyond what the schema specifies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('fetch') and resource ('web page content'), and it adds valuable context about exploring linked pages. However, with no sibling tools listed, the description has no occasion to show how this tool differs from alternatives, which keeps it short of a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, prerequisites, or exclusions. It mentions the ability to explore linked pages up to a specified depth, but this is more about functionality than usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/wolfyy970/docs-fetch-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.