
fetch_website_nested

Crawl and fetch website content across nested URL structures, converting each page into clean, structured markdown. Specify depth, page limits, URL patterns, and domain restrictions for precise content extraction.

Instructions

Fetch website content with nested URL crawling and convert to clean markdown

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| excludePatterns | No | Regex patterns for URLs to exclude | |
| includePatterns | No | Regex patterns for URLs to include (if specified, only matching URLs will be processed) | |
| maxDepth | No | Maximum depth to crawl | 2 |
| maxPages | No | Maximum number of pages to fetch | 50 |
| sameDomainOnly | No | Only crawl URLs from the same domain | true |
| timeout | No | Request timeout in milliseconds | 10000 |
| url | Yes | The starting URL to fetch and crawl | |
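
For illustration, here is a minimal sketch of calling the tool from a TypeScript MCP client. It assumes the `@modelcontextprotocol/sdk` client API and that the server is launched with `node dist/server.js`; the launch command, URL, and patterns below are placeholder assumptions, not taken from this page:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Connect to the better-fetch server over stdio (launch command is assumed).
const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(
  new StdioClientTransport({ command: "node", args: ["dist/server.js"] })
);

// Invoke the tool; argument names mirror the input schema above.
const result = await client.callTool({
  name: "fetch_website_nested",
  arguments: {
    url: "https://example.com/docs",            // required starting point
    maxDepth: 2,                                // follow links two levels deep
    maxPages: 25,                               // stop after 25 pages
    sameDomainOnly: true,                       // stay on example.com
    excludePatterns: ["\\.pdf$", "/archive/"],  // skip PDFs and archive pages
    timeout: 10000,                             // 10 s per request
  },
});

// The handler returns the crawled site as a single markdown document.
console.log((result.content as Array<{ type: string; text: string }>)[0].text);
```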

Implementation Reference

  • The tool execution handler for 'fetch_website_nested': extracts the parameters from the request, validates the URL, constructs a FetchOptions object, calls AdvancedWebScraper.scrapeWebsite, and returns the markdown content. (A sketch of the inferred FetchOptions shape appears after this list.)
    ```typescript
    case "fetch_website_nested": {
      const {
        url,
        maxDepth = 2,
        maxPages = 50,
        sameDomainOnly = true,
        excludePatterns = [],
        includePatterns = [],
        timeout = 10000,
      } = args as any;

      if (!url) {
        throw new Error("URL is required");
      }

      try {
        const options: FetchOptions = {
          maxDepth,
          maxPages,
          sameDomainOnly,
          excludePatterns,
          includePatterns,
          timeout,
        };
        const markdown = await scraper.scrapeWebsite(url, options);
        return {
          content: [
            {
              type: "text",
              text: markdown,
            },
          ],
        };
      } catch (error) {
        throw new Error(`Failed to fetch website: ${error}`);
      }
    }
    ```
  • Tool definition including name, description, and input schema for 'fetch_website_nested'.
    ```typescript
    {
      name: "fetch_website_nested",
      description:
        "Fetch website content with nested URL crawling and convert to clean markdown",
      inputSchema: {
        type: "object",
        properties: {
          url: {
            type: "string",
            description: "The starting URL to fetch and crawl",
          },
          maxDepth: {
            type: "number",
            description: "Maximum depth to crawl (default: 2)",
            default: 2,
          },
          maxPages: {
            type: "number",
            description: "Maximum number of pages to fetch (default: 50)",
            default: 50,
          },
          sameDomainOnly: {
            type: "boolean",
            description: "Only crawl URLs from the same domain (default: true)",
            default: true,
          },
          excludePatterns: {
            type: "array",
            items: { type: "string" },
            description: "Regex patterns for URLs to exclude",
          },
          includePatterns: {
            type: "array",
            items: { type: "string" },
            description:
              "Regex patterns for URLs to include (if specified, only matching URLs will be processed)",
          },
          timeout: {
            type: "number",
            description: "Request timeout in milliseconds (default: 10000)",
            default: 10000,
          },
        },
        required: ["url"],
      },
    }
    ```
  • src/server.ts:382-386 (registration)
    Registers the listTools handler that returns the TOOLS array containing 'fetch_website_nested'. (A sketch of the companion callTool dispatcher appears after this list.)
    ```typescript
    server.setRequestHandler(ListToolsRequestSchema, async () => {
      return {
        tools: TOOLS,
      };
    });
    ```
  • Core helper method in AdvancedWebScraper that implements the nested crawling logic: processes a queue of URLs up to maxDepth and maxPages, fetches each page's content, extracts its links, and formats the results as markdown. (A sketch of how the link-filtering options might be applied appears after this list.)
    ```typescript
    async scrapeWebsite(startUrl: string, options: FetchOptions = {}): Promise<string> {
      const { maxDepth = 2, maxPages = 50, sameDomainOnly = true, timeout = 10000 } = options;

      this.baseUrl = startUrl;
      this.visitedUrls.clear();

      const allContent: PageContent[] = [];
      const urlsToProcess: Array<{ url: string; depth: number }> = [
        { url: startUrl, depth: 0 },
      ];

      while (urlsToProcess.length > 0 && allContent.length < maxPages) {
        const { url, depth } = urlsToProcess.shift()!;

        if (depth > maxDepth || this.visitedUrls.has(url)) {
          continue;
        }

        const pageContent = await this.fetchPageContent(url, depth, options);

        if (pageContent) {
          allContent.push(pageContent);

          // Add child URLs for processing
          if (depth < maxDepth) {
            for (const link of pageContent.links) {
              if (!this.visitedUrls.has(link)) {
                urlsToProcess.push({ url: link, depth: depth + 1 });
              }
            }
          }
        }

        // Small delay to be respectful
        await new Promise(resolve => setTimeout(resolve, 500));
      }

      return this.formatAsMarkdown(allContent, startUrl);
    }
    ```
  • Instantiates the AdvancedWebScraper class used by the tool handler.
    ```typescript
    const scraper = new AdvancedWebScraper();
    ```
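
The FetchOptions type referenced by the handler and by scrapeWebsite is not reproduced on this page. Judging from the fields the handler populates and the defaults destructured in scrapeWebsite, it plausibly looks like this sketch; the actual definition in the better-fetch source may differ:

```typescript
// Inferred shape of FetchOptions; field names and defaults come from the
// snippets above, but this is a reconstruction, not the source definition.
interface FetchOptions {
  maxDepth?: number;          // crawl depth limit (default 2)
  maxPages?: number;          // page count limit (default 50)
  sameDomainOnly?: boolean;   // restrict crawl to the start URL's domain (default true)
  excludePatterns?: string[]; // regex sources for URLs to skip
  includePatterns?: string[]; // regex sources for URLs to allow
  timeout?: number;           // per-request timeout in milliseconds (default 10000)
}
```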
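
The case "fetch_website_nested" block in the first bullet implies a companion callTool dispatcher that is not shown on this page. With the standard MCP SDK it would look roughly like the following sketch; the surrounding switch and error handling are assumptions:

```typescript
import { CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";

// Hypothetical dispatcher: routes callTool requests to the matching case.
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;
  switch (name) {
    case "fetch_website_nested": {
      // ... handler body as shown in the first bullet; it returns
      // { content: [{ type: "text", text: markdown }] } ...
    }
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
});
```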
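
Note that scrapeWebsite enqueues every link a page yields and relies on fetchPageContent (not shown on this page) to apply the domain and pattern restrictions. Below is a minimal sketch of how those options might be enforced; the helper name `shouldCrawl` and its placement are hypothetical:

```typescript
// Hypothetical link filter; the real better-fetch logic lives inside
// fetchPageContent and may differ in structure and precedence.
function shouldCrawl(url: string, baseUrl: string, options: FetchOptions): boolean {
  const { sameDomainOnly = true, includePatterns = [], excludePatterns = [] } = options;

  // Domain restriction: compare the candidate's hostname with the start URL's.
  if (sameDomainOnly && new URL(url).hostname !== new URL(baseUrl).hostname) {
    return false;
  }

  // Exclusions take priority over inclusions.
  if (excludePatterns.some((p) => new RegExp(p).test(url))) {
    return false;
  }

  // If include patterns are given, only matching URLs pass.
  if (includePatterns.length > 0) {
    return includePatterns.some((p) => new RegExp(p).test(url));
  }

  return true;
}
```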
