Glama
by bsmi021

generate-site-map

Crawl a website from a given URL to a specified depth and generate an XML sitemap with discovered URLs, up to a defined limit, for improved site navigation and indexing.

Instructions

Crawls a website starting from a given URL up to a specified depth and generates an XML sitemap containing the discovered URLs (up to a specified limit).
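To make the depth semantics concrete, here is a minimal sketch of a depth-limited crawl. It is not the server's `crawlPage` implementation, which is not shown on this page: it swaps real HTTP fetching for a hypothetical in-memory link graph (`LinkGraph`, `site`) and uses an iterative breadth-first walk, so the names and the traversal strategy are assumptions made for illustration only.

```typescript
// Hypothetical in-memory link graph standing in for real HTTP fetches.
type LinkGraph = Record<string, string[]>;

// Breadth-first walk: depth 0 yields only the start URL, depth 1 adds the
// pages it links to, and so on up to maxDepth.
function crawl(graph: LinkGraph, startUrl: string, maxDepth: number): string[] {
  const visited = new Set<string>([startUrl]);
  let frontier: string[] = [startUrl];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const link of graph[url] ?? []) {
        if (!visited.has(link)) {
          visited.add(link); // each URL is recorded once, in discovery order
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return Array.from(visited);
}

// Assumed example site, for illustration only.
const site: LinkGraph = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/a/deep"],
  "https://example.com/b": ["https://example.com/"],
};

// Crawl to depth 2, then cap the result, mirroring the "up to a limit" step.
const discovered = crawl(site, "https://example.com/", 2);
const limited = discovered.slice(0, 3);
console.log(limited);
```

Applying the limit after the crawl keeps the traversal logic simple; a crawler working on large sites might prefer to stop fetching once the limit is reached.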

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of URLs to include in the generated sitemap XML. Max allowed is 5000. | 1000 |
| maxDepth | No | The maximum depth to crawl relative to the starting URL to discover pages for the sitemap. 0 means only the starting URL. Max allowed depth is 5. | 2 |
| url | Yes | The starting URL for the crawl to generate the sitemap. Must be a valid HTTP or HTTPS URL. | |
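The defaults and bounds described for the input parameters can be sketched as a small dependency-free normalizer. This is only an illustration: the server itself declares these rules with a Zod schema, and the names `SitemapArgs`, `NormalizedArgs`, `checkInt`, and `normalizeArgs` are hypothetical. The explicit protocol check reflects the parameter description ("valid HTTP or HTTPS URL") rather than any confirmed behavior of the server.

```typescript
// Hypothetical argument shapes for illustration.
interface SitemapArgs { url: string; maxDepth?: number; limit?: number; }
interface NormalizedArgs { url: string; maxDepth: number; limit: number; }

// Reject non-integers and values outside [min, max].
function checkInt(name: string, value: number, min: number, max: number): number {
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new Error(`${name} must be an integer between ${min} and ${max}`);
  }
  return value;
}

function normalizeArgs(args: SitemapArgs): NormalizedArgs {
  let parsed: URL;
  try {
    parsed = new URL(args.url);
  } catch {
    throw new Error("url must be a valid URL");
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("url must use HTTP or HTTPS");
  }
  return {
    url: args.url,
    maxDepth: checkInt("maxDepth", args.maxDepth ?? 2, 0, 5), // default 2, max 5
    limit: checkInt("limit", args.limit ?? 1000, 1, 5000),    // default 1000, max 5000
  };
}

// Omitted optional fields fall back to the documented defaults.
const normalized = normalizeArgs({ url: "https://example.com" });
console.log(normalized);
```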

Implementation Reference

  • The main handler function that processes the 'generate-site-map' tool request. It extracts arguments, calls the GenerateSitemapService, formats the XML response for MCP, and handles errors appropriately.
    const processRequest = async (args: GenerateSitemapToolArgs) => {
      // Zod handles defaults for maxDepth and limit
      const { url, maxDepth, limit } = args;
      logger.debug(`Received ${TOOL_NAME} request`, { url, maxDepth, limit });

      try {
        // Call the service method
        const result = await serviceInstance.generateSitemap(url, maxDepth, limit);

        // Format the successful output for MCP - return XML content
        return {
          content: [{
            type: "text" as const, // Could also use 'application/xml' if client supports it
            text: result.sitemapXml
          }]
          // Optionally include urlCount in metadata if needed/supported
        };
      } catch (error) {
        const logContext = {
          args,
          errorDetails: error instanceof Error
            ? { name: error.name, message: error.message, stack: error.stack }
            : String(error)
        };
        logger.error(`Error processing ${TOOL_NAME}`, logContext);

        // Map service-specific errors to McpError
        if (error instanceof ValidationError) {
          throw new McpError(ErrorCode.InvalidParams, `Validation failed: ${error.message}`, error.details);
        }
        if (error instanceof ServiceError) {
          throw new McpError(ErrorCode.InternalError, error.message, error.details);
        }
        if (error instanceof McpError) {
          throw error; // Re-throw existing McpErrors
        }
        // Catch-all for unexpected errors
        throw new McpError(
          ErrorCode.InternalError,
          error instanceof Error
            ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}`
            : `An unexpected error occurred in ${TOOL_NAME}.`
        );
      }
    };
  • Zod schema defining the input parameters for the 'generate-site-map' tool, including validation, defaults, and descriptions.
    export const TOOL_PARAMS = {
      url: z.string().url()
        .describe("The starting URL for the crawl to generate the sitemap. Must be a valid HTTP or HTTPS URL."),
      maxDepth: z.number().int().min(0).max(5).optional().default(2)
        .describe("The maximum depth to crawl relative to the starting URL to discover pages for the sitemap. 0 means only the starting URL. Max allowed depth is 5. Defaults to 2."),
      limit: z.number().int().min(1).max(5000).optional().default(1000)
        .describe("Maximum number of URLs to include in the generated sitemap XML. Defaults to 1000. Max allowed is 5000."),
    };
  • Registers the 'generate-site-map' tool with the MCP server using server.tool, providing name, description, params schema, and handler.
    server.tool(
      TOOL_NAME,
      TOOL_DESCRIPTION,
      TOOL_PARAMS,
      processRequest
    );
  • The core helper method in GenerateSitemapService that performs the actual sitemap generation: validates inputs, crawls the site, limits URLs, generates XML, and returns SitemapResult.
    public async generateSitemap(startUrl: string, maxDepth: number, limit: number): Promise<SitemapResult> {
      // Basic validation
      if (!startUrl || typeof startUrl !== 'string') {
        throw new ValidationError('Invalid input: startUrl string is required.');
      }
      if (typeof maxDepth !== 'number' || maxDepth < 0) {
        throw new ValidationError('Invalid input: maxDepth must be a non-negative number.');
      }
      if (typeof limit !== 'number' || limit <= 0) {
        throw new ValidationError('Invalid input: limit must be a positive number.');
      }

      logger.info(`Starting sitemap generation for: ${startUrl}`, { maxDepth, limit });

      try {
        const visited = new Set<string>();
        // Crawl the site to get URLs
        const allUrls = await crawlPage(startUrl, 0, maxDepth, visited);
        const uniqueUrls = Array.from(new Set(allUrls)); // Ensure uniqueness again
        logger.debug(`Crawl discovered ${uniqueUrls.length} unique URLs.`);

        // Apply the limit
        const limitedUrls = uniqueUrls.slice(0, limit);
        logger.debug(`Limiting sitemap to ${limitedUrls.length} URLs.`);

        // Generate XML sitemap string
        // Ensure URLs are properly escaped for XML
        const urlEntries = limitedUrls
          .map(url => `  <url>
        <loc>${escape(url)}</loc>
        <lastmod>${new Date().toISOString().split('T')[0]}</lastmod>
      </url>`) // Use YYYY-MM-DD format for lastmod
          .join('\n');

        const sitemapXml = `<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    ${urlEntries}
    </urlset>`;

        const result: SitemapResult = {
          sitemapXml: sitemapXml,
          urlCount: limitedUrls.length,
        };

        logger.info(`Finished sitemap generation for ${startUrl}. Included ${result.urlCount} URLs.`);
        return result;
      } catch (error) {
        logger.error(`Error during sitemap generation for ${startUrl}`, {
          error: error instanceof Error ? error.message : String(error),
          startUrl,
          maxDepth,
          limit
        });
        // Wrap errors from crawlPage or XML generation
        throw new ServiceError(
          `Sitemap generation failed for ${startUrl}: ${error instanceof Error ? error.message : String(error)}`,
          error
        );
      }
    }
  • Invocation of the generateSitemapTool registration function from the central registerTools function.
    generateSitemapTool(server);
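
The XML-building step of the service (deduplicate, apply the limit, escape each URL, wrap in `<urlset>`) can be sketched in isolation as follows. The helpers `escapeXml` and `buildSitemapXml` are hypothetical names introduced here; the service calls an `escape` function whose import is not shown on this page, so the escaping below is an assumed stand-in that covers the five predefined XML entities. The `lastmod` date is passed in as a parameter to keep the sketch deterministic.

```typescript
// Assumed stand-in for the service's `escape` helper: replaces the five
// characters that have predefined XML entities. `&` is handled first so the
// ampersands introduced by later replacements are not escaped twice.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Deduplicate, cap at `limit`, and render one <url> entry per URL.
function buildSitemapXml(urls: string[], limit: number, lastmod: string): string {
  const entries = Array.from(new Set(urls))
    .slice(0, limit)
    .map(url => [
      "  <url>",
      `    <loc>${escapeXml(url)}</loc>`,
      `    <lastmod>${lastmod}</lastmod>`, // YYYY-MM-DD
      "  </url>",
    ].join("\n"));
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

// Example input with a duplicate and a query string that needs escaping.
const sampleXml = buildSitemapXml(
  [
    "https://example.com/?a=1&b=2",
    "https://example.com/?a=1&b=2",
    "https://example.com/about",
  ],
  5,
  "2024-01-01",
);
console.log(sampleXml);
```

Escaping matters because query strings routinely contain `&`, which would otherwise make the sitemap invalid XML.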

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bsmi021/mcp-server-webscan'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.