generate-site-map
Crawl a website from a given URL to a specified depth and generate an XML sitemap with discovered URLs, up to a defined limit, for improved site navigation and indexing.
Instructions
Crawls a website starting from a given URL up to a specified depth and generates an XML sitemap containing the discovered URLs (up to a specified limit).
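To illustrate what "depth" means here, the following is a minimal sketch of a depth-limited crawl. It is not the project's `crawlPage` implementation: `LinkGraph`, `crawlToDepth`, and `demoGraph` are hypothetical names, and an in-memory link graph stands in for real HTTP fetches so the example stays self-contained.

```typescript
// Hypothetical in-memory link graph standing in for real HTTP fetches.
type LinkGraph = Record<string, string[]>;

// Breadth-first crawl: depth 0 yields only the start URL,
// depth 1 adds the pages it links to, and so on.
function crawlToDepth(graph: LinkGraph, startUrl: string, maxDepth: number): string[] {
  const visited = new Set<string>([startUrl]);
  let frontier: string[] = [startUrl];

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const link of graph[url] ?? []) {
        if (!visited.has(link)) {
          visited.add(link); // Record each URL once, in discovery order
          next.push(link);
        }
      }
    }
    frontier = next;
  }
  return Array.from(visited);
}

const demoGraph: LinkGraph = {
  "https://example.com/": ["https://example.com/a", "https://example.com/b"],
  "https://example.com/a": ["https://example.com/a/deep"],
  "https://example.com/b": ["https://example.com/"],
};

const depth0 = crawlToDepth(demoGraph, "https://example.com/", 0);
const depth1 = crawlToDepth(demoGraph, "https://example.com/", 1);
const depth2 = crawlToDepth(demoGraph, "https://example.com/", 2);
```

With this graph, depth 0 returns only the start URL, depth 1 adds `/a` and `/b`, and depth 2 additionally reaches `/a/deep`.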
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of URLs to include in the generated sitemap XML. Must be an integer from 1 to 5000. | 1000 |
| maxDepth | No | Maximum depth to crawl relative to the starting URL when discovering pages for the sitemap. 0 means only the starting URL. Must be an integer from 0 to 5. | 2 |
| url | Yes | The starting URL for the crawl to generate the sitemap. Must be a valid HTTP or HTTPS URL. | |
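As a rough illustration of how these defaults and bounds combine, here is a dependency-free sketch. The names `SitemapArgs`, `ResolvedSitemapArgs`, and `resolveArgs` are hypothetical, and the URL check is a deliberately simplified regular expression; the tool itself validates input with a Zod schema, which this sketch only approximates.

```typescript
// Hypothetical argument shapes mirroring the table above.
interface SitemapArgs {
  url: string;
  maxDepth?: number;
  limit?: number;
}

interface ResolvedSitemapArgs {
  url: string;
  maxDepth: number;
  limit: number;
}

// Apply the documented defaults, then enforce the documented bounds.
function resolveArgs(args: SitemapArgs): ResolvedSitemapArgs {
  // Simplified check (assumption): only tests for an http(s) scheme and a host part.
  if (!/^https?:\/\/[^\s/]+/i.test(args.url)) {
    throw new RangeError("url must be an HTTP or HTTPS URL");
  }

  const maxDepth = args.maxDepth ?? 2; // Default depth
  const limit = args.limit ?? 1000; // Default URL cap

  if (!Number.isInteger(maxDepth) || maxDepth < 0 || maxDepth > 5) {
    throw new RangeError("maxDepth must be an integer from 0 to 5");
  }
  if (!Number.isInteger(limit) || limit < 1 || limit > 5000) {
    throw new RangeError("limit must be an integer from 1 to 5000");
  }
  return { url: args.url, maxDepth, limit };
}

// Only `url` is required; omitted fields fall back to their defaults.
const resolved = resolveArgs({ url: "https://example.com/docs", limit: 50 });
```

Here `resolved` keeps the supplied `limit` of 50 and falls back to a `maxDepth` of 2.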
Implementation Reference
- src/tools/generateSitemapTool.ts:27-69 (handler) The main handler function that processes the 'generate-site-map' tool request. It extracts the arguments, calls the GenerateSitemapService, formats the XML response for MCP, and maps errors to McpError.

  ```typescript
  const processRequest = async (args: GenerateSitemapToolArgs) => {
    // Zod handles defaults for maxDepth and limit
    const { url, maxDepth, limit } = args;
    logger.debug(`Received ${TOOL_NAME} request`, { url, maxDepth, limit });

    try {
      // Call the service method
      const result = await serviceInstance.generateSitemap(url, maxDepth, limit);

      // Format the successful output for MCP - return XML content
      return {
        content: [{
          type: "text" as const, // Could also use 'application/xml' if client supports it
          text: result.sitemapXml
        }]
        // Optionally include urlCount in metadata if needed/supported
      };
    } catch (error) {
      const logContext = {
        args,
        errorDetails: error instanceof Error
          ? { name: error.name, message: error.message, stack: error.stack }
          : String(error)
      };
      logger.error(`Error processing ${TOOL_NAME}`, logContext);

      // Map service-specific errors to McpError
      if (error instanceof ValidationError) {
        throw new McpError(ErrorCode.InvalidParams, `Validation failed: ${error.message}`, error.details);
      }
      if (error instanceof ServiceError) {
        throw new McpError(ErrorCode.InternalError, error.message, error.details);
      }
      if (error instanceof McpError) {
        throw error; // Re-throw existing McpErrors
      }

      // Catch-all for unexpected errors
      throw new McpError(
        ErrorCode.InternalError,
        error instanceof Error
          ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}`
          : `An unexpected error occurred in ${TOOL_NAME}.`
      );
    }
  };
  ```
- Zod schema defining the input parameters for the 'generate-site-map' tool, including validation, defaults, and descriptions.

  ```typescript
  export const TOOL_PARAMS = {
    url: z.string().url().describe("The starting URL for the crawl to generate the sitemap. Must be a valid HTTP or HTTPS URL."),
    maxDepth: z.number().int().min(0).max(5).optional().default(2).describe("The maximum depth to crawl relative to the starting URL to discover pages for the sitemap. 0 means only the starting URL. Max allowed depth is 5. Defaults to 2."),
    limit: z.number().int().min(1).max(5000).optional().default(1000).describe("Maximum number of URLs to include in the generated sitemap XML. Defaults to 1000. Max allowed is 5000."),
  };
  ```
- src/tools/generateSitemapTool.ts:72-77 (registration) Registers the 'generate-site-map' tool with the MCP server using server.tool, providing the name, description, params schema, and handler.

  ```typescript
  server.tool(
    TOOL_NAME,
    TOOL_DESCRIPTION,
    TOOL_PARAMS,
    processRequest
  );
  ```
- The core helper method in GenerateSitemapService that performs the actual sitemap generation: it validates inputs, crawls the site, limits the URLs, generates the XML, and returns a SitemapResult.

  ```typescript
  public async generateSitemap(startUrl: string, maxDepth: number, limit: number): Promise<SitemapResult> {
    // Basic validation
    if (!startUrl || typeof startUrl !== 'string') {
      throw new ValidationError('Invalid input: startUrl string is required.');
    }
    if (typeof maxDepth !== 'number' || maxDepth < 0) {
      throw new ValidationError('Invalid input: maxDepth must be a non-negative number.');
    }
    if (typeof limit !== 'number' || limit <= 0) {
      throw new ValidationError('Invalid input: limit must be a positive number.');
    }

    logger.info(`Starting sitemap generation for: ${startUrl}`, { maxDepth, limit });

    try {
      const visited = new Set<string>();
      // Crawl the site to get URLs
      const allUrls = await crawlPage(startUrl, 0, maxDepth, visited);
      const uniqueUrls = Array.from(new Set(allUrls)); // Ensure uniqueness again
      logger.debug(`Crawl discovered ${uniqueUrls.length} unique URLs.`);

      // Apply the limit
      const limitedUrls = uniqueUrls.slice(0, limit);
      logger.debug(`Limiting sitemap to ${limitedUrls.length} URLs.`);

      // Generate XML sitemap string
      // Ensure URLs are properly escaped for XML
      const urlEntries = limitedUrls
        .map(url => `  <url>
      <loc>${escape(url)}</loc>
      <lastmod>${new Date().toISOString().split('T')[0]}</lastmod>
    </url>`) // Use YYYY-MM-DD format for lastmod
        .join('\n');

      const sitemapXml = `<?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  ${urlEntries}
  </urlset>`;

      const result: SitemapResult = {
        sitemapXml: sitemapXml,
        urlCount: limitedUrls.length,
      };

      logger.info(`Finished sitemap generation for ${startUrl}. Included ${result.urlCount} URLs.`);
      return result;
    } catch (error) {
      logger.error(`Error during sitemap generation for ${startUrl}`, {
        error: error instanceof Error ? error.message : String(error),
        startUrl,
        maxDepth,
        limit
      });
      // Wrap errors from crawlPage or XML generation
      throw new ServiceError(`Sitemap generation failed for ${startUrl}: ${error instanceof Error ? error.message : String(error)}`, error);
    }
  }
  ```
- src/tools/index.ts:33-33 (registration) Invocation of the generateSitemapTool registration function from the central registerTools function.

  ```typescript
  generateSitemapTool(server);
  ```
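The de-duplicate, limit, escape, and serialize steps of the service method can be sketched in isolation as follows. This is an illustrative approximation rather than the project's code: `escapeXml` and `buildSitemapXml` are hypothetical helpers, the `escape` function used by the service is not shown in the source, and `lastmod` is passed in explicitly so the output is deterministic.

```typescript
// Hypothetical helper: escape the five XML special characters.
// The ampersand is replaced first so later entities are not double-escaped.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// De-duplicate, apply the limit, and serialize to sitemap XML.
// `lastmod` is expected in YYYY-MM-DD format.
function buildSitemapXml(urls: string[], limit: number, lastmod: string): string {
  const limited = Array.from(new Set(urls)).slice(0, limit);
  const entries = limited
    .map(url =>
      [
        "  <url>",
        `    <loc>${escapeXml(url)}</loc>`,
        `    <lastmod>${lastmod}</lastmod>`,
        "  </url>",
      ].join("\n")
    )
    .join("\n");

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</urlset>",
  ].join("\n");
}

const sampleXml = buildSitemapXml(
  [
    "https://example.com/",
    "https://example.com/search?q=a&page=2",
    "https://example.com/", // Duplicate, dropped before the limit is applied
    "https://example.com/about", // Cut off by the limit of 2
  ],
  2,
  "2024-01-01"
);
```

In `sampleXml`, the query-string ampersand appears as `&amp;`, and only the first two unique URLs are included.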