MCP Webscan Server

by bsmi021

generate-site-map

Crawl a website from a given URL to a specified depth and generate an XML sitemap of the discovered URLs, up to a defined limit, for improved site navigation and indexing.

Instructions

Crawls a website starting from a given URL up to a specified depth and generates an XML sitemap containing the discovered URLs (up to a specified limit).

Input Schema

url (required): The starting URL for the crawl to generate the sitemap. Must be a valid HTTP or HTTPS URL.
maxDepth (optional, default 2): The maximum depth to crawl relative to the starting URL; 0 means only the starting URL. Maximum allowed depth is 5.
limit (optional, default 1000): Maximum number of URLs to include in the generated sitemap XML. Maximum allowed is 5000.
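
Example Invocation

A hypothetical tools/call payload an MCP client might send to this tool. The URL and argument values below are illustrative, not taken from the repository:

    const callToolRequest = {
        jsonrpc: "2.0",
        id: 1,
        method: "tools/call",
        params: {
            name: "generate-site-map",
            arguments: {
                url: "https://example.com", // required; must be HTTP or HTTPS
                maxDepth: 2,                // optional; defaults to 2, max 5
                limit: 500                  // optional; defaults to 1000, max 5000
            }
        }
    };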

Implementation Reference

  • The main handler function that processes the 'generate-site-map' tool request. It extracts arguments, calls the GenerateSitemapService, formats the XML response for MCP, and handles errors appropriately.
    const processRequest = async (args: GenerateSitemapToolArgs) => {
        // Zod handles defaults for maxDepth and limit
        const { url, maxDepth, limit } = args;
        logger.debug(`Received ${TOOL_NAME} request`, { url, maxDepth, limit });
    
        try {
            // Call the service method
            const result = await serviceInstance.generateSitemap(url, maxDepth, limit);
    
            // Format the successful output for MCP - return XML content
            return {
                content: [{
                    type: "text" as const, // Could also use 'application/xml' if client supports it
                    text: result.sitemapXml
                }]
                // Optionally include urlCount in metadata if needed/supported
            };
    
        } catch (error) {
            const logContext = {
                args,
                errorDetails: error instanceof Error ? { name: error.name, message: error.message, stack: error.stack } : String(error)
            };
            logger.error(`Error processing ${TOOL_NAME}`, logContext);
    
            // Map service-specific errors to McpError
            if (error instanceof ValidationError) {
                throw new McpError(ErrorCode.InvalidParams, `Validation failed: ${error.message}`, error.details);
            }
            if (error instanceof ServiceError) {
                throw new McpError(ErrorCode.InternalError, error.message, error.details);
            }
            if (error instanceof McpError) {
                throw error; // Re-throw existing McpErrors
            }
    
            // Catch-all for unexpected errors
            throw new McpError(
                ErrorCode.InternalError,
                error instanceof Error ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}` : `An unexpected error occurred in ${TOOL_NAME}.`
            );
        }
    };
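  • The handler above references ValidationError and ServiceError, whose definitions are not shown on this page. A minimal sketch of what they presumably look like, inferred from their usage (the 'details' payload and (message, details) constructor are assumptions, not repository code):
    class ValidationError extends Error {
        constructor(message: string, public details?: unknown) {
            super(message);
            this.name = "ValidationError";
        }
    }

    class ServiceError extends Error {
        constructor(message: string, public details?: unknown) {
            super(message);
            this.name = "ServiceError";
        }
    }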
  • Zod schema defining the input parameters for the 'generate-site-map' tool, including validation, defaults, and descriptions.
    export const TOOL_PARAMS = {
        url: z.string().url().describe("The starting URL for the crawl to generate the sitemap. Must be a valid HTTP or HTTPS URL."),
        maxDepth: z.number().int().min(0).max(5).optional().default(2).describe("The maximum depth to crawl relative to the starting URL to discover pages for the sitemap. 0 means only the starting URL. Max allowed depth is 5. Defaults to 2."),
        limit: z.number().int().min(1).max(5000).optional().default(1000).describe("Maximum number of URLs to include in the generated sitemap XML. Defaults to 1000. Max allowed is 5000."),
    };
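  • For reference, the GenerateSitemapToolArgs type used by the handler can plausibly be inferred from this schema via Zod. This is an assumption about the repository's typing, not code shown on this page:
    // assuming: import { z } from "zod";
    // Resolves to { url: string; maxDepth: number; limit: number } once defaults apply.
    type GenerateSitemapToolArgs = z.infer<z.ZodObject<typeof TOOL_PARAMS>>;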
  • Registers the 'generate-site-map' tool with the MCP server using server.tool, providing name, description, params schema, and handler.
    server.tool(
        TOOL_NAME,
        TOOL_DESCRIPTION,
        TOOL_PARAMS,
        processRequest
    );
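  • The TOOL_NAME and TOOL_DESCRIPTION constants passed above are defined elsewhere; reconstructed from this listing, they would plausibly be:
    export const TOOL_NAME = "generate-site-map";
    export const TOOL_DESCRIPTION =
        "Crawls a website starting from a given URL up to a specified depth and " +
        "generates an XML sitemap containing the discovered URLs (up to a specified limit).";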
  • The core helper method in GenerateSitemapService that performs the actual sitemap generation: validates inputs, crawls the site, limits URLs, generates XML, and returns SitemapResult.
        public async generateSitemap(startUrl: string, maxDepth: number, limit: number): Promise<SitemapResult> {
            // Basic validation
            if (!startUrl || typeof startUrl !== 'string') {
                throw new ValidationError('Invalid input: startUrl string is required.');
            }
            if (typeof maxDepth !== 'number' || maxDepth < 0) {
                throw new ValidationError('Invalid input: maxDepth must be a non-negative number.');
            }
            if (typeof limit !== 'number' || limit <= 0) {
                throw new ValidationError('Invalid input: limit must be a positive number.');
            }
    
            logger.info(`Starting sitemap generation for: ${startUrl}`, { maxDepth, limit });
    
            try {
                const visited = new Set<string>();
                // Crawl the site to get URLs
                const allUrls = await crawlPage(startUrl, 0, maxDepth, visited);
                const uniqueUrls = Array.from(new Set(allUrls)); // Ensure uniqueness again
                logger.debug(`Crawl discovered ${uniqueUrls.length} unique URLs.`);
    
                // Apply the limit
                const limitedUrls = uniqueUrls.slice(0, limit);
                logger.debug(`Limiting sitemap to ${limitedUrls.length} URLs.`);
    
                // Generate the XML sitemap string.
                // Ensure URLs are XML-escaped. 'escape' is assumed to be an imported
                // XML-escaping helper; the deprecated JS global escape() is not XML-safe.
                const urlEntries = limitedUrls
                    .map(url => `  <url>
        <loc>${escape(url)}</loc>
        <lastmod>${new Date().toISOString().split('T')[0]}</lastmod>
      </url>`) // lastmod uses YYYY-MM-DD format
                    .join('\n');
    
                const sitemapXml = `<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    ${urlEntries}
    </urlset>`;
    
                const result: SitemapResult = {
                    sitemapXml: sitemapXml,
                    urlCount: limitedUrls.length,
                };
    
                logger.info(`Finished sitemap generation for ${startUrl}. Included ${result.urlCount} URLs.`);
                return result;
    
            } catch (error) {
                logger.error(`Error during sitemap generation for ${startUrl}`, { error: error instanceof Error ? error.message : String(error), startUrl, maxDepth, limit });
                // Wrap errors from crawlPage or XML generation
                throw new ServiceError(`Sitemap generation failed for ${startUrl}: ${error instanceof Error ? error.message : String(error)}`, error);
            }
        }
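  • The crawlPage helper and SitemapResult type referenced above are defined elsewhere in the service. Their assumed shapes, based on how this method uses them (a sketch, not repository code):
    interface SitemapResult {
        sitemapXml: string; // the full XML sitemap document
        urlCount: number;   // number of URLs actually included after the limit
    }

    // Presumably fetches a page, extracts same-site links, and recurses until
    // maxDepth, tracking 'visited' to avoid cycles; returns all discovered URLs.
    declare function crawlPage(
        url: string,
        currentDepth: number,
        maxDepth: number,
        visited: Set<string>
    ): Promise<string[]>;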
  • Invocation of the generateSitemapTool registration function from the central registerTools function.
    generateSitemapTool(server);
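  • A minimal sketch of how that central registration function might look, assuming an McpServer instance from the MCP TypeScript SDK (illustrative only):
    export function registerTools(server: McpServer): void {
        generateSitemapTool(server);
        // ...registrations for the server's other tools
    }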
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes the crawling process and output generation, but lacks details on performance (e.g., rate limits, timeouts), error handling, or authentication needs. It mentions constraints ('up to a specified limit', 'up to a specified depth'), which adds some context, but overall behavioral traits are minimally covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that efficiently conveys the core functionality without redundancy. It front-loads key actions ('crawls', 'generates') and includes essential constraints, making every word contribute meaningfully to understanding the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (crawling with depth/limit constraints) and no annotations or output schema, the description is adequate but incomplete. It covers the basic operation and output type, but lacks details on return values (e.g., XML structure, error responses) and behavioral aspects like performance or prerequisites, which are needed for full contextual understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%: the schema fully documents all parameters (url, maxDepth, limit), including defaults, ranges, and formats. The description itself adds no parameter semantics beyond the schema, such as interactions between parameters or edge-case behavior, but it meets the baseline expected when schema coverage is this high.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('crawls a website', 'generates an XML sitemap') and the resource ('discovered URLs'), distinguishing it from siblings like 'check-links' (validation) or 'fetch-page' (single page retrieval). It explicitly mentions the scope ('starting from a given URL up to a specified depth') and output format ('XML sitemap'), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'crawl-site' or 'extract-links'. It does not mention prerequisites, exclusions, or comparative contexts, leaving the agent to infer usage solely from the tool name and description without explicit direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
