MCP Webscan Server

by bsmi021

crawl-site

Scan and extract all unique URLs from a website by recursively crawling from a given URL up to a specified depth. Designed for web content analysis.

Instructions

Recursively crawls a website starting from a given URL up to a specified maximum depth. It follows links within the same origin and returns a list of all unique URLs found during the crawl.

Input Schema

| Name | Required | Default | Description |
| --- | --- | --- | --- |
| url | Yes | - | The starting URL for the crawl. Must be a valid HTTP or HTTPS URL. |
| maxDepth | No | 2 | The maximum depth to crawl relative to the starting URL. 0 means only the starting URL is fetched. Max allowed depth is 5 to prevent excessive crawling. |
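
For example, an MCP client can invoke this tool with the official TypeScript SDK. The following is a minimal sketch; the command used to launch the server is an assumption, so adjust it to however the server is installed locally.

    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

    // Spawn the webscan server over stdio. The command is an assumption.
    const transport = new StdioClientTransport({
        command: "npx",
        args: ["mcp-server-webscan"]
    });

    const client = new Client({ name: "example-client", version: "1.0.0" });
    await client.connect(transport);

    // Crawl example.com one level deep and print the JSON text result.
    const result = await client.callTool({
        name: "crawl-site",
        arguments: { url: "https://example.com", maxDepth: 1 }
    });
    console.log(result.content);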

Implementation Reference

  • The handler function that executes the core logic for the 'crawl-site' tool. It destructures the input arguments, invokes the CrawlSiteService to perform the crawl, formats the JSON result for MCP response, and maps errors to appropriate McpError types.
    const processRequest = async (args: CrawlSiteToolArgs) => {
        // Zod handles default for maxDepth if not provided
        const { url, maxDepth } = args;
        logger.debug(`Received ${TOOL_NAME} request`, { url, maxDepth });
    
        try {
            // Call the service method
            const result = await serviceInstance.crawlWebsite(url, maxDepth);
    
            // Format the successful output for MCP
            return {
                content: [{
                    type: "text" as const,
                    text: JSON.stringify(result, null, 2)
                }]
            };
    
        } catch (error) {
            const logContext = {
                args,
                errorDetails: error instanceof Error ? { name: error.name, message: error.message, stack: error.stack } : String(error)
            };
            logger.error(`Error processing ${TOOL_NAME}`, logContext);
    
            // Map service-specific errors to McpError
            if (error instanceof ValidationError) {
                throw new McpError(ErrorCode.InvalidParams, `Validation failed: ${error.message}`, error.details);
            }
            if (error instanceof ServiceError) {
                throw new McpError(ErrorCode.InternalError, error.message, error.details);
            }
            if (error instanceof McpError) {
                throw error; // Re-throw existing McpErrors
            }
    
            // Catch-all for unexpected errors
            throw new McpError(
                ErrorCode.InternalError,
                error instanceof Error ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}` : `An unexpected error occurred in ${TOOL_NAME}.`
            );
        }
    };
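    The ValidationError and ServiceError classes caught above are project-specific. A minimal sketch of their assumed shape follows; the constructor signatures and the details field are assumptions inferred from how the handler uses them.

    // Hypothetical sketch of the project-specific error classes;
    // the repository's actual definitions may differ.
    export class ValidationError extends Error {
        constructor(message: string, public readonly details?: unknown) {
            super(message);
            this.name = "ValidationError";
        }
    }

    export class ServiceError extends Error {
        constructor(message: string, public readonly details?: unknown) {
            super(message);
            this.name = "ServiceError";
        }
    }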
  • Zod schema definition for the 'crawl-site' tool inputs (TOOL_PARAMS), along with the tool name and description used during registration.
    import { z } from "zod";

    export const TOOL_NAME = "crawl-site";
    
    export const TOOL_DESCRIPTION = `Recursively crawls a website starting from a given URL up to a specified maximum depth. It follows links within the same origin and returns a list of all unique URLs found during the crawl.`;
    
    export const TOOL_PARAMS = {
        url: z.string().url().describe("The starting URL for the crawl. Must be a valid HTTP or HTTPS URL."),
        maxDepth: z.number().int().min(0).max(5).optional().default(2).describe("The maximum depth to crawl relative to the starting URL. 0 means only the starting URL is fetched. Max allowed depth is 5 to prevent excessive crawling. Defaults to 2."),
    };
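    The CrawlSiteToolArgs type consumed by the handler can be derived from this schema so the Zod shape stays the single source of truth. A sketch; whether the repository derives the type this way is an assumption.

    import { z } from "zod";

    // Build an object schema from the raw shape and infer the argument type.
    const crawlSiteSchema = z.object(TOOL_PARAMS);
    export type CrawlSiteToolArgs = z.infer<typeof crawlSiteSchema>;
    // => { url: string; maxDepth: number } -- maxDepth is defaulted by Zod.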
  • Registers the 'crawl-site' tool with the MCP server by calling server.tool() with the name, description, input schema, and handler function.
    server.tool(
        TOOL_NAME,
        TOOL_DESCRIPTION,
        TOOL_PARAMS,
        processRequest
    );
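    The server instance passed in here comes from the MCP TypeScript SDK. A minimal sketch of the surrounding setup; the server name and version strings are assumptions.

    import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
    import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

    const server = new McpServer({ name: "mcp-server-webscan", version: "1.0.0" });

    // registerTools wires up crawl-site and the other tools (see below),
    // after which the server listens on stdio.
    registerTools(server);
    await server.connect(new StdioServerTransport());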
  • Supporting service method implementing the recursive website crawling logic using the crawlPage utility, including input validation, visited URL tracking, result formatting, and error handling.
    public async crawlWebsite(startUrl: string, maxDepth: number): Promise<CrawlResult> {
        // Basic validation
        if (!startUrl || typeof startUrl !== 'string') {
            throw new ValidationError('Invalid input: startUrl string is required.');
        }
        if (typeof maxDepth !== 'number' || maxDepth < 0) {
            throw new ValidationError('Invalid input: maxDepth must be a non-negative number.');
        }
    
        logger.info(`Starting crawl for: ${startUrl} up to depth ${maxDepth}`);
    
        try {
            const visited = new Set<string>();
            // Call the utility function
            const urls = await crawlPage(startUrl, 0, maxDepth, visited);
    
            // Ensure uniqueness (though crawlPage should handle it, belt-and-suspenders)
            const uniqueUrls = Array.from(new Set(urls));
    
            const result: CrawlResult = {
                crawled_urls: uniqueUrls,
                total_urls: uniqueUrls.length,
            };
    
            logger.info(`Finished crawl for ${startUrl}. Found ${result.total_urls} unique URLs.`);
            return result;
    
        } catch (error) {
            // Catch errors specifically from crawlPage or its dependencies (like fetchHtml)
            logger.error(`Error during crawlWebsite execution for ${startUrl}`, { error: error instanceof Error ? error.message : String(error), startUrl, maxDepth });
    
            // Wrap unexpected errors in a ServiceError
            throw new ServiceError(`Crawling failed for ${startUrl}: ${error instanceof Error ? error.message : String(error)}`, error);
        }
    }
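    The crawlPage utility itself is not shown on this page. Below is a hedged sketch of what recursive same-origin crawling with a visited set can look like; the regex-based link extraction is purely illustrative, and the real utility may use a proper HTML parser.

    // Assumed shape of the crawlPage utility -- an illustrative sketch,
    // not the repository's actual implementation.
    async function crawlPage(
        url: string,
        depth: number,
        maxDepth: number,
        visited: Set<string>
    ): Promise<string[]> {
        if (visited.has(url)) return [];
        visited.add(url);

        const found: string[] = [url];
        if (depth >= maxDepth) return found;

        const html = await (await fetch(url)).text();
        const origin = new URL(url).origin;

        // Naive href extraction; skips fragments and malformed URLs.
        for (const match of html.matchAll(/href="([^"#]+)"/g)) {
            let link: URL;
            try { link = new URL(match[1], url); } catch { continue; }
            if (link.origin !== origin || visited.has(link.href)) continue;
            found.push(...await crawlPage(link.href, depth + 1, maxDepth, visited));
        }
        return found;
    }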
  • Invocation of the crawlSiteTool registration function within the central registerTools function.
    crawlSiteTool(server);