Skip to main content
Glama
bsmi021

MCP Webscan Server

by bsmi021

check-links

Validates all links on a specified webpage by extracting anchor tags and checking their reachability via HEAD requests, identifying broken, valid, or invalid URLs for efficient link management.

Instructions

Fetches a given URL, extracts all anchor ('a') links, and checks each linked URL for validity (reachability via HEAD request). Returns a list of checked links with their status ('valid', 'broken', or 'invalid_url' if the href attribute couldn't be resolved to a full URL).

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe fully qualified URL of the web page to check for broken links. Must be a valid HTTP or HTTPS URL.

Implementation Reference

  • The handler function for the 'check-links' tool. It processes the input arguments, calls the CheckLinksService.checkLinksOnPage method, formats the results as MCP content, and handles errors appropriately.
    const processRequest = async (args: CheckLinksToolArgs) => {
        logger.debug(`Received ${TOOL_NAME} request`, { args });
    
        // Input validation is implicitly handled by Zod in TOOL_PARAMS when the server calls this.
        // If more complex validation or transformation is needed, it can be done here.
    
        try {
            // Call the service method with the validated URL
            const results = await serviceInstance.checkLinksOnPage(args.url);
    
            // Format the successful output for MCP
            return {
                content: [{
                    type: "text" as const,
                    text: JSON.stringify(results, null, 2) // Pretty print JSON
                }]
            };
    
        } catch (error) {
            // Combine error and args into a single context object for logging
            const logContext = {
                args,
                errorDetails: error instanceof Error ? { name: error.name, message: error.message, stack: error.stack } : String(error)
            };
            logger.error(`Error processing ${TOOL_NAME}`, logContext); // Pass combined context
    
            // Map service-specific errors to McpError
            if (error instanceof ValidationError) {
                throw new McpError(
                    ErrorCode.InvalidParams,
                    `Validation failed: ${error.message}`,
                    error.details // Pass details if available
                );
            }
            if (error instanceof ServiceError) {
                // Use the message from the ServiceError
                throw new McpError(
                    ErrorCode.InternalError,
                    error.message, // Pass service error message directly
                    error.details
                );
            }
            if (error instanceof McpError) {
                throw error; // Re-throw existing McpErrors
            }
    
            // Catch-all for unexpected errors
            throw new McpError(
                ErrorCode.InternalError,
                error instanceof Error ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}` : `An unexpected error occurred in ${TOOL_NAME}.`
            );
        }
    };
  • Registers the 'check-links' tool with the MCP server using server.tool(), providing name, description, Zod schema, and handler.
    // Register the tool with the server
    server.tool(
        TOOL_NAME,
        TOOL_DESCRIPTION,
        TOOL_PARAMS, // Pass the Zod schema directly
        processRequest
    );
  • Defines the tool name, description, and input schema (Zod object with 'url' parameter) for the 'check-links' tool.
    export const TOOL_NAME = "check-links";
    
    export const TOOL_DESCRIPTION = `Fetches a given URL, extracts all anchor ('a') links, and checks each linked URL for validity (reachability via HEAD request). Returns a list of checked links with their status ('valid', 'broken', or 'invalid_url' if the href attribute couldn't be resolved to a full URL).`;
    
    export const TOOL_PARAMS = {
        url: z.string().url().describe("The fully qualified URL of the web page to check for broken links. Must be a valid HTTP or HTTPS URL."),
    };
  • The core helper method in CheckLinksService that implements the link checking logic: fetches HTML, extracts 'a' links, resolves absolute URLs, checks reachability concurrently, and returns results.
    public async checkLinksOnPage(pageUrl: string): Promise<LinkCheckResult[]> {
        if (!pageUrl || typeof pageUrl !== 'string') {
            throw new ValidationError('Invalid input: pageUrl string is required.');
        }
        logger.info(`Starting link check for page: ${pageUrl}`);
    
        const results: LinkCheckResult[] = [];
        const checkedUrls = new Set<string>(); // Keep track of URLs already checked in this run
    
        try {
            // Fetch the HTML content and Cheerio object
            const { $ } = await fetchHtml(pageUrl);
            logger.debug(`Successfully fetched HTML for ${pageUrl}`);
    
            const linkElements = $('a[href]').toArray();
            logger.debug(`Found ${linkElements.length} anchor elements on ${pageUrl}`);
    
            // Process links concurrently for better performance
            const checkPromises = linkElements.map(async (element) => {
                const href = $(element).attr('href');
    
                // Basic filtering for href attribute
                if (!href || href.startsWith('#') || href.startsWith('mailto:') || href.startsWith('tel:')) {
                    logger.debug(`Skipping invalid or local href: ${href}`);
                    return null; // Skip this link
                }
    
                let absoluteUrl: string;
                try {
                    // Resolve the relative URL against the page URL
                    absoluteUrl = new URL(href, pageUrl).toString();
                } catch (e) {
                    logger.warn(`Could not resolve href '${href}' on page ${pageUrl}`, { error: e instanceof Error ? e.message : String(e) });
                    // Add result for invalid URL format
                    return { url: href, status: 'invalid_url' } as LinkCheckResult;
                }
    
                // Check if this absolute URL has already been processed in this run
                if (checkedUrls.has(absoluteUrl)) {
                    logger.debug(`Skipping already checked URL: ${absoluteUrl}`);
                    return null; // Skip duplicate check
                }
                checkedUrls.add(absoluteUrl); // Mark as checked for this run
    
                // Check the validity (reachability) of the absolute URL
                try {
                    const isValid = await isValidUrl(absoluteUrl);
                    logger.debug(`Checked URL: ${absoluteUrl} - Status: ${isValid ? 'valid' : 'broken'}`);
                    return { url: absoluteUrl, status: isValid ? 'valid' : 'broken' } as LinkCheckResult;
                } catch (checkError) {
                    // Log error during isValidUrl check, but still report as broken
                    logger.error(`Error checking validity of URL ${absoluteUrl}`, { error: checkError instanceof Error ? checkError.message : String(checkError) });
                    return { url: absoluteUrl, status: 'broken' } as LinkCheckResult;
                }
            });
    
            // Wait for all checks to complete and filter out nulls (skipped links)
            const completedResults = (await Promise.all(checkPromises)).filter(result => result !== null) as LinkCheckResult[];
            results.push(...completedResults);
    
        } catch (fetchError) {
            // Handle errors during the initial fetchHtml call
            logger.error(`Failed to fetch or process page ${pageUrl}`, { error: fetchError instanceof Error ? fetchError.message : String(fetchError) });
            // Wrap fetch error in a ServiceError
            throw new ServiceError(`Failed to fetch or process the page at ${pageUrl}: ${fetchError instanceof Error ? fetchError.message : String(fetchError)}`, fetchError);
        }
    
        logger.info(`Finished link check for page: ${pageUrl}. Found ${results.length} results.`);
        return results;
    }
  • TypeScript interface defining the expected input arguments for the 'check-links' tool.
    export interface CheckLinksArgs {
        url: string;
    }
Install Server

Other Tools

Related Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bsmi021/mcp-server-webscan'

If you have feedback or need assistance with the MCP directory API, please join our Discord server