MCP Webscan Server

by bsmi021

check-links

Validates the links on a specified web page by extracting anchor tags and checking each target's reachability with a HEAD request, reporting every link as valid, broken, or an invalid URL.

Instructions

Fetches a given URL, extracts all anchor ('a') links, and checks each linked URL for validity (reachability via HEAD request). Returns a list of checked links with their status ('valid', 'broken', or 'invalid_url' if the href attribute couldn't be resolved to a full URL).
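The handler shown in the Implementation Reference serializes the result list with JSON.stringify, so a client receives a JSON array as text. The following is a minimal sketch of consuming that text, assuming each entry carries only the `url` and `status` fields that the service sets. `groupByStatus` is a hypothetical helper and is not part of the server.

```typescript
// Status values named in the tool description.
type LinkStatus = "valid" | "broken" | "invalid_url";

// Assumed result shape: the two fields the service sets on each entry.
interface LinkCheckResult {
    url: string;
    status: LinkStatus;
}

// Hypothetical helper: parse the tool's text output and bucket URLs by status.
function groupByStatus(jsonText: string): Record<LinkStatus, string[]> {
    const results = JSON.parse(jsonText) as LinkCheckResult[];
    const grouped: Record<LinkStatus, string[]> = { valid: [], broken: [], invalid_url: [] };
    for (const { url, status } of results) {
        grouped[status].push(url);
    }
    return grouped;
}

// Illustrative payload only; these are not real results from the tool.
const sampleOutput = JSON.stringify([
    { url: "https://example.com/", status: "valid" },
    { url: "https://example.com/missing", status: "broken" },
    { url: "http://[bad", status: "invalid_url" },
]);
const groupedSample = groupByStatus(sampleOutput);
console.log(groupedSample.broken); // [ 'https://example.com/missing' ]
```

Grouping by status keeps the follow-up simple: broken links need fixing on the target side, while invalid_url entries point at malformed href values in the page itself.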

Input Schema

Name | Required | Description | Default
url | Yes | The fully qualified URL of the web page to check for broken links. Must be a valid HTTP or HTTPS URL. | (none)
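The parameter description asks for an HTTP or HTTPS URL, but a generic URL-format check does not necessarily restrict the scheme. The following is a minimal sketch of a stricter pre-check using only the standard `URL` class. `isHttpUrl` is a hypothetical helper and is not part of the server.

```typescript
// Hypothetical helper: true only for absolute URLs with an http: or https: scheme.
function isHttpUrl(value: string): boolean {
    try {
        const parsed = new URL(value); // throws on relative or malformed input
        return parsed.protocol === "http:" || parsed.protocol === "https:";
    } catch {
        return false;
    }
}

console.log(isHttpUrl("https://example.com/docs")); // true
console.log(isHttpUrl("ftp://example.com/file"));   // false
console.log(isHttpUrl("/relative/path"));           // false
```

A caller could run a check like this before invoking the tool to avoid a round trip that is bound to fail.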

Implementation Reference

  • The handler function for the 'check-links' tool. It processes the input arguments, calls the CheckLinksService.checkLinksOnPage method, formats the results as MCP content, and handles errors appropriately.
    const processRequest = async (args: CheckLinksToolArgs) => {
        logger.debug(`Received ${TOOL_NAME} request`, { args });
    
        // Input validation is implicitly handled by Zod in TOOL_PARAMS when the server calls this.
        // If more complex validation or transformation is needed, it can be done here.
    
        try {
            // Call the service method with the validated URL
            const results = await serviceInstance.checkLinksOnPage(args.url);
    
            // Format the successful output for MCP
            return {
                content: [{
                    type: "text" as const,
                    text: JSON.stringify(results, null, 2) // Pretty print JSON
                }]
            };
    
        } catch (error) {
            // Combine error and args into a single context object for logging
            const logContext = {
                args,
                errorDetails: error instanceof Error ? { name: error.name, message: error.message, stack: error.stack } : String(error)
            };
            logger.error(`Error processing ${TOOL_NAME}`, logContext); // Pass combined context
    
            // Map service-specific errors to McpError
            if (error instanceof ValidationError) {
                throw new McpError(
                    ErrorCode.InvalidParams,
                    `Validation failed: ${error.message}`,
                    error.details // Pass details if available
                );
            }
            if (error instanceof ServiceError) {
                // Use the message from the ServiceError
                throw new McpError(
                    ErrorCode.InternalError,
                    error.message, // Pass service error message directly
                    error.details
                );
            }
            if (error instanceof McpError) {
                throw error; // Re-throw existing McpErrors
            }
    
            // Catch-all for unexpected errors
            throw new McpError(
                ErrorCode.InternalError,
                error instanceof Error ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}` : `An unexpected error occurred in ${TOOL_NAME}.`
            );
        }
    };
  • Registers the 'check-links' tool with the MCP server using server.tool(), providing name, description, Zod schema, and handler.
    // Register the tool with the server
    server.tool(
        TOOL_NAME,
        TOOL_DESCRIPTION,
        TOOL_PARAMS, // Pass the Zod schema directly
        processRequest
    );
  • Defines the tool name, description, and input schema (Zod object with 'url' parameter) for the 'check-links' tool.
    export const TOOL_NAME = "check-links";
    
    export const TOOL_DESCRIPTION = `Fetches a given URL, extracts all anchor ('a') links, and checks each linked URL for validity (reachability via HEAD request). Returns a list of checked links with their status ('valid', 'broken', or 'invalid_url' if the href attribute couldn't be resolved to a full URL).`;
    
    export const TOOL_PARAMS = {
        url: z.string().url().describe("The fully qualified URL of the web page to check for broken links. Must be a valid HTTP or HTTPS URL."),
    };
  • The core helper method in CheckLinksService that implements the link checking logic: fetches HTML, extracts 'a' links, resolves absolute URLs, checks reachability concurrently, and returns results.
    public async checkLinksOnPage(pageUrl: string): Promise<LinkCheckResult[]> {
        if (!pageUrl || typeof pageUrl !== 'string') {
            throw new ValidationError('Invalid input: pageUrl string is required.');
        }
        logger.info(`Starting link check for page: ${pageUrl}`);
    
        const results: LinkCheckResult[] = [];
        const checkedUrls = new Set<string>(); // Keep track of URLs already checked in this run
    
        try {
            // Fetch the HTML content and Cheerio object
            const { $ } = await fetchHtml(pageUrl);
            logger.debug(`Successfully fetched HTML for ${pageUrl}`);
    
            const linkElements = $('a[href]').toArray();
            logger.debug(`Found ${linkElements.length} anchor elements on ${pageUrl}`);
    
            // Process links concurrently for better performance
            const checkPromises = linkElements.map(async (element) => {
                const href = $(element).attr('href');
    
                // Basic filtering for href attribute
                if (!href || href.startsWith('#') || href.startsWith('mailto:') || href.startsWith('tel:')) {
                    logger.debug(`Skipping invalid or local href: ${href}`);
                    return null; // Skip this link
                }
    
                let absoluteUrl: string;
                try {
                    // Resolve the relative URL against the page URL
                    absoluteUrl = new URL(href, pageUrl).toString();
                } catch (e) {
                    logger.warn(`Could not resolve href '${href}' on page ${pageUrl}`, { error: e instanceof Error ? e.message : String(e) });
                    // Add result for invalid URL format
                    return { url: href, status: 'invalid_url' } as LinkCheckResult;
                }
    
                // Check if this absolute URL has already been processed in this run
                if (checkedUrls.has(absoluteUrl)) {
                    logger.debug(`Skipping already checked URL: ${absoluteUrl}`);
                    return null; // Skip duplicate check
                }
                checkedUrls.add(absoluteUrl); // Mark as checked for this run
    
                // Check the validity (reachability) of the absolute URL
                try {
                    const isValid = await isValidUrl(absoluteUrl);
                    logger.debug(`Checked URL: ${absoluteUrl} - Status: ${isValid ? 'valid' : 'broken'}`);
                    return { url: absoluteUrl, status: isValid ? 'valid' : 'broken' } as LinkCheckResult;
                } catch (checkError) {
                    // Log error during isValidUrl check, but still report as broken
                    logger.error(`Error checking validity of URL ${absoluteUrl}`, { error: checkError instanceof Error ? checkError.message : String(checkError) });
                    return { url: absoluteUrl, status: 'broken' } as LinkCheckResult;
                }
            });
    
            // Wait for all checks to complete and filter out nulls (skipped links)
            const completedResults = (await Promise.all(checkPromises)).filter(result => result !== null) as LinkCheckResult[];
            results.push(...completedResults);
    
        } catch (fetchError) {
            // Handle errors during the initial fetchHtml call
            logger.error(`Failed to fetch or process page ${pageUrl}`, { error: fetchError instanceof Error ? fetchError.message : String(fetchError) });
            // Wrap fetch error in a ServiceError
            throw new ServiceError(`Failed to fetch or process the page at ${pageUrl}: ${fetchError instanceof Error ? fetchError.message : String(fetchError)}`, fetchError);
        }
    
        logger.info(`Finished link check for page: ${pageUrl}. Found ${results.length} results.`);
        return results;
    }
  • TypeScript interface defining the expected input arguments for the 'check-links' tool.
    export interface CheckLinksArgs {
        url: string;
    }
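The filtering, resolution, and de-duplication steps in checkLinksOnPage can be separated from the network calls, which makes them easy to test in isolation. The following is a minimal sketch under that assumption. `classifyHref` and `HrefDecision` are hypothetical names, and the logic mirrors the service method above rather than reproducing it.

```typescript
// Hypothetical outcome of inspecting one href before any network request.
type HrefDecision =
    | { kind: "skip" }
    | { kind: "invalid_url"; url: string }
    | { kind: "check"; url: string };

function classifyHref(href: string | undefined, pageUrl: string, seen: Set<string>): HrefDecision {
    // Same basic filter as the service: empty, fragment-only, mailto: and tel: links are skipped.
    if (!href || href.startsWith("#") || href.startsWith("mailto:") || href.startsWith("tel:")) {
        return { kind: "skip" };
    }
    let absoluteUrl: string;
    try {
        // Resolve relative hrefs against the page URL.
        absoluteUrl = new URL(href, pageUrl).toString();
    } catch {
        return { kind: "invalid_url", url: href };
    }
    // Each absolute URL is checked at most once per run.
    if (seen.has(absoluteUrl)) {
        return { kind: "skip" };
    }
    seen.add(absoluteUrl);
    return { kind: "check", url: absoluteUrl };
}

const seenUrls = new Set<string>();
const examplePage = "https://example.com/docs/index.html";
const decisions = ["#top", "guide.html", "/about", "guide.html", "http://[bad"].map(
    (href) => classifyHref(href, examplePage, seenUrls),
);
console.log(decisions);
```

In this example the fragment link and the repeated `guide.html` are skipped, the two relative links resolve against the page URL, and the malformed host yields an invalid_url entry.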
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of disclosure and does well: it states that the tool fetches a URL, extracts anchor links, performs HEAD requests for validation, and categorizes results into three status types. It does not mention rate limits, timeouts, or authentication needs, but it covers the core operational behavior adequately.
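Timeouts are one of the behaviors left unspecified, and the `isValidUrl` helper that the service relies on is not shown in the reference. The following is a minimal sketch of a HEAD check with an explicit timeout, assuming Node 18+ for the global `fetch`. `headCheck`, its `timeoutMs` parameter, and the injectable `fetchFn` are hypothetical, and the server's actual helper may behave differently (for example, in how it treats redirects or non-2xx responses).

```typescript
// Minimal fetch-like signature so a stub can be injected in tests (an assumption of this sketch).
type FetchLike = (
    url: string,
    init: { method: string; signal: AbortSignal },
) => Promise<{ ok: boolean }>;

// Hypothetical helper: issue a HEAD request and give up after timeoutMs.
async function headCheck(
    url: string,
    timeoutMs: number = 5000,
    fetchFn: FetchLike = (u, init) => fetch(u, init),
): Promise<"valid" | "broken"> {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
        const response = await fetchFn(url, { method: "HEAD", signal: controller.signal });
        // Treat any 2xx response as reachable; everything else counts as broken here.
        return response.ok ? "valid" : "broken";
    } catch {
        // Network errors and aborted (timed-out) requests are both reported as broken.
        return "broken";
    } finally {
        clearTimeout(timer);
    }
}
```

Without a bound like this, a single unresponsive host can stall the whole run, because the service waits for every check to settle before returning.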

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise and front-loaded: a single sentence that efficiently explains the entire workflow from input to output. Every word earns its place with no redundancy or wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no annotations and no output schema, the description provides excellent context about what the tool does and what it returns. It could be more complete by mentioning potential limitations (e.g., JavaScript-rendered links, redirect handling) or output format details, but covers the essentials well.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents the single 'url' parameter. The description adds no parameter semantics beyond what is in the schema (e.g., it doesn't specify URL format requirements or give examples). A baseline score of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('fetches', 'extracts', 'checks') and resources ('URL', 'anchor links'), distinguishing it from siblings like 'extract-links' (which likely only extracts) or 'fetch-page' (which likely only fetches). It explicitly mentions what makes this tool unique: checking link validity with status categorization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by specifying it checks 'broken links' and handles 'invalid_url' cases, but doesn't explicitly state when to use this tool versus alternatives like 'crawl-site' or 'extract-links'. It provides clear functional context but lacks explicit comparative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bsmi021/mcp-server-webscan'
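The same request can be made from TypeScript. The following is a minimal sketch, assuming Node 18+ for the global `fetch` and assuming the endpoint returns JSON. `serverApiUrl` and `fetchServerInfo` are hypothetical helper names, and the response is typed as `unknown` because its schema is not described here.

```typescript
// Base path taken from the curl example above.
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: build the endpoint URL for one server entry.
function serverApiUrl(owner: string, name: string): string {
    return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: fetch the entry and return the parsed body.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
    const response = await fetch(serverApiUrl(owner, name));
    if (!response.ok) {
        throw new Error(`Request failed with HTTP ${response.status}`);
    }
    return response.json(); // assumes a JSON body
}

console.log(serverApiUrl("bsmi021", "mcp-server-webscan"));
// https://glama.ai/api/mcp/v1/servers/bsmi021/mcp-server-webscan
```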

If you have feedback or need assistance with the MCP directory API, please join our Discord server.