Skip to main content
Glama
bsmi021

MCP Webscan Server

by bsmi021

find-patterns

Extract and filter anchor links from a web page by matching their absolute URLs against a specified regex pattern. Returns both the URL and anchor text for each matching link.

Instructions

Fetches a web page, extracts all anchor ('a') links, resolves their absolute URLs, and returns a list of links whose URLs match a given JavaScript-compatible regular expression pattern. Includes the URL and anchor text for each match.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
patternYesA JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\/\d+' to find product links.
urlYesThe fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL.

Implementation Reference

  • The async handler function that processes 'find-patterns' tool calls: extracts args, invokes FindPatternsService, formats JSON response, handles errors with MCPError mapping.
    const processRequest = async (args: FindPatternsToolArgs) => {
        const { url, pattern } = args;
        logger.debug(`Received ${TOOL_NAME} request`, { url, pattern: '...' }); // Avoid logging potentially large/complex patterns directly unless needed
    
        try {
            // Call the service method
            const results = await serviceInstance.findLinksByPattern(url, pattern);
    
            // Format the successful output for MCP
            return {
                content: [{
                    type: "text" as const,
                    text: JSON.stringify(results, null, 2)
                }]
            };
    
        } catch (error) {
            const logContext = {
                args: { url: args.url, pattern: '...' }, // Mask pattern in logs
                errorDetails: error instanceof Error ? { name: error.name, message: error.message, stack: error.stack } : String(error)
            };
            logger.error(`Error processing ${TOOL_NAME}`, logContext);
    
            // Map service-specific errors to McpError
            if (error instanceof ValidationError) {
                // Check if it's the specific regex validation error from the service
                if (error.message.startsWith('Invalid regex pattern')) {
                    throw new McpError(ErrorCode.InvalidParams, error.message, error.details); // Pass specific message
                }
                throw new McpError(ErrorCode.InvalidParams, `Validation failed: ${error.message}`, error.details);
            }
            if (error instanceof ServiceError) {
                throw new McpError(ErrorCode.InternalError, error.message, error.details);
            }
            if (error instanceof McpError) {
                throw error; // Re-throw existing McpErrors
            }
    
            // Catch-all for unexpected errors
            throw new McpError(
                ErrorCode.InternalError,
                error instanceof Error ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}` : `An unexpected error occurred in ${TOOL_NAME}.`
            );
        }
    };
  • Zod schema definition for TOOL_PARAMS, TOOL_NAME, and TOOL_DESCRIPTION used in tool registration.
    export const TOOL_NAME = "find-patterns";
    
    export const TOOL_DESCRIPTION = `Fetches a web page, extracts all anchor ('a') links, resolves their absolute URLs, and returns a list of links whose URLs match a given JavaScript-compatible regular expression pattern. Includes the URL and anchor text for each match.`;
    
    export const TOOL_PARAMS = {
        url: z.string().url().describe("The fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL."),
        pattern: z.string().min(1).describe("A JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\\/\\d+' to find product links."),
    };
  • MCP server.tool() registration call for the 'find-patterns' tool using name, desc, params schema, and processRequest handler.
    server.tool(
        TOOL_NAME,
        TOOL_DESCRIPTION,
        TOOL_PARAMS,
        processRequest
    );
  • Top-level tool registration in registerTools(): imports and invokes findPatternsTool(server). Note: startLine adjusted to include import.
    import { findPatternsTool } from "./findPatterns.js";
    import { generateSitemapTool } from "./generateSitemapTool.js";
    
    /**
     * Registers all available tools with the MCP server instance.
     * This function acts as the central point for tool registration.
     *
     * @param server - The McpServer instance to register tools with.
     */
    export function registerTools(server: McpServer): void {
        logger.info("Registering tools...");
    
        // Get config manager instance if needed to pass specific configs to tools
        // const configManager = ConfigurationManager.getInstance();
    
        // Register each tool
        // Pass specific config if the tool/service requires it, e.g.:
        // checkLinksTool(server, configManager.getCheckLinksConfig());
        checkLinksTool(server);
        crawlSiteTool(server);
        extractLinksTool(server);
        fetchPageTool(server);
        findPatternsTool(server);
  • Core helper method in FindPatternsService that fetches HTML, parses links with cheerio, resolves absolute URLs, tests regex pattern on URLs, collects matching LinkResult objects.
    public async findLinksByPattern(pageUrl: string, pattern: string): Promise<LinkResult[]> {
        // Basic validation
        if (!pageUrl || typeof pageUrl !== 'string') {
            throw new ValidationError('Invalid input: pageUrl string is required.');
        }
        if (!pattern || typeof pattern !== 'string') {
            throw new ValidationError('Invalid input: pattern string is required.');
        }
    
        let regex: RegExp;
        try {
            regex = new RegExp(pattern); // Compile the regex pattern
        } catch (e) {
            logger.error(`Invalid regex pattern provided: ${pattern}`, { error: e instanceof Error ? e.message : String(e) });
            throw new ValidationError(`Invalid regex pattern: ${pattern}. Error: ${e instanceof Error ? e.message : String(e)}`);
        }
    
        logger.info(`Starting pattern search on: ${pageUrl}`, { pattern });
    
        const matches: LinkResult[] = [];
    
        try {
            const { $ } = await fetchHtml(pageUrl);
            logger.debug(`Successfully fetched HTML for ${pageUrl}`);
    
            const linkElements = $('a[href]').toArray();
            logger.debug(`Found ${linkElements.length} anchor elements on ${pageUrl}`);
    
            for (const element of linkElements) {
                const link = $(element);
                const href = link.attr('href');
                const text = link.text().trim() || '[No text]';
    
                // Basic filtering
                if (!href || href.startsWith('#') || href.startsWith('mailto:') || href.startsWith('tel:')) {
                    continue;
                }
    
                let absoluteUrl: string;
                try {
                    absoluteUrl = new URL(href, pageUrl).toString();
                } catch (e) {
                    logger.warn(`Could not resolve href '${href}' on page ${pageUrl}`, { error: e instanceof Error ? e.message : String(e) });
                    continue; // Skip invalid URLs
                }
    
                // Test the absolute URL against the regex
                if (regex.test(absoluteUrl)) {
                    logger.debug(`Pattern match found: ${absoluteUrl}`);
                    matches.push({ url: absoluteUrl, text: text });
                }
            }
    
        } catch (fetchError) {
            logger.error(`Failed to fetch or process page ${pageUrl} for pattern finding`, { error: fetchError instanceof Error ? fetchError.message : String(fetchError) });
            throw new ServiceError(`Failed to fetch or process page ${pageUrl}: ${fetchError instanceof Error ? fetchError.message : String(fetchError)}`, fetchError);
        }
    
        logger.info(`Finished pattern search for ${pageUrl}. Found ${matches.length} matching links.`);
        return matches;
    }
Install Server

Other Tools

Related Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bsmi021/mcp-server-webscan'

If you have feedback or need assistance with the MCP directory API, please join our Discord server