Skip to main content
Glama
bsmi021

MCP Webscan Server

by bsmi021

find-patterns

Extract and filter anchor links from a web page by matching their absolute URLs against a specified regex pattern. Returns both the URL and anchor text for each matching link.

Instructions

Fetches a web page, extracts all anchor ('a') links, resolves their absolute URLs, and returns a list of links whose URLs match a given JavaScript-compatible regular expression pattern. Includes the URL and anchor text for each match.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
patternYesA JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\/\d+' to find product links.
urlYesThe fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL.

Implementation Reference

  • The async handler function that processes 'find-patterns' tool calls: extracts args, invokes FindPatternsService, formats JSON response, handles errors with MCPError mapping.
    const processRequest = async (args: FindPatternsToolArgs) => {
        const { url, pattern } = args;
        logger.debug(`Received ${TOOL_NAME} request`, { url, pattern: '...' }); // Avoid logging potentially large/complex patterns directly unless needed
    
        try {
            // Call the service method
            const results = await serviceInstance.findLinksByPattern(url, pattern);
    
            // Format the successful output for MCP
            return {
                content: [{
                    type: "text" as const,
                    text: JSON.stringify(results, null, 2)
                }]
            };
    
        } catch (error) {
            const logContext = {
                args: { url: args.url, pattern: '...' }, // Mask pattern in logs
                errorDetails: error instanceof Error ? { name: error.name, message: error.message, stack: error.stack } : String(error)
            };
            logger.error(`Error processing ${TOOL_NAME}`, logContext);
    
            // Map service-specific errors to McpError
            if (error instanceof ValidationError) {
                // Check if it's the specific regex validation error from the service
                if (error.message.startsWith('Invalid regex pattern')) {
                    throw new McpError(ErrorCode.InvalidParams, error.message, error.details); // Pass specific message
                }
                throw new McpError(ErrorCode.InvalidParams, `Validation failed: ${error.message}`, error.details);
            }
            if (error instanceof ServiceError) {
                throw new McpError(ErrorCode.InternalError, error.message, error.details);
            }
            if (error instanceof McpError) {
                throw error; // Re-throw existing McpErrors
            }
    
            // Catch-all for unexpected errors
            throw new McpError(
                ErrorCode.InternalError,
                error instanceof Error ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}` : `An unexpected error occurred in ${TOOL_NAME}.`
            );
        }
    };
  • Zod schema definition for TOOL_PARAMS, TOOL_NAME, and TOOL_DESCRIPTION used in tool registration.
    export const TOOL_NAME = "find-patterns";
    
    export const TOOL_DESCRIPTION = `Fetches a web page, extracts all anchor ('a') links, resolves their absolute URLs, and returns a list of links whose URLs match a given JavaScript-compatible regular expression pattern. Includes the URL and anchor text for each match.`;
    
    export const TOOL_PARAMS = {
        url: z.string().url().describe("The fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL."),
        pattern: z.string().min(1).describe("A JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\\/\\d+' to find product links."),
    };
  • MCP server.tool() registration call for the 'find-patterns' tool using name, desc, params schema, and processRequest handler.
    server.tool(
        TOOL_NAME,
        TOOL_DESCRIPTION,
        TOOL_PARAMS,
        processRequest
    );
  • Top-level tool registration in registerTools(): imports and invokes findPatternsTool(server). Note: startLine adjusted to include import.
    import { findPatternsTool } from "./findPatterns.js";
    import { generateSitemapTool } from "./generateSitemapTool.js";
    
    /**
     * Registers all available tools with the MCP server instance.
     * This function acts as the central point for tool registration.
     *
     * @param server - The McpServer instance to register tools with.
     */
    export function registerTools(server: McpServer): void {
        logger.info("Registering tools...");
    
        // Get config manager instance if needed to pass specific configs to tools
        // const configManager = ConfigurationManager.getInstance();
    
        // Register each tool
        // Pass specific config if the tool/service requires it, e.g.:
        // checkLinksTool(server, configManager.getCheckLinksConfig());
        checkLinksTool(server);
        crawlSiteTool(server);
        extractLinksTool(server);
        fetchPageTool(server);
        findPatternsTool(server);
  • Core helper method in FindPatternsService that fetches HTML, parses links with cheerio, resolves absolute URLs, tests regex pattern on URLs, collects matching LinkResult objects.
    public async findLinksByPattern(pageUrl: string, pattern: string): Promise<LinkResult[]> {
        // Basic validation
        if (!pageUrl || typeof pageUrl !== 'string') {
            throw new ValidationError('Invalid input: pageUrl string is required.');
        }
        if (!pattern || typeof pattern !== 'string') {
            throw new ValidationError('Invalid input: pattern string is required.');
        }
    
        let regex: RegExp;
        try {
            regex = new RegExp(pattern); // Compile the regex pattern
        } catch (e) {
            logger.error(`Invalid regex pattern provided: ${pattern}`, { error: e instanceof Error ? e.message : String(e) });
            throw new ValidationError(`Invalid regex pattern: ${pattern}. Error: ${e instanceof Error ? e.message : String(e)}`);
        }
    
        logger.info(`Starting pattern search on: ${pageUrl}`, { pattern });
    
        const matches: LinkResult[] = [];
    
        try {
            const { $ } = await fetchHtml(pageUrl);
            logger.debug(`Successfully fetched HTML for ${pageUrl}`);
    
            const linkElements = $('a[href]').toArray();
            logger.debug(`Found ${linkElements.length} anchor elements on ${pageUrl}`);
    
            for (const element of linkElements) {
                const link = $(element);
                const href = link.attr('href');
                const text = link.text().trim() || '[No text]';
    
                // Basic filtering
                if (!href || href.startsWith('#') || href.startsWith('mailto:') || href.startsWith('tel:')) {
                    continue;
                }
    
                let absoluteUrl: string;
                try {
                    absoluteUrl = new URL(href, pageUrl).toString();
                } catch (e) {
                    logger.warn(`Could not resolve href '${href}' on page ${pageUrl}`, { error: e instanceof Error ? e.message : String(e) });
                    continue; // Skip invalid URLs
                }
    
                // Test the absolute URL against the regex
                if (regex.test(absoluteUrl)) {
                    logger.debug(`Pattern match found: ${absoluteUrl}`);
                    matches.push({ url: absoluteUrl, text: text });
                }
            }
    
        } catch (fetchError) {
            logger.error(`Failed to fetch or process page ${pageUrl} for pattern finding`, { error: fetchError instanceof Error ? fetchError.message : String(fetchError) });
            throw new ServiceError(`Failed to fetch or process page ${pageUrl}: ${fetchError instanceof Error ? fetchError.message : String(fetchError)}`, fetchError);
        }
    
        logger.info(`Finished pattern search for ${pageUrl}. Found ${matches.length} matching links.`);
        return matches;
    }
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes the tool's behavior well (fetching, extracting, resolving, filtering links) but lacks details on error handling, performance characteristics (e.g., timeouts, rate limits), or authentication needs. It doesn't contradict any annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core functionality in a single, efficient sentence. Every phrase adds value: it specifies the action, the resource (anchor links), the processing (resolving URLs), the filtering mechanism (regex pattern), and the output format (list with URL and anchor text). No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (web scraping with regex filtering), no annotations, and no output schema, the description is mostly complete. It explains what the tool does and the output format but could benefit from mentioning potential limitations (e.g., handling of JavaScript-rendered content or pagination). It adequately covers the core functionality without being exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, providing detailed documentation for both parameters. The description adds minimal value beyond the schema by mentioning the regex is used to 'test against the absolute URLs' and that it returns 'URL and anchor text for each match,' but doesn't explain parameter interactions or edge cases. Baseline 3 is appropriate given the comprehensive schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('fetches a web page, extracts all anchor links, resolves their absolute URLs, and returns a list of links') and distinguishes it from siblings by specifying the filtering mechanism ('match a given JavaScript-compatible regular expression pattern'). It goes beyond a simple tautology by detailing the multi-step process.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (finding links matching a regex pattern on a web page) but doesn't explicitly mention when not to use it or name alternatives among the sibling tools. It implies usage for pattern-based link extraction without direct comparison to tools like 'extract-links' or 'fetch-page'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Related Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bsmi021/mcp-server-webscan'

If you have feedback or need assistance with the MCP directory API, please join our Discord server