Skip to main content
Glama
by bsmi021

find-patterns

Extract and filter anchor links from a web page by matching their absolute URLs against a specified regex pattern. Returns both the URL and anchor text for each matching link.

Instructions

Fetches a web page, extracts all anchor ('a') links, resolves their absolute URLs, and returns a list of links whose URLs match a given JavaScript-compatible regular expression pattern. Includes the URL and anchor text for each match.

Input Schema

NameRequiredDescriptionDefault
patternYesA JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\/\d+' to find product links.
urlYesThe fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL.

Input Schema (JSON Schema)

{ "$schema": "http://json-schema.org/draft-07/schema#", "additionalProperties": false, "properties": { "pattern": { "description": "A JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\\/\\d+' to find product links.", "minLength": 1, "type": "string" }, "url": { "description": "The fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL.", "format": "uri", "type": "string" } }, "required": [ "url", "pattern" ], "type": "object" }

Implementation Reference

  • The async handler function that processes 'find-patterns' tool calls: extracts args, invokes FindPatternsService, formats JSON response, handles errors with MCPError mapping.
    const processRequest = async (args: FindPatternsToolArgs) => { const { url, pattern } = args; logger.debug(`Received ${TOOL_NAME} request`, { url, pattern: '...' }); // Avoid logging potentially large/complex patterns directly unless needed try { // Call the service method const results = await serviceInstance.findLinksByPattern(url, pattern); // Format the successful output for MCP return { content: [{ type: "text" as const, text: JSON.stringify(results, null, 2) }] }; } catch (error) { const logContext = { args: { url: args.url, pattern: '...' }, // Mask pattern in logs errorDetails: error instanceof Error ? { name: error.name, message: error.message, stack: error.stack } : String(error) }; logger.error(`Error processing ${TOOL_NAME}`, logContext); // Map service-specific errors to McpError if (error instanceof ValidationError) { // Check if it's the specific regex validation error from the service if (error.message.startsWith('Invalid regex pattern')) { throw new McpError(ErrorCode.InvalidParams, error.message, error.details); // Pass specific message } throw new McpError(ErrorCode.InvalidParams, `Validation failed: ${error.message}`, error.details); } if (error instanceof ServiceError) { throw new McpError(ErrorCode.InternalError, error.message, error.details); } if (error instanceof McpError) { throw error; // Re-throw existing McpErrors } // Catch-all for unexpected errors throw new McpError( ErrorCode.InternalError, error instanceof Error ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}` : `An unexpected error occurred in ${TOOL_NAME}.` ); } };
  • Zod schema definition for TOOL_PARAMS, TOOL_NAME, and TOOL_DESCRIPTION used in tool registration.
    export const TOOL_NAME = "find-patterns"; export const TOOL_DESCRIPTION = `Fetches a web page, extracts all anchor ('a') links, resolves their absolute URLs, and returns a list of links whose URLs match a given JavaScript-compatible regular expression pattern. Includes the URL and anchor text for each match.`; export const TOOL_PARAMS = { url: z.string().url().describe("The fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL."), pattern: z.string().min(1).describe("A JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\\/\\d+' to find product links."), };
  • MCP server.tool() registration call for the 'find-patterns' tool using name, desc, params schema, and processRequest handler.
    server.tool( TOOL_NAME, TOOL_DESCRIPTION, TOOL_PARAMS, processRequest );
  • Top-level tool registration in registerTools(): imports and invokes findPatternsTool(server). Note: startLine adjusted to include import.
    import { findPatternsTool } from "./findPatterns.js"; import { generateSitemapTool } from "./generateSitemapTool.js"; /** * Registers all available tools with the MCP server instance. * This function acts as the central point for tool registration. * * @param server - The McpServer instance to register tools with. */ export function registerTools(server: McpServer): void { logger.info("Registering tools..."); // Get config manager instance if needed to pass specific configs to tools // const configManager = ConfigurationManager.getInstance(); // Register each tool // Pass specific config if the tool/service requires it, e.g.: // checkLinksTool(server, configManager.getCheckLinksConfig()); checkLinksTool(server); crawlSiteTool(server); extractLinksTool(server); fetchPageTool(server); findPatternsTool(server);
  • Core helper method in FindPatternsService that fetches HTML, parses links with cheerio, resolves absolute URLs, tests regex pattern on URLs, collects matching LinkResult objects.
    public async findLinksByPattern(pageUrl: string, pattern: string): Promise<LinkResult[]> { // Basic validation if (!pageUrl || typeof pageUrl !== 'string') { throw new ValidationError('Invalid input: pageUrl string is required.'); } if (!pattern || typeof pattern !== 'string') { throw new ValidationError('Invalid input: pattern string is required.'); } let regex: RegExp; try { regex = new RegExp(pattern); // Compile the regex pattern } catch (e) { logger.error(`Invalid regex pattern provided: ${pattern}`, { error: e instanceof Error ? e.message : String(e) }); throw new ValidationError(`Invalid regex pattern: ${pattern}. Error: ${e instanceof Error ? e.message : String(e)}`); } logger.info(`Starting pattern search on: ${pageUrl}`, { pattern }); const matches: LinkResult[] = []; try { const { $ } = await fetchHtml(pageUrl); logger.debug(`Successfully fetched HTML for ${pageUrl}`); const linkElements = $('a[href]').toArray(); logger.debug(`Found ${linkElements.length} anchor elements on ${pageUrl}`); for (const element of linkElements) { const link = $(element); const href = link.attr('href'); const text = link.text().trim() || '[No text]'; // Basic filtering if (!href || href.startsWith('#') || href.startsWith('mailto:') || href.startsWith('tel:')) { continue; } let absoluteUrl: string; try { absoluteUrl = new URL(href, pageUrl).toString(); } catch (e) { logger.warn(`Could not resolve href '${href}' on page ${pageUrl}`, { error: e instanceof Error ? e.message : String(e) }); continue; // Skip invalid URLs } // Test the absolute URL against the regex if (regex.test(absoluteUrl)) { logger.debug(`Pattern match found: ${absoluteUrl}`); matches.push({ url: absoluteUrl, text: text }); } } } catch (fetchError) { logger.error(`Failed to fetch or process page ${pageUrl} for pattern finding`, { error: fetchError instanceof Error ? fetchError.message : String(fetchError) }); throw new ServiceError(`Failed to fetch or process page ${pageUrl}: ${fetchError instanceof Error ? fetchError.message : String(fetchError)}`, fetchError); } logger.info(`Finished pattern search for ${pageUrl}. Found ${matches.length} matching links.`); return matches; }

Other Tools

Related Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/bsmi021/mcp-server-webscan'

If you have feedback or need assistance with the MCP directory API, please join our Discord server