find-patterns
Extract and filter anchor links from a web page by matching their absolute URLs against a specified regex pattern. Returns both the URL and anchor text for each matching link.
Instructions
Fetches a web page, extracts all anchor ('a') links, resolves their absolute URLs, and returns a list of links whose URLs match a given JavaScript-compatible regular expression pattern. Includes the URL and anchor text for each match.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| pattern | Yes | A JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\/\d+' to find product links. | |
| url | Yes | The fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL. |
Input Schema (JSON Schema)
{
"$schema": "http://json-schema.org/draft-07/schema#",
"additionalProperties": false,
"properties": {
"pattern": {
"description": "A JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\\/\\d+' to find product links.",
"minLength": 1,
"type": "string"
},
"url": {
"description": "The fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL.",
"format": "uri",
"type": "string"
}
},
"required": [
"url",
"pattern"
],
"type": "object"
}
Implementation Reference
- src/tools/findPatterns.ts:27-71 (handler)The async handler function that processes 'find-patterns' tool calls: extracts args, invokes FindPatternsService, formats JSON response, handles errors with MCPError mapping.const processRequest = async (args: FindPatternsToolArgs) => { const { url, pattern } = args; logger.debug(`Received ${TOOL_NAME} request`, { url, pattern: '...' }); // Avoid logging potentially large/complex patterns directly unless needed try { // Call the service method const results = await serviceInstance.findLinksByPattern(url, pattern); // Format the successful output for MCP return { content: [{ type: "text" as const, text: JSON.stringify(results, null, 2) }] }; } catch (error) { const logContext = { args: { url: args.url, pattern: '...' }, // Mask pattern in logs errorDetails: error instanceof Error ? { name: error.name, message: error.message, stack: error.stack } : String(error) }; logger.error(`Error processing ${TOOL_NAME}`, logContext); // Map service-specific errors to McpError if (error instanceof ValidationError) { // Check if it's the specific regex validation error from the service if (error.message.startsWith('Invalid regex pattern')) { throw new McpError(ErrorCode.InvalidParams, error.message, error.details); // Pass specific message } throw new McpError(ErrorCode.InvalidParams, `Validation failed: ${error.message}`, error.details); } if (error instanceof ServiceError) { throw new McpError(ErrorCode.InternalError, error.message, error.details); } if (error instanceof McpError) { throw error; // Re-throw existing McpErrors } // Catch-all for unexpected errors throw new McpError( ErrorCode.InternalError, error instanceof Error ? `An unexpected error occurred in ${TOOL_NAME}: ${error.message}` : `An unexpected error occurred in ${TOOL_NAME}.` ); } };
- Zod schema definition for TOOL_PARAMS, TOOL_NAME, and TOOL_DESCRIPTION used in tool registration.export const TOOL_NAME = "find-patterns"; export const TOOL_DESCRIPTION = `Fetches a web page, extracts all anchor ('a') links, resolves their absolute URLs, and returns a list of links whose URLs match a given JavaScript-compatible regular expression pattern. Includes the URL and anchor text for each match.`; export const TOOL_PARAMS = { url: z.string().url().describe("The fully qualified URL of the web page to search for link patterns. Must be a valid HTTP or HTTPS URL."), pattern: z.string().min(1).describe("A JavaScript-compatible regular expression pattern (without enclosing slashes or flags) used to test against the absolute URLs of the links found on the page. Example: 'product\\/\\d+' to find product links."), };
- src/tools/findPatterns.ts:74-79 (registration)MCP server.tool() registration call for the 'find-patterns' tool using name, desc, params schema, and processRequest handler.server.tool( TOOL_NAME, TOOL_DESCRIPTION, TOOL_PARAMS, processRequest );
- src/tools/index.ts:10-32 (registration)Top-level tool registration in registerTools(): imports and invokes findPatternsTool(server). Note: startLine adjusted to include import.import { findPatternsTool } from "./findPatterns.js"; import { generateSitemapTool } from "./generateSitemapTool.js"; /** * Registers all available tools with the MCP server instance. * This function acts as the central point for tool registration. * * @param server - The McpServer instance to register tools with. */ export function registerTools(server: McpServer): void { logger.info("Registering tools..."); // Get config manager instance if needed to pass specific configs to tools // const configManager = ConfigurationManager.getInstance(); // Register each tool // Pass specific config if the tool/service requires it, e.g.: // checkLinksTool(server, configManager.getCheckLinksConfig()); checkLinksTool(server); crawlSiteTool(server); extractLinksTool(server); fetchPageTool(server); findPatternsTool(server);
- Core helper method in FindPatternsService that fetches HTML, parses links with cheerio, resolves absolute URLs, tests regex pattern on URLs, collects matching LinkResult objects.public async findLinksByPattern(pageUrl: string, pattern: string): Promise<LinkResult[]> { // Basic validation if (!pageUrl || typeof pageUrl !== 'string') { throw new ValidationError('Invalid input: pageUrl string is required.'); } if (!pattern || typeof pattern !== 'string') { throw new ValidationError('Invalid input: pattern string is required.'); } let regex: RegExp; try { regex = new RegExp(pattern); // Compile the regex pattern } catch (e) { logger.error(`Invalid regex pattern provided: ${pattern}`, { error: e instanceof Error ? e.message : String(e) }); throw new ValidationError(`Invalid regex pattern: ${pattern}. Error: ${e instanceof Error ? e.message : String(e)}`); } logger.info(`Starting pattern search on: ${pageUrl}`, { pattern }); const matches: LinkResult[] = []; try { const { $ } = await fetchHtml(pageUrl); logger.debug(`Successfully fetched HTML for ${pageUrl}`); const linkElements = $('a[href]').toArray(); logger.debug(`Found ${linkElements.length} anchor elements on ${pageUrl}`); for (const element of linkElements) { const link = $(element); const href = link.attr('href'); const text = link.text().trim() || '[No text]'; // Basic filtering if (!href || href.startsWith('#') || href.startsWith('mailto:') || href.startsWith('tel:')) { continue; } let absoluteUrl: string; try { absoluteUrl = new URL(href, pageUrl).toString(); } catch (e) { logger.warn(`Could not resolve href '${href}' on page ${pageUrl}`, { error: e instanceof Error ? e.message : String(e) }); continue; // Skip invalid URLs } // Test the absolute URL against the regex if (regex.test(absoluteUrl)) { logger.debug(`Pattern match found: ${absoluteUrl}`); matches.push({ url: absoluteUrl, text: text }); } } } catch (fetchError) { logger.error(`Failed to fetch or process page ${pageUrl} for pattern finding`, { error: fetchError instanceof Error ? fetchError.message : String(fetchError) }); throw new ServiceError(`Failed to fetch or process page ${pageUrl}: ${fetchError instanceof Error ? fetchError.message : String(fetchError)}`, fetchError); } logger.info(`Finished pattern search for ${pageUrl}. Found ${matches.length} matching links.`); return matches; }