batch_extract_timestamps
Extract webpage creation, modification, and publication timestamps from multiple URLs in batch using configurable options for timeout, user agent, redirects, and heuristic detection.
Instructions
Extract timestamps from multiple webpages in batch
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| config | No | Optional configuration for the extraction | |
| urls | Yes | Array of URLs to extract timestamps from |
Implementation Reference
- src/index.ts:159-209 (handler)Handler for the 'batch_extract_timestamps' tool: validates URLs array, creates TimestampExtractor instance, concurrently extracts timestamps from each URL using Promise.allSettled, handles errors, and returns JSON results.if (name === 'batch_extract_timestamps') { const { urls, config } = args as { urls: string[]; config?: { timeout?: number; userAgent?: string; followRedirects?: boolean; maxRedirects?: number; enableHeuristics?: boolean; }; }; if (!Array.isArray(urls) || urls.length === 0) { return { content: [ { type: 'text', text: 'Error: URLs array is required and must not be empty', }, ], isError: true, }; } const timestampExtractor = config ? new TimestampExtractor(config) : extractor; const results = await Promise.allSettled( urls.map(url => timestampExtractor.extractTimestamps(url)) ); const processedResults = results.map((result, index) => { if (result.status === 'fulfilled') { return result.value; } else { return { url: urls[index], sources: [], confidence: 'low' as const, errors: [`Failed to extract timestamps: ${result.reason}`], }; } }); return { content: [ { type: 'text', text: JSON.stringify(processedResults, null, 2), }, ], }; }
- src/index.ts:67-109 (registration)Registration of the 'batch_extract_timestamps' tool in the MCP tools list, defining name, description, and input schema.{ name: 'batch_extract_timestamps', description: 'Extract timestamps from multiple webpages in batch', inputSchema: { type: 'object', properties: { urls: { type: 'array', items: { type: 'string', }, description: 'Array of URLs to extract timestamps from', }, config: { type: 'object', description: 'Optional configuration for the extraction', properties: { timeout: { type: 'number', description: 'Request timeout in milliseconds (default: 10000)', }, userAgent: { type: 'string', description: 'User agent string to use for requests', }, followRedirects: { type: 'boolean', description: 'Whether to follow HTTP redirects (default: true)', }, maxRedirects: { type: 'number', description: 'Maximum number of redirects to follow (default: 5)', }, enableHeuristics: { type: 'boolean', description: 'Whether to enable heuristic timestamp detection (default: true)', }, }, }, }, required: ['urls'], }, },
- src/extractor.ts:19-56 (helper)Core helper method TimestampExtractor.extractTimestamps used by the batch handler to process individual URLs: fetches page, extracts timestamps from meta, headers, structured data, heuristics, and consolidates results.async extractTimestamps(url: string): Promise<TimestampResult> { const errors: string[] = []; let fetchResult: FetchResult; try { fetchResult = await this.fetchPage(url); } catch (error) { return { url, sources: [], confidence: 'low', errors: [`Failed to fetch page: ${error instanceof Error ? error.message : String(error)}`], }; } const $ = cheerio.load(fetchResult.html); const sources: TimestampSource[] = []; // Extract timestamps from various sources sources.push(...this.extractFromHtmlMeta($)); sources.push(...this.extractFromHttpHeaders(fetchResult.headers)); sources.push(...this.extractFromJsonLd($)); sources.push(...this.extractFromMicrodata($)); sources.push(...this.extractFromOpenGraph($)); sources.push(...this.extractFromTwitterCards($)); if (this.config.enableHeuristics) { sources.push(...this.extractFromHeuristics($)); } const result = this.consolidateTimestamps(url, sources); if (errors.length > 0) { result.errors = errors; } return result; }