batch_extract_timestamps
Extract creation, modification, and publication timestamps from multiple webpages simultaneously using HTML meta tags, HTTP headers, and structured data.
Instructions
Extract timestamps from multiple webpages in batch
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | Array of URLs to extract timestamps from | |
| config | No | Optional configuration for the extraction |
Implementation Reference
- src/index.ts:159-209 (handler)Handler for the 'batch_extract_timestamps' tool. Parses arguments, validates URLs array, instantiates TimestampExtractor with optional config, concurrently extracts timestamps from each URL using Promise.allSettled, processes results including errors, and returns JSON-formatted array of results.
if (name === 'batch_extract_timestamps') { const { urls, config } = args as { urls: string[]; config?: { timeout?: number; userAgent?: string; followRedirects?: boolean; maxRedirects?: number; enableHeuristics?: boolean; }; }; if (!Array.isArray(urls) || urls.length === 0) { return { content: [ { type: 'text', text: 'Error: URLs array is required and must not be empty', }, ], isError: true, }; } const timestampExtractor = config ? new TimestampExtractor(config) : extractor; const results = await Promise.allSettled( urls.map(url => timestampExtractor.extractTimestamps(url)) ); const processedResults = results.map((result, index) => { if (result.status === 'fulfilled') { return result.value; } else { return { url: urls[index], sources: [], confidence: 'low' as const, errors: [`Failed to extract timestamps: ${result.reason}`], }; } }); return { content: [ { type: 'text', text: JSON.stringify(processedResults, null, 2), }, ], }; } - src/index.ts:67-109 (registration)Registration of the 'batch_extract_timestamps' tool in the tools array returned by ListToolsRequestHandler, including detailed input schema for batch processing.
{ name: 'batch_extract_timestamps', description: 'Extract timestamps from multiple webpages in batch', inputSchema: { type: 'object', properties: { urls: { type: 'array', items: { type: 'string', }, description: 'Array of URLs to extract timestamps from', }, config: { type: 'object', description: 'Optional configuration for the extraction', properties: { timeout: { type: 'number', description: 'Request timeout in milliseconds (default: 10000)', }, userAgent: { type: 'string', description: 'User agent string to use for requests', }, followRedirects: { type: 'boolean', description: 'Whether to follow HTTP redirects (default: true)', }, maxRedirects: { type: 'number', description: 'Maximum number of redirects to follow (default: 5)', }, enableHeuristics: { type: 'boolean', description: 'Whether to enable heuristic timestamp detection (default: true)', }, }, }, }, required: ['urls'], }, }, - src/index.ts:70-108 (schema)Input schema definition for the 'batch_extract_timestamps' tool, specifying required 'urls' array and optional 'config' object.
inputSchema: { type: 'object', properties: { urls: { type: 'array', items: { type: 'string', }, description: 'Array of URLs to extract timestamps from', }, config: { type: 'object', description: 'Optional configuration for the extraction', properties: { timeout: { type: 'number', description: 'Request timeout in milliseconds (default: 10000)', }, userAgent: { type: 'string', description: 'User agent string to use for requests', }, followRedirects: { type: 'boolean', description: 'Whether to follow HTTP redirects (default: true)', }, maxRedirects: { type: 'number', description: 'Maximum number of redirects to follow (default: 5)', }, enableHeuristics: { type: 'boolean', description: 'Whether to enable heuristic timestamp detection (default: true)', }, }, }, }, required: ['urls'], }, - src/extractor.ts:19-56 (helper)Helper method in TimestampExtractor class that implements the core logic for extracting timestamps from a single webpage, used by both 'extract_timestamps' and 'batch_extract_timestamps' tools.
async extractTimestamps(url: string): Promise<TimestampResult> { const errors: string[] = []; let fetchResult: FetchResult; try { fetchResult = await this.fetchPage(url); } catch (error) { return { url, sources: [], confidence: 'low', errors: [`Failed to fetch page: ${error instanceof Error ? error.message : String(error)}`], }; } const $ = cheerio.load(fetchResult.html); const sources: TimestampSource[] = []; // Extract timestamps from various sources sources.push(...this.extractFromHtmlMeta($)); sources.push(...this.extractFromHttpHeaders(fetchResult.headers)); sources.push(...this.extractFromJsonLd($)); sources.push(...this.extractFromMicrodata($)); sources.push(...this.extractFromOpenGraph($)); sources.push(...this.extractFromTwitterCards($)); if (this.config.enableHeuristics) { sources.push(...this.extractFromHeuristics($)); } const result = this.consolidateTimestamps(url, sources); if (errors.length > 0) { result.errors = errors; } return result; }