Jina AI Remote MCP Server

by wlmwwx

guess_datetime_url

Extract the last updated or published datetime from web pages by analyzing HTTP headers, HTML metadata, Schema.org data, visible dates, and multiple other sources to provide accurate timestamps with confidence scores.

Instructions

Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.

Input Schema

Name | Required | Description | Default
url | Yes | The complete HTTP/HTTPS URL of the webpage to guess datetime information | (none)
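
As a rough illustration of how a client might invoke this tool, the sketch below builds the JSON-RPC `tools/call` request that MCP uses for tool invocation, passing the single `url` argument. The helper name `buildGuessDatetimeCall`, the `ToolCallRequest` type, and the default request `id` are assumptions made for the example; transport and session setup are omitted.

```typescript
// Hypothetical request shape for an MCP `tools/call` invocation.
interface ToolCallRequest {
    jsonrpc: "2.0";
    id: number;
    method: "tools/call";
    params: { name: string; arguments: { url: string } };
}

// Hypothetical helper (not part of the server): builds the request for
// the guess_datetime_url tool.
function buildGuessDatetimeCall(url: string, id = 1): ToolCallRequest {
    // Follow the schema description, which asks for a complete HTTP/HTTPS URL.
    const parsed = new URL(url); // throws a TypeError on malformed input
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
        throw new Error(`Unsupported protocol: ${parsed.protocol}`);
    }
    return {
        jsonrpc: "2.0",
        id,
        method: "tools/call",
        params: { name: "guess_datetime_url", arguments: { url } },
    };
}

console.log(JSON.stringify(buildGuessDatetimeCall("https://example.com/post"), null, 2));
```

Judging by the handler shown in the implementation reference, the reply carries one text content item holding the YAML-serialized `bestGuess` and `confidence` fields.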

Implementation Reference

  • Core tool handler: fetches the webpage, extracts datetime indicators from multiple sources (headers, meta tags, Schema.org, content, JS, feeds, Git), picks the best guess using heuristics, and returns the timestamp and confidence.
    export async function guessDatetimeFromUrl(url: string): Promise<{
        bestGuess: string | null;
        confidence: number;
    }> {
        try {
            // Fetch the target webpage
            const response = await fetch(url);
    
            if (!response.ok) {
                throw new Error(`HTTP ${response.status}: ${response.statusText}`);
            }
    
            const text = await response.text();
    
            // Extract all possible time indicators
            const updateTimes = await extractAllTimeIndicators(response, text, url);
    
            // Advanced heuristic-based determination of the "true" update time
            const bestGuess = determineBestUpdateTime(updateTimes);
    
            // Result with confidence score
            const result = {
                bestGuess: bestGuess.timestamp,
                confidence: bestGuess.confidence
            };
    
            return result;
        } catch (error) {
            throw new Error(`Failed to guess datetime from URL: ${error instanceof Error ? error.message : String(error)}`);
        }
    }
  • MCP server.tool registration for 'guess_datetime_url', including description, input schema, and thin wrapper handler delegating to core implementation.
    server.tool(
    	"guess_datetime_url",
    	"Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.",
    	{
    		url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information")
    	},
    	async ({ url }: { url: string }) => {
    		try {
    			// Import the utility function
    			const { guessDatetimeFromUrl } = await import("../utils/guess-datetime.js");
    
    			// Analyze the URL for datetime information
    			const result = await guessDatetimeFromUrl(url);
    
    			return {
    				content: [{ type: "text" as const, text: yamlStringify(result) }],
    			};
    		} catch (error) {
    			return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
    		}
    	},
    );
  • Zod input schema validation for the tool: requires a valid URL string.
    {
    	url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information")
    },
  • Core helper function for parsing diverse international date formats from strings found in HTML, meta, comments, etc.
    function parseDate(dateStr: string): Date | null {
        if (!dateStr) return null;
    
        // Clean up the string (remove extra spaces, normalize separators)
        let cleanStr = dateStr.trim()
            .replace(/\s+/g, ' ')
            .replace(/-(\d{2}:)/, ' $1'); // Fix formats like 2025-03-05-21:25:00
    
        // Try direct parsing first
        const date = new Date(cleanStr);
        if (!isNaN(date.getTime())) return date;
    
        // Try parsing ISO-like formats with variations
        const isoPattern = /(\d{4})[-\/](\d{1,2})[-\/](\d{1,2})(?:[T\s-](\d{1,2})[:\.](\d{1,2})(?:[:\.](\d{1,2}))?)?/;
        const isoMatch = cleanStr.match(isoPattern);
        if (isoMatch) {
            const year = parseInt(isoMatch[1]);
            const month = parseInt(isoMatch[2]) - 1; // JS months are 0-indexed
            const day = parseInt(isoMatch[3]);
            const hour = isoMatch[4] ? parseInt(isoMatch[4]) : 0;
            const minute = isoMatch[5] ? parseInt(isoMatch[5]) : 0;
            const second = isoMatch[6] ? parseInt(isoMatch[6]) : 0;
    
            const newDate = new Date(year, month, day, hour, minute, second);
            if (!isNaN(newDate.getTime())) return newDate;
        }
    
        // Try MM/DD/YYYY and DD/MM/YYYY formats
        const slashPattern = /(\d{1,2})[\/\-\.](\d{1,2})[\/\-\.](\d{4})/;
        const slashMatch = cleanStr.match(slashPattern);
        if (slashMatch) {
            // Try MM/DD/YYYY first, then DD/MM/YYYY; an ambiguous date such as 03/04/2025 resolves to the US reading (March 4)
            const parts = [parseInt(slashMatch[1]), parseInt(slashMatch[2]), parseInt(slashMatch[3])];
    
            // MM/DD/YYYY attempt
            const usDate = new Date(parts[2], parts[0] - 1, parts[1]);
            if (!isNaN(usDate.getTime()) && usDate.getMonth() === parts[0] - 1 && usDate.getDate() === parts[1]) {
                return usDate;
            }
    
            // DD/MM/YYYY attempt
            const euDate = new Date(parts[2], parts[1] - 1, parts[0]);
            if (!isNaN(euDate.getTime()) && euDate.getMonth() === parts[1] - 1 && euDate.getDate() === parts[0]) {
                return euDate;
            }
        }
    
        // Try month name patterns
        const monthNamePattern = /([A-Za-z]+)\s+(\d{1,2})(?:st|nd|rd|th)?,?\s+(\d{4})/i;
        const monthMatch = cleanStr.match(monthNamePattern);
        if (monthMatch) {
            const newDate = new Date(`${monthMatch[1]} ${monthMatch[2]}, ${monthMatch[3]}`);
            if (!isNaN(newDate.getTime())) return newDate;
        }
    
        // Try international date formats
        // Chinese: YYYY年MM月DD日
        const chinesePattern = /(\d{4})年(\d{1,2})月(\d{1,2})日/;
        const chineseMatch = cleanStr.match(chinesePattern);
        if (chineseMatch) {
            const year = parseInt(chineseMatch[1]);
            const month = parseInt(chineseMatch[2]) - 1;
            const day = parseInt(chineseMatch[3]);
            const newDate = new Date(year, month, day);
            if (!isNaN(newDate.getTime())) return newDate;
        }
    
        // Japanese: YYYY年MM月DD日 (same pattern as the Chinese block above, which already returns on a match)
        const japanesePattern = /(\d{4})年(\d{1,2})月(\d{1,2})日/;
        const japaneseMatch = cleanStr.match(japanesePattern);
        if (japaneseMatch) {
            const year = parseInt(japaneseMatch[1]);
            const month = parseInt(japaneseMatch[2]) - 1;
            const day = parseInt(japaneseMatch[3]);
            const newDate = new Date(year, month, day);
            if (!isNaN(newDate.getTime())) return newDate;
        }
    
        // European: DD.MM.YYYY
        const europeanPattern = /(\d{1,2})\.(\d{1,2})\.(\d{4})/;
        const europeanMatch = cleanStr.match(europeanPattern);
        if (europeanMatch) {
            const day = parseInt(europeanMatch[1]);
            const month = parseInt(europeanMatch[2]) - 1;
            const year = parseInt(europeanMatch[3]);
            const newDate = new Date(year, month, day);
            if (!isNaN(newDate.getTime())) return newDate;
        }
    
        // Korean: YYYY-MM-DD (also matched by the ISO-like pattern earlier in this function)
        const koreanPattern = /(\d{4})-(\d{1,2})-(\d{1,2})/;
        const koreanMatch = cleanStr.match(koreanPattern);
        if (koreanMatch) {
            const year = parseInt(koreanMatch[1]);
            const month = parseInt(koreanMatch[2]) - 1;
            const day = parseInt(koreanMatch[3]);
            const newDate = new Date(year, month, day);
            if (!isNaN(newDate.getTime())) return newDate;
        }
    
        // Try Unix timestamps (seconds or milliseconds)
        if (/^\d+$/.test(cleanStr)) {
            const timestamp = parseInt(cleanStr);
            // If the number is too small to be a millisecond timestamp but could be seconds
            const date = new Date(timestamp > 9999999999 ? timestamp : timestamp * 1000);
            if (!isNaN(date.getTime()) && date.getFullYear() > 1970 && date.getFullYear() < 2100) {
                return date;
            }
        }
    
        return null;
    }
  • Key decision helper: selects best datetime from all extracted candidates using prioritized heuristics and recency sorting.
    function determineBestUpdateTime(updateTimes: any) {
        // First check for meta tags that explicitly indicate last modified time
        if (updateTimes.metaTags && updateTimes.metaTags.lastModified) {
            return {
                timestamp: updateTimes.metaTags.lastModified,
                confidence: 95,
                reasoning: ["Explicit lastmodifiedtime meta tag"]
            };
        }
    
        // Check other meta tags related to publication/modification
        if (updateTimes.metaTags) {
            if (updateTimes.metaTags.articleModified) {
                return {
                    timestamp: updateTimes.metaTags.articleModified,
                    confidence: 90,
                    reasoning: ["Article modified time meta tag"]
                };
            }
    
            if (updateTimes.metaTags.publishedDate) {
                return {
                    timestamp: updateTimes.metaTags.publishedDate,
                    confidence: 85,
                    reasoning: ["Published date meta tag"]
                };
            }
    
            // Check for new high-value meta tags
            if (updateTimes.metaTags.ogUpdatedTime) {
                return {
                    timestamp: updateTimes.metaTags.ogUpdatedTime,
                    confidence: 88,
                    reasoning: ["Open Graph updated time meta tag"]
                };
            }
    
            if (updateTimes.metaTags.lastmod) {
                return {
                    timestamp: updateTimes.metaTags.lastmod,
                    confidence: 87,
                    reasoning: ["Last modified meta tag"]
                };
            }
    
            if (updateTimes.metaTags.generated) {
                return {
                    timestamp: updateTimes.metaTags.generated,
                    confidence: 85,
                    reasoning: ["Page generated meta tag"]
                };
            }
    
            if (updateTimes.metaTags.build) {
                return {
                    timestamp: updateTimes.metaTags.build,
                    confidence: 83,
                    reasoning: ["Build timestamp meta tag"]
                };
            }
        }
    
        // Check feed timestamps (RSS, Atom, Sitemap) - often very reliable
        if (updateTimes.feedTimestamps && updateTimes.feedTimestamps.length > 0) {
            // Filter for high priority feed timestamps
            const highPriorityFeeds = updateTimes.feedTimestamps
                .filter((stamp: any) => stamp.priority === 'high')
                .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context }));
    
            if (highPriorityFeeds.length > 0) {
                // Sort by recency
                highPriorityFeeds.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: highPriorityFeeds[0].date.toISOString(),
                    confidence: 92,
                    reasoning: ["Feed timestamp", `Type: ${highPriorityFeeds[0].type}`, `Context: ${highPriorityFeeds[0].context}`]
                };
            }
    
            // If no high priority feeds, use most recent feed timestamp
            const allFeedDates = updateTimes.feedTimestamps
                .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context }));
    
            allFeedDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allFeedDates[0].date.toISOString(),
                confidence: 85,
                reasoning: ["Feed timestamp", `Type: ${allFeedDates[0].type}`, `Context: ${allFeedDates[0].context}`]
            };
        }
    
        // Check visible dates with high priority markers
        if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) {
            // Look for dates that appear to be part of lastmodified content
            const contentDates = updateTimes.visibleDates.filter((d: any) => {
                const ctx = d.context.toLowerCase();
                return ctx.includes('lastmodified') ||
                    ctx.includes('last modified') ||
                    ctx.includes('updated') ||
                    ctx.includes('修改') ||  // Chinese for "modified"
                    ctx.includes('更新');    // Chinese for "updated" 
            });
    
            if (contentDates.length > 0) {
                // Sort by recency
                const dates = contentDates.map((d: any) => new Date(d.date));
                dates.sort((a: Date, b: Date) => {
                    if (!a || !b) return 0;
                    return b.getTime() - a.getTime();
                });
    
                return {
                    timestamp: dates[0].toISOString(),
                    confidence: 92,
                    reasoning: ["Content explicitly marked as modified/updated"]
                };
            }
    
            // Next check for dates that appear in common date display elements
            const displayDateElements = updateTimes.visibleDates.filter((d: any) => {
                const ctx = d.context.toLowerCase();
                return ctx.includes('class="date') ||
                    ctx.includes('class="time') ||
                    ctx.includes('class="pubdate') ||
                    ctx.includes('class="published') ||
                    ctx.includes('pages-date') ||
                    ctx.includes('pub-date');
            });
    
            if (displayDateElements.length > 0) {
                const dates = displayDateElements.map((d: any) => new Date(d.date));
                dates.sort((a: Date, b: Date) => b.getTime() - a.getTime());
    
                return {
                    timestamp: dates[0].toISOString(),
                    confidence: 88,
                    reasoning: ["Date from primary content display element"]
                };
            }
        }
    
        // Check for Schema.org timestamps
        if (updateTimes.schemaOrgTimestamps && updateTimes.schemaOrgTimestamps.length > 0) {
            // Filter for high priority fields: dateModified and dateUpdated
            const highPriorityDates = updateTimes.schemaOrgTimestamps
                .filter((stamp: any) => stamp.priority === 'high')
                .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context }));
    
            if (highPriorityDates.length > 0) {
                // Sort by recency
                highPriorityDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: highPriorityDates[0].date.toISOString(),
                    confidence: 85,
                    reasoning: ["Schema.org structured data", `Field: ${highPriorityDates[0].field}`, `Context: ${highPriorityDates[0].context}`]
                };
            }
    
            // If no high priority fields, use most recent Schema.org date
            const allSchemaDates = updateTimes.schemaOrgTimestamps
                .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context }));
    
            allSchemaDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allSchemaDates[0].date.toISOString(),
                confidence: 75,
                reasoning: ["Schema.org structured data", `Field: ${allSchemaDates[0].field}`, `Context: ${allSchemaDates[0].context}`]
            };
        }
    
        // Check Git info (often very reliable)
        if (updateTimes.gitInfo && updateTimes.gitInfo.gitDate) {
            return {
                timestamp: updateTimes.gitInfo.gitDate,
                confidence: 90,
                reasoning: ["Git commit information", updateTimes.gitInfo.gitHash ? `Git hash: ${updateTimes.gitInfo.gitHash}` : ""]
            };
        } else if (updateTimes.gitInfo && updateTimes.gitInfo.deployDate) {
            return {
                timestamp: updateTimes.gitInfo.deployDate,
                confidence: 88,
                reasoning: ["Git deployment timestamp"]
            };
        }
    
        // JSON-LD structured data is also quite reliable
        if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) {
            const jsonLdDates = updateTimes.jsTimestamps
                .filter((stamp: any) => stamp.type === 'jsonLd')
                .map((stamp: any) => ({
                    date: new Date(stamp.date),
                    field: stamp.field,
                    priority: stamp.priority
                }));
    
            if (jsonLdDates.length > 0) {
                // Sort by priority and recency
                jsonLdDates.sort((a: any, b: any) => {
                    if (a.priority === 'high' && b.priority !== 'high') return -1;
                    if (a.priority !== 'high' && b.priority === 'high') return 1;
                    return b.date.getTime() - a.date.getTime();
                });
    
                return {
                    timestamp: jsonLdDates[0].date.toISOString(),
                    confidence: jsonLdDates[0].priority === 'high' ? 80 : 65,
                    reasoning: [`JSON-LD structured data (${jsonLdDates[0].field})`]
                };
            }
        }
    
        // If we have a page generation time meta tag, it's a decent indicator
        if (updateTimes.metaTags && updateTimes.metaTags.pageGenerated) {
            return {
                timestamp: updateTimes.metaTags.pageGenerated,
                confidence: 75,
                reasoning: ["Page generation time meta tag"]
            };
        }
    
        // Process visible dates that don't have explicit modification indicators
        if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) {
            // Get all dates and sort by recency
            const allDates = updateTimes.visibleDates.map((d: any) => ({
                date: new Date(d.date),
                context: d.context
            }));
    
            allDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allDates[0].date.toISOString(),
                confidence: 70,
                reasoning: ["Most recent date found in page content", `Context: "${allDates[0].context}"`]
            };
        }
    
        // Try HTML comments
        if (updateTimes.htmlComments && updateTimes.htmlComments.length > 0) {
            const commentDates = updateTimes.htmlComments.map((c: any) => ({
                date: new Date(c.date),
                context: c.context
            }));
    
            commentDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: commentDates[0].date.toISOString(),
                confidence: 60,
                reasoning: ["Timestamp from HTML comment", `Context: "${commentDates[0].context}"`]
            };
        }
    
        // Try JavaScript timestamps
        if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) {
            const jsDates = updateTimes.jsTimestamps
                .filter((stamp: any) => stamp.type !== 'jsonLd')
                .map((stamp: any) => ({
                    date: new Date(stamp.date),
                    context: stamp.context,
                    type: stamp.type
                }));
    
            if (jsDates.length > 0) {
                // Sort by recency
                jsDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: jsDates[0].date.toISOString(),
                    confidence: 60,
                    reasoning: ["JavaScript timestamp found", `Context: "${jsDates[0].context}"`]
                };
            }
        }
    
        // Use HTTP Last-Modified even if it matches server time, but with lower confidence
        if (updateTimes.lastModified) {
            const lastModDate = new Date(updateTimes.lastModified);
            if (!isNaN(lastModDate.getTime())) {
                // Check if Last-Modified differs significantly from server time
                if (updateTimes.date) {
                    const serverDate = new Date(updateTimes.date);
                    const timeDiff = Math.abs(lastModDate.getTime() - serverDate.getTime());
    
                    if (timeDiff > 60000) { // More than 1 minute difference
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 75,
                            reasoning: ["HTTP Last-Modified header differs significantly from server time", `Difference: ${Math.round(timeDiff / 1000 / 60)} minutes`]
                        };
                    } else if (timeDiff > 1000) { // More than 1 second difference
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 65,
                            reasoning: ["HTTP Last-Modified header differs from server time", `Difference: ${Math.round(timeDiff / 1000)} seconds`]
                        };
                    } else {
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 40,
                            reasoning: ["HTTP Last-Modified header (may be server time)"]
                        };
                    }
                } else {
                    return {
                        timestamp: lastModDate.toISOString(),
                        confidence: 60,
                        reasoning: ["HTTP Last-Modified header"]
                    };
                }
            }
        }
    
        // If all else fails, use the server date with very low confidence
        if (updateTimes.date) {
            return {
                timestamp: new Date(updateTimes.date).toISOString(),
                confidence: 10,
                reasoning: ["No update time found", "Using server date as fallback"]
            };
        }
    
        // Absolute fallback: unknown
        return {
            timestamp: null,
            confidence: 0,
            reasoning: ["No reliable update time indicators found"]
        };
    }
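
The final HTTP-header fallback in `determineBestUpdateTime` can be isolated as a small pure function, which makes the thresholds easier to see. The sketch below is a simplified restatement, not the server's code: the name `scoreLastModified` and the `HeaderGuess` type are assumptions, and an unparseable `Last-Modified` value simply yields a null result here instead of falling through to the server-date fallback. The 60-second and 1-second cut-offs and the confidence values are taken from the listing above.

```typescript
interface HeaderGuess {
    timestamp: string | null;
    confidence: number;
}

// Hypothetical helper: scores an HTTP Last-Modified header against the
// response Date header, using the thresholds from the listing above.
function scoreLastModified(lastModified?: string, serverDate?: string): HeaderGuess {
    const modified = lastModified ? new Date(lastModified) : null;
    if (!modified || isNaN(modified.getTime())) {
        // Simplification: the original falls back to the server date instead.
        return { timestamp: null, confidence: 0 };
    }
    const timestamp = modified.toISOString();

    // Without a Date header there is nothing to compare against.
    if (!serverDate) return { timestamp, confidence: 60 };

    const diffMs = Math.abs(modified.getTime() - new Date(serverDate).getTime());
    if (diffMs > 60000) return { timestamp, confidence: 75 }; // over one minute apart
    if (diffMs > 1000) return { timestamp, confidence: 65 };  // over one second apart
    return { timestamp, confidence: 40 };                     // likely just the server clock
}

console.log(scoreLastModified("Wed, 01 Jan 2025 00:00:00 GMT", "Wed, 01 Jan 2025 00:05:00 GMT"));
```

The idea behind the tiers is that a `Last-Modified` value equal to the response time is probably generated per request, so it says little about when the content actually changed.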

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/wlmwwx/jina-mcp'
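
For callers working in TypeScript, the same request might look like the sketch below. The helper names `mcpServerUrl` and `fetchServerInfo` are hypothetical, the owner/name path layout is inferred only from the single URL in the curl command, and the response is left as `unknown` because its schema is not described here.

```typescript
// Hypothetical helper: builds the directory API URL for one server entry.
// The path layout is an assumption generalized from the curl example.
function mcpServerUrl(owner: string, name: string): string {
    return `https://glama.ai/api/mcp/v1/servers/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: performs the GET request and returns the parsed JSON.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
    const response = await fetch(mcpServerUrl(owner, name));
    if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }
    return response.json();
}

console.log(mcpServerUrl("wlmwwx", "jina-mcp"));
```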

If you have feedback or need assistance with the MCP directory API, please join our Discord server.