Jina AI Remote MCP Server

Official
by jina-ai

guess_datetime_url

Determine the last updated or published date of a web page by analyzing HTTP headers, HTML metadata, Schema.org data, visible dates, and other sources to provide an accurate timestamp with confidence scores.

Instructions

Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.
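
As a rough illustration of consuming the result, the sketch below assumes the `{ bestGuess, confidence }` shape returned by `guessDatetimeFromUrl` in the implementation reference, with confidence on the 0 to 100 scale used there. The `describeGuess` helper and its 85/60 cut-offs are hypothetical and not part of the server.

```typescript
// Shape mirrored from the guessDatetimeFromUrl return type in the reference code.
interface DatetimeGuess {
    bestGuess: string | null; // timestamp string, or null when nothing was found
    confidence: number;       // 0-100 in the reference implementation
}

// Hypothetical helper: turn a guess into a short human-readable summary.
// The 85/60 cut-offs are illustrative assumptions, not values from the server.
function describeGuess(guess: DatetimeGuess): string {
    if (guess.bestGuess === null) {
        return "unknown (no usable date indicators)";
    }
    const label =
        guess.confidence >= 85 ? "high" :
        guess.confidence >= 60 ? "medium" : "low";
    return `${guess.bestGuess} (${label} confidence: ${guess.confidence})`;
}
```

A caller might use such a label to decide whether to trust the date outright or to cross-check it against another source.
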

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The complete HTTP/HTTPS URL of the webpage to guess datetime information | |
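
For orientation, an MCP client invokes a tool through a JSON-RPC `tools/call` request whose `arguments` object matches the input schema. The sketch below builds such a request for this tool. The `buildGuessDatetimeRequest` helper and its client-side protocol check are assumptions for illustration; the server performs its own validation through the Zod schema shown in the implementation reference.

```typescript
// Minimal JSON-RPC 2.0 envelope for an MCP tool call.
interface ToolCallRequest {
    jsonrpc: "2.0";
    id: number;
    method: "tools/call";
    params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build a guess_datetime_url call,
// rejecting malformed or non-HTTP(S) URLs before sending anything.
function buildGuessDatetimeRequest(id: number, url: string): ToolCallRequest {
    const parsed = new URL(url); // throws TypeError on malformed input
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
        throw new Error(`Unsupported protocol: ${parsed.protocol}`);
    }
    return {
        jsonrpc: "2.0",
        id,
        method: "tools/call",
        params: { name: "guess_datetime_url", arguments: { url } },
    };
}
```

The returned object can be serialized with `JSON.stringify` and sent over whichever transport the client uses.
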

Implementation Reference

  • Main handler function that fetches the given URL, extracts datetime indicators from HTTP headers/meta, HTML content, Schema.org data, JS timestamps, comments, Git info, and RSS/Atom/sitemap feeds, then heuristically selects the best last update/publish datetime with confidence score.
    export async function guessDatetimeFromUrl(url: string): Promise<{
        bestGuess: string | null;
        confidence: number;
    }> {
        try {
            // Fetch the target webpage
            const response = await fetch(url);
    
            if (!response.ok) {
                throw new Error(`HTTP ${response.status}: ${response.statusText}`);
            }
    
            const text = await response.text();
    
            // Extract all possible time indicators
            const updateTimes = await extractAllTimeIndicators(response, text, url);
    
            // Advanced heuristic-based determination of the "true" update time
            const bestGuess = determineBestUpdateTime(updateTimes);
    
            // Result with confidence score
            const result = {
                bestGuess: bestGuess.timestamp,
                confidence: bestGuess.confidence
            };
    
            return result;
        } catch (error) {
            throw new Error(`Failed to guess datetime from URL: ${error instanceof Error ? error.message : String(error)}`);
        }
    }
  • MCP server.tool registration for 'guess_datetime_url', including full description, Zod input schema (url: string URL), and thin async handler that dynamically imports and calls the core guessDatetimeFromUrl utility.
    if (isToolEnabled("guess_datetime_url")) {
    	server.tool(
    		"guess_datetime_url",
    		"Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.",
    		{
    			url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information")
    		},
    		async ({ url }: { url: string }) => {
    			try {
    				// Import the utility function
    				const { guessDatetimeFromUrl } = await import("../utils/guess-datetime.js");
    
    				// Analyze the URL for datetime information
    				const result = await guessDatetimeFromUrl(url);
    
    				return {
    					content: [{ type: "text" as const, text: yamlStringify(result) }],
    				};
    			} catch (error) {
    				return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
    			}
    		},
    	);
    }
  • Zod schema definition for the tool input: a single required 'url' parameter validated as a URL string.
    url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information")
  • Key helper function that applies prioritized heuristics to all extracted datetime indicators (meta, schema, feeds, etc.) to determine the single best update timestamp with confidence score and reasoning.
    function determineBestUpdateTime(updateTimes: any) {
        // First check for meta tags that explicitly indicate last modified time
        if (updateTimes.metaTags && updateTimes.metaTags.lastModified) {
            return {
                timestamp: updateTimes.metaTags.lastModified,
                confidence: 95,
                reasoning: ["Explicit lastmodifiedtime meta tag"]
            };
        }
    
        // Check other meta tags related to publication/modification
        if (updateTimes.metaTags) {
            if (updateTimes.metaTags.articleModified) {
                return {
                    timestamp: updateTimes.metaTags.articleModified,
                    confidence: 90,
                    reasoning: ["Article modified time meta tag"]
                };
            }
    
            if (updateTimes.metaTags.publishedDate) {
                return {
                    timestamp: updateTimes.metaTags.publishedDate,
                    confidence: 85,
                    reasoning: ["Published date meta tag"]
                };
            }
    
            // Check for new high-value meta tags
            if (updateTimes.metaTags.ogUpdatedTime) {
                return {
                    timestamp: updateTimes.metaTags.ogUpdatedTime,
                    confidence: 88,
                    reasoning: ["Open Graph updated time meta tag"]
                };
            }
    
            if (updateTimes.metaTags.lastmod) {
                return {
                    timestamp: updateTimes.metaTags.lastmod,
                    confidence: 87,
                    reasoning: ["Last modified meta tag"]
                };
            }
    
            if (updateTimes.metaTags.generated) {
                return {
                    timestamp: updateTimes.metaTags.generated,
                    confidence: 85,
                    reasoning: ["Page generated meta tag"]
                };
            }
    
            if (updateTimes.metaTags.build) {
                return {
                    timestamp: updateTimes.metaTags.build,
                    confidence: 83,
                    reasoning: ["Build timestamp meta tag"]
                };
            }
        }
    
        // Check feed timestamps (RSS, Atom, Sitemap) - often very reliable
        if (updateTimes.feedTimestamps && updateTimes.feedTimestamps.length > 0) {
            // Filter for high priority feed timestamps
            const highPriorityFeeds = updateTimes.feedTimestamps
                .filter((stamp: any) => stamp.priority === 'high')
                .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context }));
    
            if (highPriorityFeeds.length > 0) {
                // Sort by recency
                highPriorityFeeds.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: highPriorityFeeds[0].date.toISOString(),
                    confidence: 92,
                    reasoning: ["Feed timestamp", `Type: ${highPriorityFeeds[0].type}`, `Context: ${highPriorityFeeds[0].context}`]
                };
            }
    
            // If no high priority feeds, use most recent feed timestamp
            const allFeedDates = updateTimes.feedTimestamps
                .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context }));
    
            allFeedDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allFeedDates[0].date.toISOString(),
                confidence: 85,
                reasoning: ["Feed timestamp", `Type: ${allFeedDates[0].type}`, `Context: ${allFeedDates[0].context}`]
            };
        }
    
        // Check visible dates with high priority markers
        if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) {
            // Look for dates that appear to be part of lastmodified content
            const contentDates = updateTimes.visibleDates.filter((d: any) => {
                const ctx = d.context.toLowerCase();
                return ctx.includes('lastmodified') ||
                    ctx.includes('last modified') ||
                    ctx.includes('updated') ||
                    ctx.includes('修改') ||  // Chinese for "modified"
                    ctx.includes('更新');    // Chinese for "updated" 
            });
    
            if (contentDates.length > 0) {
                // Sort by recency
                const dates = contentDates.map((d: any) => new Date(d.date));
                dates.sort((a: Date, b: Date) => {
                    if (!a || !b) return 0;
                    return b.getTime() - a.getTime();
                });
    
                return {
                    timestamp: dates[0].toISOString(),
                    confidence: 92,
                    reasoning: ["Content explicitly marked as modified/updated"]
                };
            }
    
            // Next check for dates that appear in common date display elements
            const displayDateElements = updateTimes.visibleDates.filter((d: any) => {
                const ctx = d.context.toLowerCase();
                return ctx.includes('class="date') ||
                    ctx.includes('class="time') ||
                    ctx.includes('class="pubdate') ||
                    ctx.includes('class="published') ||
                    ctx.includes('pages-date') ||
                    ctx.includes('pub-date');
            });
    
            if (displayDateElements.length > 0) {
                const dates = displayDateElements.map((d: any) => new Date(d.date));
                dates.sort((a: Date, b: Date) => b.getTime() - a.getTime());
    
                return {
                    timestamp: dates[0].toISOString(),
                    confidence: 88,
                    reasoning: ["Date from primary content display element"]
                };
            }
        }
    
        // Check for Schema.org timestamps
        if (updateTimes.schemaOrgTimestamps && updateTimes.schemaOrgTimestamps.length > 0) {
            // Filter for high priority fields: dateModified and dateUpdated
            const highPriorityDates = updateTimes.schemaOrgTimestamps
                .filter((stamp: any) => stamp.priority === 'high')
                .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context }));
    
            if (highPriorityDates.length > 0) {
                // Sort by recency
                highPriorityDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: highPriorityDates[0].date.toISOString(),
                    confidence: 85,
                    reasoning: ["Schema.org structured data", `Field: ${highPriorityDates[0].field}`, `Context: ${highPriorityDates[0].context}`]
                };
            }
    
            // If no high priority fields, use most recent Schema.org date
            const allSchemaDates = updateTimes.schemaOrgTimestamps
                .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context }));
    
            allSchemaDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allSchemaDates[0].date.toISOString(),
                confidence: 75,
                reasoning: ["Schema.org structured data", `Field: ${allSchemaDates[0].field}`, `Context: ${allSchemaDates[0].context}`]
            };
        }
    
        // Check Git info (often very reliable)
        if (updateTimes.gitInfo && updateTimes.gitInfo.gitDate) {
            return {
                timestamp: updateTimes.gitInfo.gitDate,
                confidence: 90,
                reasoning: ["Git commit information", updateTimes.gitInfo.gitHash ? `Git hash: ${updateTimes.gitInfo.gitHash}` : ""]
            };
        } else if (updateTimes.gitInfo && updateTimes.gitInfo.deployDate) {
            return {
                timestamp: updateTimes.gitInfo.deployDate,
                confidence: 88,
                reasoning: ["Git deployment timestamp"]
            };
        }
    
        // JSON-LD structured data is also quite reliable
        if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) {
            const jsonLdDates = updateTimes.jsTimestamps
                .filter((stamp: any) => stamp.type === 'jsonLd')
                .map((stamp: any) => ({
                    date: new Date(stamp.date),
                    field: stamp.field,
                    priority: stamp.priority
                }));
    
            if (jsonLdDates.length > 0) {
                // Sort by priority and recency
                jsonLdDates.sort((a: any, b: any) => {
                    if (a.priority === 'high' && b.priority !== 'high') return -1;
                    if (a.priority !== 'high' && b.priority === 'high') return 1;
                    return b.date.getTime() - a.date.getTime();
                });
    
                return {
                    timestamp: jsonLdDates[0].date.toISOString(),
                    confidence: jsonLdDates[0].priority === 'high' ? 80 : 65,
                    reasoning: [`JSON-LD structured data (${jsonLdDates[0].field})`]
                };
            }
        }
    
        // If we have a page generation time meta tag, it's a decent indicator
        if (updateTimes.metaTags && updateTimes.metaTags.pageGenerated) {
            return {
                timestamp: updateTimes.metaTags.pageGenerated,
                confidence: 75,
                reasoning: ["Page generation time meta tag"]
            };
        }
    
        // Process visible dates that don't have explicit modification indicators
        if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) {
            // Get all dates and sort by recency
            const allDates = updateTimes.visibleDates.map((d: any) => ({
                date: new Date(d.date),
                context: d.context
            }));
    
            allDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allDates[0].date.toISOString(),
                confidence: 70,
                reasoning: ["Most recent date found in page content", `Context: "${allDates[0].context}"`]
            };
        }
    
        // Try HTML comments
        if (updateTimes.htmlComments && updateTimes.htmlComments.length > 0) {
            const commentDates = updateTimes.htmlComments.map((c: any) => ({
                date: new Date(c.date),
                context: c.context
            }));
    
            commentDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: commentDates[0].date.toISOString(),
                confidence: 60,
                reasoning: ["Timestamp from HTML comment", `Context: "${commentDates[0].context}"`]
            };
        }
    
        // Try JavaScript timestamps
        if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) {
            const jsDates = updateTimes.jsTimestamps
                .filter((stamp: any) => stamp.type !== 'jsonLd')
                .map((stamp: any) => ({
                    date: new Date(stamp.date),
                    context: stamp.context,
                    type: stamp.type
                }));
    
            if (jsDates.length > 0) {
                // Sort by recency
                jsDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: jsDates[0].date.toISOString(),
                    confidence: 60,
                    reasoning: ["JavaScript timestamp found", `Context: "${jsDates[0].context}"`]
                };
            }
        }
    
        // Use HTTP Last-Modified even if it matches server time, but with lower confidence
        if (updateTimes.lastModified) {
            const lastModDate = new Date(updateTimes.lastModified);
            if (!isNaN(lastModDate.getTime())) {
                // Check if Last-Modified differs significantly from server time
                if (updateTimes.date) {
                    const serverDate = new Date(updateTimes.date);
                    const timeDiff = Math.abs(lastModDate.getTime() - serverDate.getTime());
    
                    if (timeDiff > 60000) { // More than 1 minute difference
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 75,
                            reasoning: ["HTTP Last-Modified header differs significantly from server time", `Difference: ${Math.round(timeDiff / 1000 / 60)} minutes`]
                        };
                    } else if (timeDiff > 1000) { // More than 1 second difference
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 65,
                            reasoning: ["HTTP Last-Modified header differs from server time", `Difference: ${Math.round(timeDiff / 1000)} seconds`]
                        };
                    } else {
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 40,
                            reasoning: ["HTTP Last-Modified header (may be server time)"]
                        };
                    }
                } else {
                    return {
                        timestamp: lastModDate.toISOString(),
                        confidence: 60,
                        reasoning: ["HTTP Last-Modified header"]
                    };
                }
            }
        }
    
        // If all else fails, use the server date with very low confidence
        if (updateTimes.date) {
            return {
                timestamp: new Date(updateTimes.date).toISOString(),
                confidence: 10,
                reasoning: ["No update time found", "Using server date as fallback"]
            };
        }
    
        // Absolute fallback: unknown
        return {
            timestamp: null,
            confidence: 0,
            reasoning: ["No reliable update time indicators found"]
        };
    }
  • src/index.ts:100 (registration)
    Invocation of registerJinaTools in main MCP server creation, which conditionally registers 'guess_datetime_url' based on enabledTools filter.
    registerJinaTools(server, () => currentProps, enabledTools);
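
The HTTP header fallback near the end of `determineBestUpdateTime` is easier to follow in isolation. The sketch below restates that branch as a standalone function, with the thresholds and confidence values copied from the reference code. The `scoreLastModified` name and the `HeaderGuess` type are hypothetical, and returning `null` here stands in for falling through to the later fallbacks.

```typescript
interface HeaderGuess {
    timestamp: string;
    confidence: number;
}

// Standalone restatement of the Last-Modified branch.
// lastModified / serverDate are the raw Last-Modified and Date header values.
function scoreLastModified(lastModified: string, serverDate?: string): HeaderGuess | null {
    const lastMod = new Date(lastModified);
    if (isNaN(lastMod.getTime())) return null; // unparseable header: no guess from this branch

    const timestamp = lastMod.toISOString();
    if (serverDate === undefined) return { timestamp, confidence: 60 };

    const diffMs = Math.abs(lastMod.getTime() - new Date(serverDate).getTime());
    if (diffMs > 60000) return { timestamp, confidence: 75 }; // more than a minute apart
    if (diffMs > 1000) return { timestamp, confidence: 65 };  // more than a second apart
    return { timestamp, confidence: 40 };                     // probably just the response time
}
```

The idea behind the tiers is that a `Last-Modified` value equal to the `Date` header usually means the server stamped the response at request time, so it says little about when the content changed.
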

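Several branches of `determineBestUpdateTime` parse strings with `new Date(...)`, sort by recency, and call `toISOString()` on the first element. `toISOString()` throws a `RangeError` on an Invalid Date, so whether that can occur depends on how strictly the extraction step (not shown here) validates its output. A defensive sketch, using a hypothetical `mostRecentIso` helper that is not part of the server:

```typescript
// Hypothetical helper: pick the most recent parseable date from raw strings.
// Returns null instead of throwing when nothing parses.
function mostRecentIso(rawDates: string[]): string | null {
    const valid = rawDates
        .map((raw) => new Date(raw))
        .filter((d) => !isNaN(d.getTime())); // drop Invalid Date entries
    if (valid.length === 0) return null;
    valid.sort((a, b) => b.getTime() - a.getTime()); // newest first
    return valid[0].toISOString();
}
```

Filtering before sorting also keeps `NaN` out of the comparator, so the ordering of the remaining dates is well defined.
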
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jina-ai/MCP'