guess_datetime_url

Determine the last updated or published date of a web page by analyzing HTTP headers, HTML metadata, Schema.org data, visible dates, and other sources, and return a best-guess timestamp together with a confidence score.

Instructions

Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.

Input Schema


| Name | Required | Description | Default |
|------|----------|-------------|---------|
| `url` | Yes | The complete HTTP/HTTPS URL of the web page whose datetime information should be guessed | |
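
A caller may want to pre-check the `url` argument before invoking the tool. The following is a minimal sketch using the standard `URL` class; `isHttpUrl` is a hypothetical helper name, and this is not the Zod validation the tool itself applies.

```typescript
// Sketch: accept only absolute http(s) URLs, using the built-in URL parser.
// `isHttpUrl` is a hypothetical helper, not part of the tool's source.
function isHttpUrl(input: string): boolean {
    try {
        // The URL constructor throws on relative or malformed input.
        const parsed = new URL(input);
        return parsed.protocol === "http:" || parsed.protocol === "https:";
    } catch {
        return false;
    }
}

console.log(isHttpUrl("https://example.com/post")); // true
console.log(isHttpUrl("example.com/post"));         // false (no scheme)
console.log(isHttpUrl("ftp://example.com/file"));   // false (wrong scheme)
```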

Implementation Reference

src/utils/guess-datetime.ts:1138-1168 (handler)

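Main handler function. It fetches the given URL, extracts datetime indicators from HTTP headers, meta tags, HTML content, Schema.org data, JS timestamps, comments, Git info, and RSS/Atom/sitemap feeds, then heuristically selects the best last-update/publish datetime. Only the timestamp and its confidence score are returned; a non-OK HTTP response or any other failure is rethrown as an error prefixed with "Failed to guess datetime from URL".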

export async function guessDatetimeFromUrl(url: string): Promise<{
    bestGuess: string | null;
    confidence: number;
}> {
    try {
        // Fetch the target webpage
        const response = await fetch(url);

        if (!response.ok) {
            throw new Error(`HTTP ${response.status}: ${response.statusText}`);
        }

        const text = await response.text();

        // Extract all possible time indicators
        const updateTimes = await extractAllTimeIndicators(response, text, url);

        // Advanced heuristic-based determination of the "true" update time
        const bestGuess = determineBestUpdateTime(updateTimes);

        // Result with confidence score
        const result = {
            bestGuess: bestGuess.timestamp,
            confidence: bestGuess.confidence
        };

        return result;
    } catch (error) {
        throw new Error(`Failed to guess datetime from URL: ${error instanceof Error ? error.message : String(error)}`);
    }
}
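
To illustrate how a caller might consume the handler's result, here is a small sketch. The `DatetimeGuess` interface mirrors the return type declared above; `summarizeGuess` and its 85/60 cut-offs are assumptions made for illustration and do not appear in the source.

```typescript
// Shape returned by guessDatetimeFromUrl, as declared in the handler's signature.
interface DatetimeGuess {
    bestGuess: string | null;
    confidence: number;
}

// Hypothetical helper: turn a guess into a short human-readable summary.
// The 85 and 60 cut-offs are illustrative assumptions, not values from the source.
function summarizeGuess(guess: DatetimeGuess): string {
    if (guess.bestGuess === null) {
        return "no date found";
    }
    const level =
        guess.confidence >= 85 ? "high" :
        guess.confidence >= 60 ? "medium" : "low";
    return `${guess.bestGuess} (${level} confidence, score ${guess.confidence})`;
}

console.log(summarizeGuess({ bestGuess: "2024-01-01T00:00:00.000Z", confidence: 95 }));
console.log(summarizeGuess({ bestGuess: null, confidence: 0 }));
```

Handling the `null` case first matters because the handler can return a `null` timestamp with confidence 0 when no indicator is found.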

src/tools/jina-tools.ts:81-104 (registration)

MCP server.tool registration for 'guess_datetime_url', including the full description, the Zod input schema (url: a string validated as a URL), and a thin async handler that dynamically imports the core guessDatetimeFromUrl utility, calls it, and returns the result as YAML text.

if (isToolEnabled("guess_datetime_url")) {
	server.tool(
		"guess_datetime_url",
		"Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.",
		{
			url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information")
		},
		async ({ url }: { url: string }) => {
			try {
				// Import the utility function
				const { guessDatetimeFromUrl } = await import("../utils/guess-datetime.js");

				// Analyze the URL for datetime information
				const result = await guessDatetimeFromUrl(url);

				return {
					content: [{ type: "text" as const, text: yamlStringify(result) }],
				};
			} catch (error) {
				return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
			}
		},
	);
}

src/tools/jina-tools.ts:86-86 (schema)
Zod schema definition for the tool input: a single required 'url' parameter validated as a URL string.
```
url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information")
```

src/utils/guess-datetime.ts:803-1135 (helper)

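Key helper function that applies prioritized heuristics to all extracted datetime indicators (meta, schema, feeds, etc.) to determine the single best update timestamp, with a confidence score and reasoning. The checks run in a fixed order and the first source that yields a date wins, so the result is decided by check order rather than by comparing confidence values across sources.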

function determineBestUpdateTime(updateTimes: any) {
    // First check for meta tags that explicitly indicate last modified time
    if (updateTimes.metaTags && updateTimes.metaTags.lastModified) {
        return {
            timestamp: updateTimes.metaTags.lastModified,
            confidence: 95,
            reasoning: ["Explicit lastmodifiedtime meta tag"]
        };
    }

    // Check other meta tags related to publication/modification
    if (updateTimes.metaTags) {
        if (updateTimes.metaTags.articleModified) {
            return {
                timestamp: updateTimes.metaTags.articleModified,
                confidence: 90,
                reasoning: ["Article modified time meta tag"]
            };
        }

        if (updateTimes.metaTags.publishedDate) {
            return {
                timestamp: updateTimes.metaTags.publishedDate,
                confidence: 85,
                reasoning: ["Published date meta tag"]
            };
        }

        // Check for new high-value meta tags
        if (updateTimes.metaTags.ogUpdatedTime) {
            return {
                timestamp: updateTimes.metaTags.ogUpdatedTime,
                confidence: 88,
                reasoning: ["Open Graph updated time meta tag"]
            };
        }

        if (updateTimes.metaTags.lastmod) {
            return {
                timestamp: updateTimes.metaTags.lastmod,
                confidence: 87,
                reasoning: ["Last modified meta tag"]
            };
        }

        if (updateTimes.metaTags.generated) {
            return {
                timestamp: updateTimes.metaTags.generated,
                confidence: 85,
                reasoning: ["Page generated meta tag"]
            };
        }

        if (updateTimes.metaTags.build) {
            return {
                timestamp: updateTimes.metaTags.build,
                confidence: 83,
                reasoning: ["Build timestamp meta tag"]
            };
        }
    }

    // Check feed timestamps (RSS, Atom, Sitemap) - often very reliable
    if (updateTimes.feedTimestamps && updateTimes.feedTimestamps.length > 0) {
        // Filter for high priority feed timestamps
        const highPriorityFeeds = updateTimes.feedTimestamps
            .filter((stamp: any) => stamp.priority === 'high')
            .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context }));

        if (highPriorityFeeds.length > 0) {
            // Sort by recency
            highPriorityFeeds.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());

            return {
                timestamp: highPriorityFeeds[0].date.toISOString(),
                confidence: 92,
                reasoning: ["Feed timestamp", `Type: ${highPriorityFeeds[0].type}`, `Context: ${highPriorityFeeds[0].context}`]
            };
        }

        // If no high priority feeds, use most recent feed timestamp
        const allFeedDates = updateTimes.feedTimestamps
            .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context }));

        allFeedDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());

        return {
            timestamp: allFeedDates[0].date.toISOString(),
            confidence: 85,
            reasoning: ["Feed timestamp", `Type: ${allFeedDates[0].type}`, `Context: ${allFeedDates[0].context}`]
        };
    }

    // Check visible dates with high priority markers
    if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) {
        // Look for dates that appear to be part of lastmodified content
        const contentDates = updateTimes.visibleDates.filter((d: any) => {
            const ctx = d.context.toLowerCase();
            return ctx.includes('lastmodified') ||
                ctx.includes('last modified') ||
                ctx.includes('updated') ||
                ctx.includes('修改') ||  // Chinese for "modified"
                ctx.includes('更新');    // Chinese for "updated" 
        });

        if (contentDates.length > 0) {
            // Sort by recency
            const dates = contentDates.map((d: any) => new Date(d.date));
            dates.sort((a: Date, b: Date) => {
                if (!a || !b) return 0;
                return b.getTime() - a.getTime();
            });

            return {
                timestamp: dates[0].toISOString(),
                confidence: 92,
                reasoning: ["Content explicitly marked as modified/updated"]
            };
        }

        // Next check for dates that appear in common date display elements
        const displayDateElements = updateTimes.visibleDates.filter((d: any) => {
            const ctx = d.context.toLowerCase();
            return ctx.includes('class="date') ||
                ctx.includes('class="time') ||
                ctx.includes('class="pubdate') ||
                ctx.includes('class="published') ||
                ctx.includes('pages-date') ||
                ctx.includes('pub-date');
        });

        if (displayDateElements.length > 0) {
            const dates = displayDateElements.map((d: any) => new Date(d.date));
            dates.sort((a: Date, b: Date) => b.getTime() - a.getTime());

            return {
                timestamp: dates[0].toISOString(),
                confidence: 88,
                reasoning: ["Date from primary content display element"]
            };
        }
    }

    // Check for Schema.org timestamps
    if (updateTimes.schemaOrgTimestamps && updateTimes.schemaOrgTimestamps.length > 0) {
        // Filter for high priority fields: dateModified and dateUpdated
        const highPriorityDates = updateTimes.schemaOrgTimestamps
            .filter((stamp: any) => stamp.priority === 'high')
            .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context }));

        if (highPriorityDates.length > 0) {
            // Sort by recency
            highPriorityDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());

            return {
                timestamp: highPriorityDates[0].date.toISOString(),
                confidence: 85,
                reasoning: ["Schema.org structured data", `Field: ${highPriorityDates[0].field}`, `Context: ${highPriorityDates[0].context}`]
            };
        }

        // If no high priority fields, use most recent Schema.org date
        const allSchemaDates = updateTimes.schemaOrgTimestamps
            .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context }));

        allSchemaDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());

        return {
            timestamp: allSchemaDates[0].date.toISOString(),
            confidence: 75,
            reasoning: ["Schema.org structured data", `Field: ${allSchemaDates[0].field}`, `Context: ${allSchemaDates[0].context}`]
        };
    }

    // Check Git info (often very reliable)
    if (updateTimes.gitInfo && updateTimes.gitInfo.gitDate) {
        return {
            timestamp: updateTimes.gitInfo.gitDate,
            confidence: 90,
            // Only add the hash entry when a hash exists, so the array never contains an empty string
            reasoning: ["Git commit information", ...(updateTimes.gitInfo.gitHash ? [`Git hash: ${updateTimes.gitInfo.gitHash}`] : [])]
        };
    } else if (updateTimes.gitInfo && updateTimes.gitInfo.deployDate) {
        return {
            timestamp: updateTimes.gitInfo.deployDate,
            confidence: 88,
            reasoning: ["Git deployment timestamp"]
        };
    }

    // JSON-LD structured data is also quite reliable
    if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) {
        const jsonLdDates = updateTimes.jsTimestamps
            .filter((stamp: any) => stamp.type === 'jsonLd')
            .map((stamp: any) => ({
                date: new Date(stamp.date),
                field: stamp.field,
                priority: stamp.priority
            }));

        if (jsonLdDates.length > 0) {
            // Sort by priority and recency
            jsonLdDates.sort((a: any, b: any) => {
                if (a.priority === 'high' && b.priority !== 'high') return -1;
                if (a.priority !== 'high' && b.priority === 'high') return 1;
                return b.date.getTime() - a.date.getTime();
            });

            return {
                timestamp: jsonLdDates[0].date.toISOString(),
                confidence: jsonLdDates[0].priority === 'high' ? 80 : 65,
                reasoning: [`JSON-LD structured data (${jsonLdDates[0].field})`]
            };
        }
    }

    // If we have a page generation time meta tag, it's a decent indicator
    if (updateTimes.metaTags && updateTimes.metaTags.pageGenerated) {
        return {
            timestamp: updateTimes.metaTags.pageGenerated,
            confidence: 75,
            reasoning: ["Page generation time meta tag"]
        };
    }

    // Process visible dates that don't have explicit modification indicators
    if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) {
        // Get all dates and sort by recency
        const allDates = updateTimes.visibleDates.map((d: any) => ({
            date: new Date(d.date),
            context: d.context
        }));

        allDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());

        return {
            timestamp: allDates[0].date.toISOString(),
            confidence: 70,
            reasoning: ["Most recent date found in page content", `Context: "${allDates[0].context}"`]
        };
    }

    // Try HTML comments
    if (updateTimes.htmlComments && updateTimes.htmlComments.length > 0) {
        const commentDates = updateTimes.htmlComments.map((c: any) => ({
            date: new Date(c.date),
            context: c.context
        }));

        commentDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());

        return {
            timestamp: commentDates[0].date.toISOString(),
            confidence: 60,
            reasoning: ["Timestamp from HTML comment", `Context: "${commentDates[0].context}"`]
        };
    }

    // Try JavaScript timestamps
    if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) {
        const jsDates = updateTimes.jsTimestamps
            .filter((stamp: any) => stamp.type !== 'jsonLd')
            .map((stamp: any) => ({
                date: new Date(stamp.date),
                context: stamp.context,
                type: stamp.type
            }));

        if (jsDates.length > 0) {
            // Sort by recency
            jsDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());

            return {
                timestamp: jsDates[0].date.toISOString(),
                confidence: 60,
                reasoning: ["JavaScript timestamp found", `Context: "${jsDates[0].context}"`]
            };
        }
    }

    // Use HTTP Last-Modified even if it matches server time, but with lower confidence
    if (updateTimes.lastModified) {
        const lastModDate = new Date(updateTimes.lastModified);
        if (!isNaN(lastModDate.getTime())) {
            // Check if Last-Modified differs significantly from server time
            if (updateTimes.date) {
                const serverDate = new Date(updateTimes.date);
                const timeDiff = Math.abs(lastModDate.getTime() - serverDate.getTime());

                if (timeDiff > 60000) { // More than 1 minute difference
                    return {
                        timestamp: lastModDate.toISOString(),
                        confidence: 75,
                        reasoning: ["HTTP Last-Modified header differs significantly from server time", `Difference: ${Math.round(timeDiff / 1000 / 60)} minutes`]
                    };
                } else if (timeDiff > 1000) { // More than 1 second difference
                    return {
                        timestamp: lastModDate.toISOString(),
                        confidence: 65,
                        reasoning: ["HTTP Last-Modified header differs from server time", `Difference: ${Math.round(timeDiff / 1000)} seconds`]
                    };
                } else {
                    return {
                        timestamp: lastModDate.toISOString(),
                        confidence: 40,
                        reasoning: ["HTTP Last-Modified header (may be server time)"]
                    };
                }
            } else {
                return {
                    timestamp: lastModDate.toISOString(),
                    confidence: 60,
                    reasoning: ["HTTP Last-Modified header"]
                };
            }
        }
    }

    // If all else fails, use the server date with very low confidence
    if (updateTimes.date) {
        return {
            timestamp: new Date(updateTimes.date).toISOString(),
            confidence: 10,
            reasoning: ["No update time found", "Using server date as fallback"]
        };
    }

    // Absolute fallback: unknown
    return {
        timestamp: null,
        confidence: 0,
        reasoning: ["No reliable update time indicators found"]
    };
}
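
The HTTP header fallback near the end of the helper can be isolated as a small function. The sketch below reproduces only the Last-Modified tiers (75, 65, 40, or 60 when there is no Date header); `scoreLastModified` and `HeaderGuess` are hypothetical names. Unlike the helper above, which falls through to later fallbacks when the header cannot be parsed, this sketch returns a null guess in that case.

```typescript
interface HeaderGuess {
    timestamp: string | null;
    confidence: number;
}

// Sketch of the Last-Modified tiers only; the function name is hypothetical.
function scoreLastModified(lastModified: string, serverDate?: string): HeaderGuess {
    const modified = new Date(lastModified);
    if (isNaN(modified.getTime())) {
        // Unparseable header: calling toISOString() on an invalid Date would throw.
        return { timestamp: null, confidence: 0 };
    }
    const timestamp = modified.toISOString();
    if (!serverDate) {
        // No Date header to compare against.
        return { timestamp, confidence: 60 };
    }
    const diffMs = Math.abs(modified.getTime() - new Date(serverDate).getTime());
    if (diffMs > 60000) return { timestamp, confidence: 75 }; // more than one minute apart
    if (diffMs > 1000) return { timestamp, confidence: 65 };  // more than one second apart
    return { timestamp, confidence: 40 };                     // probably just the serving time
}

const served = "Wed, 21 Oct 2015 07:28:00 GMT";
console.log(scoreLastModified("Wed, 21 Oct 2015 05:28:00 GMT", served).confidence); // 75
console.log(scoreLastModified(served, served).confidence);                          // 40
```

The idea behind the tiers is that a Last-Modified value nearly identical to the Date header is likely generated at response time, so it says little about when the content actually changed.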

src/index.ts:100-100 (registration)
Invocation of registerJinaTools in main MCP server creation, which conditionally registers 'guess_datetime_url' based on enabledTools filter.
```
registerJinaTools(server, () => currentProps, enabledTools);
```
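
The conditional registration described here could work along the following lines. This is a sketch under stated assumptions: the source does not show how `isToolEnabled` or `enabledTools` are implemented, so `makeIsToolEnabled`, the use of a `Set`, the "undefined means everything is enabled" rule, and the name `some_other_tool` are all hypothetical.

```typescript
// Hypothetical filter factory: an undefined `enabledTools` is assumed to mean
// "register every tool"; otherwise only the listed names pass.
function makeIsToolEnabled(enabledTools?: Set<string>): (name: string) => boolean {
    return (name: string) => enabledTools === undefined || enabledTools.has(name);
}

const registered: string[] = [];
const isToolEnabled = makeIsToolEnabled(new Set(["guess_datetime_url"]));

// "some_other_tool" is a made-up name used only to show a filtered-out tool.
for (const name of ["guess_datetime_url", "some_other_tool"]) {
    if (isToolEnabled(name)) {
        registered.push(name); // stand-in for a server.tool(...) call
    }
}

console.log(registered); // [ 'guess_datetime_url' ]
```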

Jina AI Remote MCP Server