Skip to main content
Glama
by wlmwwx

guess_datetime_url

Determine the last updated or published datetime of a webpage by analyzing HTTP headers, HTML metadata, visible dates, and more. Provides a timestamp with a confidence score for accuracy.

Instructions

Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.

Input Schema

NameRequiredDescriptionDefault
urlYesThe complete HTTP/HTTPS URL of the webpage to guess datetime information

Input Schema (JSON Schema)

{ "$schema": "http://json-schema.org/draft-07/schema#", "additionalProperties": false, "properties": { "url": { "description": "The complete HTTP/HTTPS URL of the webpage to guess datetime information", "format": "uri", "type": "string" } }, "required": [ "url" ], "type": "object" }

Implementation Reference

  • Registration of the 'guess_datetime_url' tool including schema definition, description, and thin handler that imports and calls the core guessDatetimeFromUrl helper function.
    server.tool( "guess_datetime_url", "Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.", { url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information") }, async ({ url }: { url: string }) => { try { // Import the utility function const { guessDatetimeFromUrl } = await import("../utils/guess-datetime.js"); // Analyze the URL for datetime information const result = await guessDatetimeFromUrl(url); return { content: [{ type: "text" as const, text: yamlStringify(result) }], }; } catch (error) { return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`); } }, );
  • Core handler logic implementing the tool: fetches the webpage, extracts time indicators from multiple sources (headers, meta, schema.org, visible text, JS, comments, git, feeds, sitemaps), determines the best guess timestamp using heuristics and priorities, returns with confidence score.
    export async function guessDatetimeFromUrl(url: string): Promise<{ bestGuess: string | null; confidence: number; }> { try { // Fetch the target webpage const response = await fetch(url); if (!response.ok) { throw new Error(`HTTP ${response.status}: ${response.statusText}`); } const text = await response.text(); // Extract all possible time indicators const updateTimes = await extractAllTimeIndicators(response, text, url); // Advanced heuristic-based determination of the "true" update time const bestGuess = determineBestUpdateTime(updateTimes); // Result with confidence score const result = { bestGuess: bestGuess.timestamp, confidence: bestGuess.confidence }; return result; } catch (error) { throw new Error(`Failed to guess datetime from URL: ${error instanceof Error ? error.message : String(error)}`); } }
  • Key helper function that evaluates all extracted time indicators using a priority-based heuristic system to select the most reliable update timestamp with confidence score and reasoning.
    function determineBestUpdateTime(updateTimes: any) { // First check for meta tags that explicitly indicate last modified time if (updateTimes.metaTags && updateTimes.metaTags.lastModified) { return { timestamp: updateTimes.metaTags.lastModified, confidence: 95, reasoning: ["Explicit lastmodifiedtime meta tag"] }; } // Check other meta tags related to publication/modification if (updateTimes.metaTags) { if (updateTimes.metaTags.articleModified) { return { timestamp: updateTimes.metaTags.articleModified, confidence: 90, reasoning: ["Article modified time meta tag"] }; } if (updateTimes.metaTags.publishedDate) { return { timestamp: updateTimes.metaTags.publishedDate, confidence: 85, reasoning: ["Published date meta tag"] }; } // Check for new high-value meta tags if (updateTimes.metaTags.ogUpdatedTime) { return { timestamp: updateTimes.metaTags.ogUpdatedTime, confidence: 88, reasoning: ["Open Graph updated time meta tag"] }; } if (updateTimes.metaTags.lastmod) { return { timestamp: updateTimes.metaTags.lastmod, confidence: 87, reasoning: ["Last modified meta tag"] }; } if (updateTimes.metaTags.generated) { return { timestamp: updateTimes.metaTags.generated, confidence: 85, reasoning: ["Page generated meta tag"] }; } if (updateTimes.metaTags.build) { return { timestamp: updateTimes.metaTags.build, confidence: 83, reasoning: ["Build timestamp meta tag"] }; } } // Check feed timestamps (RSS, Atom, Sitemap) - often very reliable if (updateTimes.feedTimestamps && updateTimes.feedTimestamps.length > 0) { // Filter for high priority feed timestamps const highPriorityFeeds = updateTimes.feedTimestamps .filter((stamp: any) => stamp.priority === 'high') .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context })); if (highPriorityFeeds.length > 0) { // Sort by recency highPriorityFeeds.sort((a: any, b: any) => b.date.getTime() - a.date.getTime()); return { timestamp: highPriorityFeeds[0].date.toISOString(), confidence: 92, reasoning: ["Feed timestamp", `Type: ${highPriorityFeeds[0].type}`, `Context: ${highPriorityFeeds[0].context}`] }; } // If no high priority feeds, use most recent feed timestamp const allFeedDates = updateTimes.feedTimestamps .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context })); allFeedDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime()); return { timestamp: allFeedDates[0].date.toISOString(), confidence: 85, reasoning: ["Feed timestamp", `Type: ${allFeedDates[0].type}`, `Context: ${allFeedDates[0].context}`] }; } // Check visible dates with high priority markers if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) { // Look for dates that appear to be part of lastmodified content const contentDates = updateTimes.visibleDates.filter((d: any) => { const ctx = d.context.toLowerCase(); return ctx.includes('lastmodified') || ctx.includes('last modified') || ctx.includes('updated') || ctx.includes('修改') || // Chinese for "modified" ctx.includes('更新'); // Chinese for "updated" }); if (contentDates.length > 0) { // Sort by recency const dates = contentDates.map((d: any) => new Date(d.date)); dates.sort((a: Date, b: Date) => { if (!a || !b) return 0; return b.getTime() - a.getTime(); }); return { timestamp: dates[0].toISOString(), confidence: 92, reasoning: ["Content explicitly marked as modified/updated"] }; } // Next check for dates that appear in common date display elements const displayDateElements = updateTimes.visibleDates.filter((d: any) => { const ctx = d.context.toLowerCase(); return ctx.includes('class="date') || ctx.includes('class="time') || ctx.includes('class="pubdate') || ctx.includes('class="published') || ctx.includes('pages-date') || ctx.includes('pub-date'); }); if (displayDateElements.length > 0) { const dates = displayDateElements.map((d: any) => new Date(d.date)); dates.sort((a: Date, b: Date) => b.getTime() - a.getTime()); return { timestamp: dates[0].toISOString(), confidence: 88, reasoning: ["Date from primary content display element"] }; } } // Check for Schema.org timestamps if (updateTimes.schemaOrgTimestamps && updateTimes.schemaOrgTimestamps.length > 0) { // Filter for high priority fields: dateModified and dateUpdated const highPriorityDates = updateTimes.schemaOrgTimestamps .filter((stamp: any) => stamp.priority === 'high') .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context })); if (highPriorityDates.length > 0) { // Sort by recency highPriorityDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime()); return { timestamp: highPriorityDates[0].date.toISOString(), confidence: 85, reasoning: ["Schema.org structured data", `Field: ${highPriorityDates[0].field}`, `Context: ${highPriorityDates[0].context}`] }; } // If no high priority fields, use most recent Schema.org date const allSchemaDates = updateTimes.schemaOrgTimestamps .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context })); allSchemaDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime()); return { timestamp: allSchemaDates[0].date.toISOString(), confidence: 75, reasoning: ["Schema.org structured data", `Field: ${allSchemaDates[0].field}`, `Context: ${allSchemaDates[0].context}`] }; } // Check Git info (often very reliable) if (updateTimes.gitInfo && updateTimes.gitInfo.gitDate) { return { timestamp: updateTimes.gitInfo.gitDate, confidence: 90, reasoning: ["Git commit information", updateTimes.gitInfo.gitHash ? `Git hash: ${updateTimes.gitInfo.gitHash}` : ""] }; } else if (updateTimes.gitInfo && updateTimes.gitInfo.deployDate) { return { timestamp: updateTimes.gitInfo.deployDate, confidence: 88, reasoning: ["Git deployment timestamp"] }; } // JSON-LD structured data is also quite reliable if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) { const jsonLdDates = updateTimes.jsTimestamps .filter((stamp: any) => stamp.type === 'jsonLd') .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, priority: stamp.priority })); if (jsonLdDates.length > 0) { // Sort by priority and recency jsonLdDates.sort((a: any, b: any) => { if (a.priority === 'high' && b.priority !== 'high') return -1; if (a.priority !== 'high' && b.priority === 'high') return 1; return b.date.getTime() - a.date.getTime(); }); return { timestamp: jsonLdDates[0].date.toISOString(), confidence: jsonLdDates[0].priority === 'high' ? 80 : 65, reasoning: [`JSON-LD structured data (${jsonLdDates[0].field})`] }; } } // If we have a page generation time meta tag, it's a decent indicator if (updateTimes.metaTags && updateTimes.metaTags.pageGenerated) { return { timestamp: updateTimes.metaTags.pageGenerated, confidence: 75, reasoning: ["Page generation time meta tag"] }; } // Process visible dates that don't have explicit modification indicators if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) { // Get all dates and sort by recency const allDates = updateTimes.visibleDates.map((d: any) => ({ date: new Date(d.date), context: d.context })); allDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime()); return { timestamp: allDates[0].date.toISOString(), confidence: 70, reasoning: ["Most recent date found in page content", `Context: "${allDates[0].context}"`] }; } // Try HTML comments if (updateTimes.htmlComments && updateTimes.htmlComments.length > 0) { const commentDates = updateTimes.htmlComments.map((c: any) => ({ date: new Date(c.date), context: c.context })); commentDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime()); return { timestamp: commentDates[0].date.toISOString(), confidence: 60, reasoning: ["Timestamp from HTML comment", `Context: "${commentDates[0].context}"`] }; } // Try JavaScript timestamps if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) { const jsDates = updateTimes.jsTimestamps .filter((stamp: any) => stamp.type !== 'jsonLd') .map((stamp: any) => ({ date: new Date(stamp.date), context: stamp.context, type: stamp.type })); if (jsDates.length > 0) { // Sort by recency jsDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime()); return { timestamp: jsDates[0].date.toISOString(), confidence: 60, reasoning: ["JavaScript timestamp found", `Context: "${jsDates[0].context}"`] }; } } // Use HTTP Last-Modified even if it matches server time, but with lower confidence if (updateTimes.lastModified) { const lastModDate = new Date(updateTimes.lastModified); if (!isNaN(lastModDate.getTime())) { // Check if Last-Modified differs significantly from server time if (updateTimes.date) { const serverDate = new Date(updateTimes.date); const timeDiff = Math.abs(lastModDate.getTime() - serverDate.getTime()); if (timeDiff > 60000) { // More than 1 minute difference return { timestamp: lastModDate.toISOString(), confidence: 75, reasoning: ["HTTP Last-Modified header differs significantly from server time", `Difference: ${Math.round(timeDiff / 1000 / 60)} minutes`] }; } else if (timeDiff > 1000) { // More than 1 second difference return { timestamp: lastModDate.toISOString(), confidence: 65, reasoning: ["HTTP Last-Modified header differs from server time", `Difference: ${Math.round(timeDiff / 1000)} seconds`] }; } else { return { timestamp: lastModDate.toISOString(), confidence: 40, reasoning: ["HTTP Last-Modified header (may be server time)"] }; } } else { return { timestamp: lastModDate.toISOString(), confidence: 60, reasoning: ["HTTP Last-Modified header"] }; } } } // If all else fails, use the server date with very low confidence if (updateTimes.date) { return { timestamp: new Date(updateTimes.date).toISOString(), confidence: 10, reasoning: ["No update time found", "Using server date as fallback"] }; } // Absolute fallback: unknown return { timestamp: null, confidence: 0, reasoning: ["No reliable update time indicators found"] }; }

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/wlmwwx/jina-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server