Jina AI Remote MCP Server

Official
by jina-ai

guess_datetime_url

Determine the last updated or published date of a web page by analyzing HTTP headers, HTML metadata, Schema.org data, visible dates, and other sources to provide an accurate timestamp with confidence scores.

Instructions

Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.
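
As a rough illustration of how a caller might read the result, the sketch below reuses the `{ bestGuess, confidence }` shape from the handler's declared return type. The helper `describeGuess` and its thresholds (85 and 60) are hypothetical and chosen only for this example; the tool itself does not define any buckets.

```typescript
// Shape taken from the handler's declared return type.
interface DatetimeGuess {
    bestGuess: string | null; // timestamp string, or null when nothing was found
    confidence: number;       // numeric score; higher means a stronger indicator
}

// Hypothetical helper: the thresholds 85 and 60 are illustrative assumptions,
// not values defined by the tool.
function describeGuess(guess: DatetimeGuess): string {
    if (guess.bestGuess === null) return "unknown";
    if (guess.confidence >= 85) return `likely ${guess.bestGuess}`;
    if (guess.confidence >= 60) return `possibly ${guess.bestGuess}`;
    return `weak guess ${guess.bestGuess}`;
}
```

A caller that needs a firm date would probably treat anything below its own threshold as "unknown" rather than trust the timestamp.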

Input Schema

Name | Required | Description | Default
url | Yes | The complete HTTP/HTTPS URL of the webpage to guess datetime information | (none)
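
The parameter description asks for a complete HTTP/HTTPS URL. A caller could pre-check that before invoking the tool; the sketch below uses the standard `URL` constructor as a stand-in and is not the server's own validation (which, per the listing on this page, is a Zod `z.string().url()` schema). The name `isHttpUrl` is hypothetical.

```typescript
// Hypothetical pre-check: returns true only for absolute http(s) URLs.
function isHttpUrl(candidate: string): boolean {
    try {
        const parsed = new URL(candidate);
        return parsed.protocol === "http:" || parsed.protocol === "https:";
    } catch {
        return false; // not parseable as an absolute URL
    }
}
```

Whether the server-side schema also rejects non-HTTP schemes is not shown here, so a client-side check like this is a precaution rather than a mirror of the server's behavior.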

Implementation Reference

  • Main handler function. It fetches the given URL, extracts datetime indicators (HTTP headers, meta tags, HTML content, Schema.org data, JS timestamps, comments, Git info, and RSS/Atom/sitemap feeds), then heuristically selects the best last-update or publish datetime and returns it with a confidence score.
    export async function guessDatetimeFromUrl(url: string): Promise<{
        bestGuess: string | null;
        confidence: number;
    }> {
        try {
            // Fetch the target webpage
            const response = await fetch(url);
    
            if (!response.ok) {
                throw new Error(`HTTP ${response.status}: ${response.statusText}`);
            }
    
            const text = await response.text();
    
            // Extract all possible time indicators
            const updateTimes = await extractAllTimeIndicators(response, text, url);
    
            // Advanced heuristic-based determination of the "true" update time
            const bestGuess = determineBestUpdateTime(updateTimes);
    
            // Result with confidence score
            const result = {
                bestGuess: bestGuess.timestamp,
                confidence: bestGuess.confidence
            };
    
            return result;
        } catch (error) {
            throw new Error(`Failed to guess datetime from URL: ${error instanceof Error ? error.message : String(error)}`);
        }
    }
  • MCP server.tool registration for 'guess_datetime_url', including full description, Zod input schema (url: string URL), and thin async handler that dynamically imports and calls the core guessDatetimeFromUrl utility.
    if (isToolEnabled("guess_datetime_url")) {
    	server.tool(
    		"guess_datetime_url",
    		"Guess the last updated or published datetime of a web page. This tool examines HTTP headers, HTML metadata, Schema.org data, visible dates, JavaScript timestamps, HTML comments, Git information, RSS/Atom feeds, sitemaps, and international date formats to provide the most accurate update time with confidence scores. Returns the best guess timestamp and confidence level.",
    		{
    			url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information")
    		},
    		async ({ url }: { url: string }) => {
    			try {
    				// Import the utility function
    				const { guessDatetimeFromUrl } = await import("../utils/guess-datetime.js");
    
    				// Analyze the URL for datetime information
    				const result = await guessDatetimeFromUrl(url);
    
    				return {
    					content: [{ type: "text" as const, text: yamlStringify(result) }],
    				};
    			} catch (error) {
    				return createErrorResponse(`Error: ${error instanceof Error ? error.message : String(error)}`);
    			}
    		},
    	);
    }
  • Zod schema definition for the tool input: a single required 'url' parameter validated as a URL string.
    url: z.string().url().describe("The complete HTTP/HTTPS URL of the webpage to guess datetime information")
  • Key helper function that applies prioritized heuristics to all extracted datetime indicators (meta, schema, feeds, etc.) to determine the single best update timestamp with confidence score and reasoning.
    function determineBestUpdateTime(updateTimes: any) {
        // First check for meta tags that explicitly indicate last modified time
        if (updateTimes.metaTags && updateTimes.metaTags.lastModified) {
            return {
                timestamp: updateTimes.metaTags.lastModified,
                confidence: 95,
                reasoning: ["Explicit lastmodifiedtime meta tag"]
            };
        }
    
        // Check other meta tags related to publication/modification
        if (updateTimes.metaTags) {
            if (updateTimes.metaTags.articleModified) {
                return {
                    timestamp: updateTimes.metaTags.articleModified,
                    confidence: 90,
                    reasoning: ["Article modified time meta tag"]
                };
            }
    
            if (updateTimes.metaTags.publishedDate) {
                return {
                    timestamp: updateTimes.metaTags.publishedDate,
                    confidence: 85,
                    reasoning: ["Published date meta tag"]
                };
            }
    
            // Check for new high-value meta tags
            if (updateTimes.metaTags.ogUpdatedTime) {
                return {
                    timestamp: updateTimes.metaTags.ogUpdatedTime,
                    confidence: 88,
                    reasoning: ["Open Graph updated time meta tag"]
                };
            }
    
            if (updateTimes.metaTags.lastmod) {
                return {
                    timestamp: updateTimes.metaTags.lastmod,
                    confidence: 87,
                    reasoning: ["Last modified meta tag"]
                };
            }
    
            if (updateTimes.metaTags.generated) {
                return {
                    timestamp: updateTimes.metaTags.generated,
                    confidence: 85,
                    reasoning: ["Page generated meta tag"]
                };
            }
    
            if (updateTimes.metaTags.build) {
                return {
                    timestamp: updateTimes.metaTags.build,
                    confidence: 83,
                    reasoning: ["Build timestamp meta tag"]
                };
            }
        }
    
        // Check feed timestamps (RSS, Atom, Sitemap) - often very reliable
        if (updateTimes.feedTimestamps && updateTimes.feedTimestamps.length > 0) {
            // Filter for high priority feed timestamps
            const highPriorityFeeds = updateTimes.feedTimestamps
                .filter((stamp: any) => stamp.priority === 'high')
                .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context }));
    
            if (highPriorityFeeds.length > 0) {
                // Sort by recency
                highPriorityFeeds.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: highPriorityFeeds[0].date.toISOString(),
                    confidence: 92,
                    reasoning: ["Feed timestamp", `Type: ${highPriorityFeeds[0].type}`, `Context: ${highPriorityFeeds[0].context}`]
                };
            }
    
            // If no high priority feeds, use most recent feed timestamp
            const allFeedDates = updateTimes.feedTimestamps
                .map((stamp: any) => ({ date: new Date(stamp.date), type: stamp.type, context: stamp.context }));
    
            allFeedDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allFeedDates[0].date.toISOString(),
                confidence: 85,
                reasoning: ["Feed timestamp", `Type: ${allFeedDates[0].type}`, `Context: ${allFeedDates[0].context}`]
            };
        }
    
        // Check visible dates with high priority markers
        if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) {
            // Look for dates that appear to be part of lastmodified content
            const contentDates = updateTimes.visibleDates.filter((d: any) => {
                const ctx = d.context.toLowerCase();
                return ctx.includes('lastmodified') ||
                    ctx.includes('last modified') ||
                    ctx.includes('updated') ||
                    ctx.includes('修改') ||  // Chinese for "modified"
                    ctx.includes('更新');    // Chinese for "updated" 
            });
    
            if (contentDates.length > 0) {
                // Sort by recency
                const dates = contentDates.map((d: any) => new Date(d.date));
                dates.sort((a: Date, b: Date) => {
                    if (!a || !b) return 0;
                    return b.getTime() - a.getTime();
                });
    
                return {
                    timestamp: dates[0].toISOString(),
                    confidence: 92,
                    reasoning: ["Content explicitly marked as modified/updated"]
                };
            }
    
            // Next check for dates that appear in common date display elements
            const displayDateElements = updateTimes.visibleDates.filter((d: any) => {
                const ctx = d.context.toLowerCase();
                return ctx.includes('class="date') ||
                    ctx.includes('class="time') ||
                    ctx.includes('class="pubdate') ||
                    ctx.includes('class="published') ||
                    ctx.includes('pages-date') ||
                    ctx.includes('pub-date');
            });
    
            if (displayDateElements.length > 0) {
                const dates = displayDateElements.map((d: any) => new Date(d.date));
                dates.sort((a: Date, b: Date) => b.getTime() - a.getTime());
    
                return {
                    timestamp: dates[0].toISOString(),
                    confidence: 88,
                    reasoning: ["Date from primary content display element"]
                };
            }
        }
    
        // Check for Schema.org timestamps
        if (updateTimes.schemaOrgTimestamps && updateTimes.schemaOrgTimestamps.length > 0) {
            // Filter for high priority fields: dateModified and dateUpdated
            const highPriorityDates = updateTimes.schemaOrgTimestamps
                .filter((stamp: any) => stamp.priority === 'high')
                .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context }));
    
            if (highPriorityDates.length > 0) {
                // Sort by recency
                highPriorityDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: highPriorityDates[0].date.toISOString(),
                    confidence: 85,
                    reasoning: ["Schema.org structured data", `Field: ${highPriorityDates[0].field}`, `Context: ${highPriorityDates[0].context}`]
                };
            }
    
            // If no high priority fields, use most recent Schema.org date
            const allSchemaDates = updateTimes.schemaOrgTimestamps
                .map((stamp: any) => ({ date: new Date(stamp.date), field: stamp.field, context: stamp.context }));
    
            allSchemaDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allSchemaDates[0].date.toISOString(),
                confidence: 75,
                reasoning: ["Schema.org structured data", `Field: ${allSchemaDates[0].field}`, `Context: ${allSchemaDates[0].context}`]
            };
        }
    
        // Check Git info (often very reliable)
        if (updateTimes.gitInfo && updateTimes.gitInfo.gitDate) {
            return {
                timestamp: updateTimes.gitInfo.gitDate,
                confidence: 90,
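                // Only add the hash entry when a hash exists, so the reasoning list never contains an empty string
                reasoning: ["Git commit information", ...(updateTimes.gitInfo.gitHash ? [`Git hash: ${updateTimes.gitInfo.gitHash}`] : [])]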
            };
        } else if (updateTimes.gitInfo && updateTimes.gitInfo.deployDate) {
            return {
                timestamp: updateTimes.gitInfo.deployDate,
                confidence: 88,
                reasoning: ["Git deployment timestamp"]
            };
        }
    
        // JSON-LD structured data is also quite reliable
        if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) {
            const jsonLdDates = updateTimes.jsTimestamps
                .filter((stamp: any) => stamp.type === 'jsonLd')
                .map((stamp: any) => ({
                    date: new Date(stamp.date),
                    field: stamp.field,
                    priority: stamp.priority
                }));
    
            if (jsonLdDates.length > 0) {
                // Sort by priority and recency
                jsonLdDates.sort((a: any, b: any) => {
                    if (a.priority === 'high' && b.priority !== 'high') return -1;
                    if (a.priority !== 'high' && b.priority === 'high') return 1;
                    return b.date.getTime() - a.date.getTime();
                });
    
                return {
                    timestamp: jsonLdDates[0].date.toISOString(),
                    confidence: jsonLdDates[0].priority === 'high' ? 80 : 65,
                    reasoning: [`JSON-LD structured data (${jsonLdDates[0].field})`]
                };
            }
        }
    
        // If we have a page generation time meta tag, it's a decent indicator
        if (updateTimes.metaTags && updateTimes.metaTags.pageGenerated) {
            return {
                timestamp: updateTimes.metaTags.pageGenerated,
                confidence: 75,
                reasoning: ["Page generation time meta tag"]
            };
        }
    
        // Process visible dates that don't have explicit modification indicators
        if (updateTimes.visibleDates && updateTimes.visibleDates.length > 0) {
            // Get all dates and sort by recency
            const allDates = updateTimes.visibleDates.map((d: any) => ({
                date: new Date(d.date),
                context: d.context
            }));
    
            allDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: allDates[0].date.toISOString(),
                confidence: 70,
                reasoning: ["Most recent date found in page content", `Context: "${allDates[0].context}"`]
            };
        }
    
        // Try HTML comments
        if (updateTimes.htmlComments && updateTimes.htmlComments.length > 0) {
            const commentDates = updateTimes.htmlComments.map((c: any) => ({
                date: new Date(c.date),
                context: c.context
            }));
    
            commentDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
            return {
                timestamp: commentDates[0].date.toISOString(),
                confidence: 60,
                reasoning: ["Timestamp from HTML comment", `Context: "${commentDates[0].context}"`]
            };
        }
    
        // Try JavaScript timestamps
        if (updateTimes.jsTimestamps && updateTimes.jsTimestamps.length > 0) {
            const jsDates = updateTimes.jsTimestamps
                .filter((stamp: any) => stamp.type !== 'jsonLd')
                .map((stamp: any) => ({
                    date: new Date(stamp.date),
                    context: stamp.context,
                    type: stamp.type
                }));
    
            if (jsDates.length > 0) {
                // Sort by recency
                jsDates.sort((a: any, b: any) => b.date.getTime() - a.date.getTime());
    
                return {
                    timestamp: jsDates[0].date.toISOString(),
                    confidence: 60,
                    reasoning: ["JavaScript timestamp found", `Context: "${jsDates[0].context}"`]
                };
            }
        }
    
        // Use HTTP Last-Modified even if it matches server time, but with lower confidence
        if (updateTimes.lastModified) {
            const lastModDate = new Date(updateTimes.lastModified);
            if (!isNaN(lastModDate.getTime())) {
                // Check if Last-Modified differs significantly from server time
                if (updateTimes.date) {
                    const serverDate = new Date(updateTimes.date);
                    const timeDiff = Math.abs(lastModDate.getTime() - serverDate.getTime());
    
                    if (timeDiff > 60000) { // More than 1 minute difference
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 75,
                            reasoning: ["HTTP Last-Modified header differs significantly from server time", `Difference: ${Math.round(timeDiff / 1000 / 60)} minutes`]
                        };
                    } else if (timeDiff > 1000) { // More than 1 second difference
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 65,
                            reasoning: ["HTTP Last-Modified header differs from server time", `Difference: ${Math.round(timeDiff / 1000)} seconds`]
                        };
                    } else {
                        return {
                            timestamp: lastModDate.toISOString(),
                            confidence: 40,
                            reasoning: ["HTTP Last-Modified header (may be server time)"]
                        };
                    }
                } else {
                    return {
                        timestamp: lastModDate.toISOString(),
                        confidence: 60,
                        reasoning: ["HTTP Last-Modified header"]
                    };
                }
            }
        }
    
        // If all else fails, use the server date with very low confidence
        if (updateTimes.date) {
            return {
                timestamp: new Date(updateTimes.date).toISOString(),
                confidence: 10,
                reasoning: ["No update time found", "Using server date as fallback"]
            };
        }
    
        // Absolute fallback: unknown
        return {
            timestamp: null,
            confidence: 0,
            reasoning: ["No reliable update time indicators found"]
        };
    }
  • src/index.ts:100-100 (registration)
    Invocation of registerJinaTools in main MCP server creation, which conditionally registers 'guess_datetime_url' based on enabledTools filter.
    registerJinaTools(server, () => currentProps, enabledTools);
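
The helper listed above is a long chain of "first match wins" checks. The same pattern can be sketched compactly as an ordered rule table. This is a simplified illustration, not the project's code: the flat `indicators` record and the names `Rule`, `rules`, and `firstMatch` are assumptions made for the example, and only three of the listed branches are reproduced.

```typescript
interface Guess {
    timestamp: string | null;
    confidence: number;
    reasoning: string[];
}

// A rule inspects the extracted indicators and returns a timestamp or nothing.
interface Rule {
    label: string;
    confidence: number;
    pick: (indicators: Record<string, string | undefined>) => string | undefined;
}

// Illustrative subset: labels and confidences mirror three branches of the listing.
const rules: Rule[] = [
    { label: "Explicit lastmodifiedtime meta tag", confidence: 95, pick: (i) => i.lastModified },
    { label: "Article modified time meta tag", confidence: 90, pick: (i) => i.articleModified },
    { label: "Published date meta tag", confidence: 85, pick: (i) => i.publishedDate },
];

// Walk the rules in order and return the first one that yields a value.
function firstMatch(indicators: Record<string, string | undefined>): Guess {
    for (const rule of rules) {
        const value = rule.pick(indicators);
        if (value) {
            return { timestamp: value, confidence: rule.confidence, reasoning: [rule.label] };
        }
    }
    return { timestamp: null, confidence: 0, reasoning: ["No reliable update time indicators found"] };
}
```

Because the order of the table decides the outcome, a lower-confidence rule placed earlier shadows a higher-confidence one placed later, which is also true of the if-chain in the listing.
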
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's approach by listing multiple data sources examined (e.g., HTTP headers, HTML metadata) and outputs (timestamp with confidence scores), giving a clear picture of its heuristic and probabilistic nature. However, it lacks details on error handling or performance characteristics.
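
On error handling, the implementation listing on this page suggests what an agent would see after a failed fetch: the HTTP error is wrapped once by the utility and once more by the tool handler. The sketch below only reproduces those message templates; `composeToolErrorText` is a hypothetical name, and the shape of the response built by `createErrorResponse` is not shown on this page.

```typescript
// Mirrors the message templates in the listing; createErrorResponse itself is not
// shown on this page, so only the text it receives is reproduced here.
function composeToolErrorText(status: number, statusText: string): string {
    // Thrown by the utility when response.ok is false
    const fetchError = new Error(`HTTP ${status}: ${statusText}`);
    // Re-thrown by the utility's catch block
    const wrapped = new Error(`Failed to guess datetime from URL: ${fetchError.message}`);
    // Text passed to createErrorResponse by the tool handler
    return `Error: ${wrapped.message}`;
}
```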

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose and following with key details on methods and outputs. It avoids redundancy, though it could be slightly more streamlined by combining some of the listed data sources into broader categories.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (heuristic datetime guessing) and lack of annotations or output schema, the description does a good job of explaining the process and return values. It covers the input parameter indirectly and outlines the output structure, though it could benefit from more explicit details on confidence score ranges or error cases.
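
For reference, the confidence values in the implementation listing run from 0 (no indicator found) to 95 (explicit last-modified meta tag), with 10 for the server-date fallback. The HTTP `Last-Modified` branch is the one place where the score depends on a comparison, and it can be sketched in isolation as follows. The function name `lastModifiedConfidence` and the `null` return for an unparseable header are assumptions of this example; the thresholds and scores follow the listing.

```typescript
// Confidence for an HTTP Last-Modified header, following the thresholds in the listing.
// Returns null when the header cannot be parsed, so that a caller can fall through
// to its next heuristic (an assumption of this sketch).
function lastModifiedConfidence(lastModified: string, serverDate?: string): number | null {
    const modified = new Date(lastModified).getTime();
    if (Number.isNaN(modified)) return null;
    if (!serverDate) return 60; // no Date header to compare against
    const diffMs = Math.abs(modified - new Date(serverDate).getTime());
    if (diffMs > 60000) return 75; // more than one minute apart
    if (diffMs > 1000) return 65;  // more than one second apart
    return 40;                     // effectively equal: may just be the server clock
}
```

The idea behind the tiers is that a `Last-Modified` value identical to the response's `Date` header is probably generated per request and says little about when the content changed.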

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the 'url' parameter well-documented in the schema itself. The description does not add any additional meaning or constraints beyond what the schema provides, such as URL format examples or validation rules, so it meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('guess the last updated or published datetime') and resource ('a web page'), distinguishing it from sibling tools like 'read_url' or 'capture_screenshot_url' that focus on different webpage interactions. It specifies the exact temporal information being extracted, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when temporal metadata about a webpage is needed, but does not explicitly state when to use this tool versus alternatives like 'read_url' (which might return raw content) or 'parallel_search_web' (which might provide search results). No exclusions or prerequisites are mentioned, leaving the context somewhat open-ended.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
