Fabien-desablens

MCP Webpage Timestamps

extract_timestamps

Extract creation, modification, and publication timestamps from webpage URLs using HTML meta tags, HTTP headers, and structured data.

Instructions

Extract creation, modification, and publication timestamps from a webpage

Input Schema

Name    Required  Description                                        Default
url     Yes       The URL of the webpage to extract timestamps from
config  No        Optional configuration for the extraction
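
As a minimal sketch, the arguments for a call could be assembled as follows. `ExtractTimestampsArgs`, `DOCUMENTED_DEFAULTS`, and `buildArgs` are hypothetical names used only for illustration; the field names and default values come from the tool's published input schema. The server applies its own defaults, so filling them in on the client is optional.

```typescript
// Client-side mirror of the optional `config` object from the input schema.
interface ExtractorConfig {
  timeout?: number;           // request timeout in milliseconds
  userAgent?: string;         // user agent string to use for requests
  followRedirects?: boolean;  // whether to follow HTTP redirects
  maxRedirects?: number;      // maximum number of redirects to follow
  enableHeuristics?: boolean; // whether to enable heuristic detection
}

// Hypothetical type for the tool arguments: `url` is the only required field.
interface ExtractTimestampsArgs {
  url: string;
  config?: ExtractorConfig;
}

// Defaults as stated in the schema descriptions.
// `userAgent` has no documented default, so it is left out.
const DOCUMENTED_DEFAULTS = {
  timeout: 10000,
  followRedirects: true,
  maxRedirects: 5,
  enableHeuristics: true,
} as const;

// Illustrative helper, not part of the server: rejects an empty URL and
// overlays caller-supplied options on the documented defaults.
function buildArgs(url: string, config: ExtractorConfig = {}): ExtractTimestampsArgs {
  if (!url) {
    throw new Error("URL is required");
  }
  return { url, config: { ...DOCUMENTED_DEFAULTS, ...config } };
}

const args = buildArgs("https://example.com/article", { timeout: 5000 });
console.log(JSON.stringify(args, null, 2));
```

Only `url` has to be present; a call that omits `config` entirely is valid under the schema.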

Implementation Reference

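  • Core handler function implementing the timestamp extraction logic: fetches the page, parses it with Cheerio, collects candidate timestamps from HTML meta tags, HTTP headers, JSON-LD, microdata, OpenGraph, Twitter cards, and (when enableHeuristics is set) heuristics, then consolidates the results. A failed fetch returns a low-confidence result with the error message instead of throwing.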
    async extractTimestamps(url: string): Promise<TimestampResult> {
      // Non-fatal errors; nothing in this excerpt adds to the list, so it stays empty here.
      const errors: string[] = [];
      let fetchResult: FetchResult;
    
      try {
        fetchResult = await this.fetchPage(url);
      } catch (error) {
        return {
          url,
          sources: [],
          confidence: 'low',
          errors: [`Failed to fetch page: ${error instanceof Error ? error.message : String(error)}`],
        };
      }
    
      const $ = cheerio.load(fetchResult.html);
      const sources: TimestampSource[] = [];
    
      // Extract timestamps from various sources
      sources.push(...this.extractFromHtmlMeta($));
      sources.push(...this.extractFromHttpHeaders(fetchResult.headers));
      sources.push(...this.extractFromJsonLd($));
      sources.push(...this.extractFromMicrodata($));
      sources.push(...this.extractFromOpenGraph($));
      sources.push(...this.extractFromTwitterCards($));
      
      if (this.config.enableHeuristics) {
        sources.push(...this.extractFromHeuristics($));
      }
    
      const result = this.consolidateTimestamps(url, sources);
      
      if (errors.length > 0) {
        result.errors = errors;
      }
    
      return result;
    }
  • src/index.ts:28-66 (registration)
    MCP tool registration defining name, description, and input schema for 'extract_timestamps'.
    {
      name: 'extract_timestamps',
      description: 'Extract creation, modification, and publication timestamps from a webpage',
      inputSchema: {
        type: 'object',
        properties: {
          url: {
            type: 'string',
            description: 'The URL of the webpage to extract timestamps from',
          },
          config: {
            type: 'object',
            description: 'Optional configuration for the extraction',
            properties: {
              timeout: {
                type: 'number',
                description: 'Request timeout in milliseconds (default: 10000)',
              },
              userAgent: {
                type: 'string',
                description: 'User agent string to use for requests',
              },
              followRedirects: {
                type: 'boolean',
                description: 'Whether to follow HTTP redirects (default: true)',
              },
              maxRedirects: {
                type: 'number',
                description: 'Maximum number of redirects to follow (default: 5)',
              },
              enableHeuristics: {
                type: 'boolean',
                description: 'Whether to enable heuristic timestamp detection (default: true)',
              },
            },
          },
        },
        required: ['url'],
      },
    },
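  • MCP CallToolRequestSchema handler branch for 'extract_timestamps': checks that url is present, creates a dedicated TimestampExtractor when config is provided (otherwise reuses the shared extractor instance), calls extractTimestamps, and returns the result as pretty-printed JSON text.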
    if (name === 'extract_timestamps') {
      const { url, config } = args as {
        url: string;
        config?: {
          timeout?: number;
          userAgent?: string;
          followRedirects?: boolean;
          maxRedirects?: number;
          enableHeuristics?: boolean;
        };
      };
    
      if (!url) {
        return {
          content: [
            {
              type: 'text',
              text: 'Error: URL is required',
            },
          ],
          isError: true,
        };
      }
    
      const timestampExtractor = config ? new TimestampExtractor(config) : extractor;
      const result = await timestampExtractor.extractTimestamps(url);
    
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(result, null, 2),
          },
        ],
      };
    }
  • TypeScript interfaces defining the output structure (TimestampResult) and sources (TimestampSource) used by the tool.
    export interface TimestampResult {
      url: string;
      createdAt?: Date;
      modifiedAt?: Date;
      publishedAt?: Date;
      sources: TimestampSource[];
      confidence: 'high' | 'medium' | 'low';
      errors?: string[];
    }
    
    export interface TimestampSource {
      type: 'html-meta' | 'http-header' | 'json-ld' | 'microdata' | 'opengraph' | 'twitter' | 'heuristic';
      field: string;
      value: string;
      confidence: 'high' | 'medium' | 'low';
    }
  • Type definition for optional config input parameter.
    export interface ExtractorConfig {
      timeout?: number;
      userAgent?: string;
      followRedirects?: boolean;
      maxRedirects?: number;
      enableHeuristics?: boolean;
    }
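
Because the handler serializes the result with `JSON.stringify`, the `Date` fields of `TimestampResult` reach the client as ISO 8601 strings rather than `Date` objects. The sketch below shows one way a client might turn the text payload back into typed values. `parseTimestampResult` is a hypothetical helper, the interfaces are simplified local copies, and the sample payload is invented for illustration.

```typescript
type Confidence = "high" | "medium" | "low";

// Simplified local copies of the tool's output types.
interface TimestampSource {
  type: string;
  field: string;
  value: string;
  confidence: Confidence;
}

interface TimestampResult {
  url: string;
  createdAt?: Date;
  modifiedAt?: Date;
  publishedAt?: Date;
  sources: TimestampSource[];
  confidence: Confidence;
  errors?: string[];
}

const DATE_FIELDS = new Set(["createdAt", "modifiedAt", "publishedAt"]);

// Hypothetical helper: parse the JSON text and revive the three date fields.
function parseTimestampResult(text: string): TimestampResult {
  return JSON.parse(text, (key, value) => {
    if (DATE_FIELDS.has(key) && typeof value === "string") {
      const date = new Date(value);
      // Returning undefined from a reviver drops the property, so an
      // unparseable string never surfaces as an Invalid Date.
      return Number.isNaN(date.getTime()) ? undefined : date;
    }
    return value;
  }) as TimestampResult;
}

// Invented sample payload in the shape the handler returns.
const sample = JSON.stringify({
  url: "https://example.com/article",
  publishedAt: "2024-01-15T09:30:00.000Z",
  sources: [
    {
      type: "opengraph",
      field: "article:published_time",
      value: "2024-01-15T09:30:00Z",
      confidence: "high",
    },
  ],
  confidence: "high",
});

const result = parseTimestampResult(sample);
console.log(result.publishedAt instanceof Date, result.confidence);
```

The `value` of each source stays a plain string, since it records what was found on the page rather than a normalized date.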

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Fabien-desablens/mcp-webpage-timestamps'
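
The same request can be made from TypeScript. This is a sketch under two assumptions: a runtime with a global `fetch` (for example Node.js 18 or later), and a JSON response body, whose shape is not documented here and is therefore typed as `unknown`. `serverUrl` and `fetchServer` are hypothetical helper names.

```typescript
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Build the directory URL for a server from its owner and repository slug.
function serverUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetch the server record. The body is assumed to be JSON; its shape is
// not specified on this page, so callers must narrow the result themselves.
async function fetchServer(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `fetchServer("Fabien-desablens", "mcp-webpage-timestamps")` issues the same GET request as the curl command above.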

If you have feedback or need assistance with the MCP directory API, please join our Discord server.