Glama

osint-mcp-server

by badchars

wayback_urls

Retrieve archived URLs from the Wayback Machine to discover historical endpoints, hidden paths, and removed content for domain analysis.

Instructions

Search the Wayback Machine for archived URLs of a domain. Returns unique URLs with their capture timestamps, HTTP status codes, and MIME types. Useful for finding old endpoints, hidden paths, and removed content.
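Because the tool returns raw archived URLs, a common next step when hunting for old endpoints is to collapse them into unique paths. The TypeScript below is a minimal sketch, assuming you already have the `url` strings from a result; `summarizePaths` and the extension list are hypothetical and not part of osint-mcp-server.

```typescript
// Hypothetical example list of extensions that often point at config,
// backup, or script files. Not part of the wayback_urls tool.
const INTERESTING_EXTENSIONS = [".bak", ".old", ".sql", ".zip", ".env", ".json", ".js", ".php"];

interface PathSummary {
  path: string;
  hits: number; // how many archived URLs mapped to this path
  interesting: boolean; // true if the path ends with a listed extension
}

// Collapse archived URLs into unique paths. Paths seen on different
// subdomains or with different query strings are merged.
export function summarizePaths(urls: string[]): PathSummary[] {
  const counts = new Map<string, number>();
  for (const raw of urls) {
    let path: string;
    try {
      // Keep only the path; scheme, host, query, and fragment are dropped.
      path = new URL(raw).pathname;
    } catch {
      continue; // skip entries that are not valid absolute URLs
    }
    counts.set(path, (counts.get(path) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([path, hits]) => ({
      path,
      hits,
      interesting: INTERESTING_EXTENSIONS.some((ext) => path.toLowerCase().endsWith(ext)),
    }))
    .sort((a, b) => a.path.localeCompare(b.path));
}
```

Merging across query strings trades detail for a shorter list; keep the full URLs alongside if parameter names matter for your analysis.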

Input Schema

Name       | Required | Description                                               | Default
domain     | Yes      | Domain to search archived URLs for                        |
match_type | No       | CDX match type (exact, prefix, host, domain)              |
filter     | No       | CDX filter (e.g. 'statuscode:200', 'mimetype:text/html')  |
limit      | No       | Maximum URLs to return                                    | 1000
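The CDX server documentation treats a leading `*.` in the `url` parameter as `matchType=domain` and a trailing `*` as `matchType=prefix`, so an explicit match type is cleanest with a bare domain. The sketch below shows one way the four inputs could map onto a CDX query; `buildCdxQueryUrl` and the `MatchType` union are hypothetical names, not the server's API.

```typescript
// Allowed CDX match types, taken from the match_type description above.
type MatchType = "exact" | "prefix" | "host" | "domain";

// Hypothetical helper: mirror the tool's inputs onto the public CDX endpoint.
export function buildCdxQueryUrl(
  domain: string,
  matchType?: MatchType,
  filter?: string,
  limit = 1000,
): string {
  const params = new URLSearchParams({
    // No match type: the wildcards request the domain plus all subdomains.
    // With a match type: send the bare domain and let matchType set the scope.
    url: matchType ? domain : `*.${domain}/*`,
    output: "json",
    fl: "original,timestamp,statuscode,mimetype",
    collapse: "urlkey", // one row per unique URL key
    limit: String(limit),
  });
  if (matchType) params.set("matchType", matchType);
  // CDX filters take the form [!]field:regex, e.g. "statuscode:200".
  if (filter) params.set("filter", filter);
  return `https://web.archive.org/cdx/search/cdx?${params}`;
}
```

For example, `buildCdxQueryUrl("example.com", "host")` restricts results to `example.com` itself, while `"domain"` also includes its subdomains.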

Implementation Reference

  • The handler function that executes the wayback_urls logic by querying the Wayback Machine CDX API.
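    // `limiter` is a rate limiter defined elsewhere in this server (not shown).
    export async function waybackUrls(
      domain: string,
      matchType?: string,
      filter?: string,
      limit = 1000,
    ): Promise<WaybackUrlsResult> {
      await limiter.acquire();
    
      const params = new URLSearchParams({
        // With no explicit match type, the wildcards ask CDX for the domain and
        // all of its subdomains. When a match type is given, send the bare
        // domain so the wildcards cannot contradict it.
        url: matchType ? domain : `*.${domain}/*`,
        output: "json",
        fl: "original,timestamp,statuscode,mimetype",
        collapse: "urlkey", // rows are sorted by urlkey, so this keeps one row per unique URL
        limit: String(limit),
      });
      if (matchType) params.set("matchType", matchType);
      if (filter) params.set("filter", filter);
    
      // Abort the request if CDX has not finished responding within 30 seconds.
      const controller = new AbortController();
      const timeout = setTimeout(() => controller.abort(), 30000);
    
      try {
        const res = await fetch(`https://web.archive.org/cdx/search/cdx?${params}`, { signal: controller.signal });
        if (!res.ok) throw new Error(`Wayback CDX returned ${res.status}`);
    
        // Guard against an empty body so JSON parsing cannot throw.
        const text = await res.text();
        const data: string[][] = text.trim() ? JSON.parse(text) : [];
        // First row is the header: ["original", "timestamp", "statuscode", "mimetype"]
        const rows = data.slice(1);
    
        const urls: WaybackUrl[] = rows.map((row) => ({
          url: row[0] ?? "",
          timestamp: row[1] ?? "",
          statusCode: row[2] ?? "",
          mimeType: row[3] ?? "",
        }));
    
        return { domain, totalUrls: urls.length, urls };
      } finally {
        // Always clear the timer, whether the request succeeded or failed.
        clearTimeout(timeout);
      }
    }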
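  • Type definitions for the result of the wayback_urls tool.
    interface WaybackUrl {
      url: string;        // original archived URL
      timestamp: string;  // 14-digit capture time, YYYYMMDDhhmmss (UTC)
      statusCode: string; // HTTP status as reported by CDX, kept as a string
      mimeType: string;
    }
    
    interface WaybackUrlsResult {
      domain: string;
      totalUrls: number;  // number of entries in urls
      urls: WaybackUrl[];
    }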
  • Registration of the wayback_urls tool in the protocol layer.
    import { z } from "zod";
    // ToolDef and json() come from this server's own protocol layer (not shown).
    
    const waybackUrlsTool: ToolDef = {
      name: "wayback_urls",
      description: "Search Wayback Machine for archived URLs of a domain. Returns unique URLs with timestamps, status codes, and MIME types. Useful for finding old endpoints, hidden paths, and removed content.",
      schema: {
        domain: z.string().describe("Domain to search archived URLs for"),
        match_type: z.string().optional().describe("CDX match type (exact, prefix, host, domain)"),
        filter: z.string().optional().describe("CDX filter (e.g. 'statuscode:200', 'mimetype:text/html')"),
        limit: z.number().optional().describe("Maximum URLs to return (default: 1000)"),
      },
      // The casts below only satisfy the compiler; they do not validate at runtime.
      execute: async (args) =>
        json(await waybackUrls(
          args.domain as string,
          args.match_type as string | undefined,
          args.filter as string | undefined,
          args.limit as number | undefined,
        )),
    };
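The handler reads each CDX row by fixed position and returns timestamps as raw strings. The sketch below, assuming the same `output=json` response shape (a header row followed by data rows), reads fields by header name instead, converts the 14-digit UTC timestamp to a `Date`, and builds a standard Wayback playback link of the form `https://web.archive.org/web/<timestamp>/<url>`. `parseCdxRows`, `parseCdxTimestamp`, `playbackUrl`, and `ArchivedCapture` are hypothetical names, separate from the tool's `WaybackUrl` type.

```typescript
// Hypothetical record type for one parsed capture.
interface ArchivedCapture {
  url: string;
  capturedAt: Date | null; // null when the timestamp is malformed
  statusCode: string;
  mimeType: string;
  playback: string;
}

// CDX timestamps are 14 digits, YYYYMMDDhhmmss, in UTC.
export function parseCdxTimestamp(ts: string): Date | null {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!m) return null;
  const [, y, mo, d, h, mi, s] = m.map(Number);
  return new Date(Date.UTC(y, mo - 1, d, h, mi, s)); // months are 0-based
}

// Wayback playback link for a single capture.
export function playbackUrl(timestamp: string, original: string): string {
  return `https://web.archive.org/web/${timestamp}/${original}`;
}

// Parse a CDX JSON response, locating columns by header name.
export function parseCdxRows(data: string[][]): ArchivedCapture[] {
  if (data.length === 0) return [];
  const [header, ...rows] = data;
  const [iUrl, iTs, iStatus, iMime] = ["original", "timestamp", "statuscode", "mimetype"].map(
    (name) => header.indexOf(name),
  );
  // Missing columns (index -1) or short rows yield empty strings.
  const get = (row: string[], i: number) => (i >= 0 ? row[i] ?? "" : "");
  return rows.map((row) => {
    const url = get(row, iUrl);
    const ts = get(row, iTs);
    return {
      url,
      capturedAt: parseCdxTimestamp(ts),
      statusCode: get(row, iStatus),
      mimeType: get(row, iMime),
      playback: playbackUrl(ts, url),
    };
  });
}
```

Looking columns up by header name keeps the parser working if the `fl` field list is reordered or extended.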

