wayback_urls
Retrieve archived URLs from Wayback Machine to discover historical endpoints, hidden paths, and removed content for domain analysis.
Instructions
Search Wayback Machine for archived URLs of a domain. Returns unique URLs with timestamps, status codes, and MIME types. Useful for finding old endpoints, hidden paths, and removed content.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| domain | Yes | Domain to search archived URLs for | |
| match_type | No | CDX match type (exact, prefix, host, domain) | |
| filter | No | CDX filter (e.g. 'statuscode:200', 'mimetype:text/html') | |
| limit | No | Maximum URLs to return (default: 1000) |
Implementation Reference
- src/wayback/index.ts:38-78 (handler)The handler function that executes the wayback_urls logic by querying the Wayback Machine CDX API.
export async function waybackUrls( domain: string, matchType?: string, filter?: string, limit = 1000, ): Promise<WaybackUrlsResult> { await limiter.acquire(); const params = new URLSearchParams({ url: `*.${domain}/*`, output: "json", fl: "original,timestamp,statuscode,mimetype", collapse: "urlkey", limit: String(limit), }); if (matchType) params.set("matchType", matchType); if (filter) params.set("filter", filter); const controller = new AbortController(); const timeout = setTimeout(() => controller.abort(), 30000); try { const res = await fetch(`https://web.archive.org/cdx/search/cdx?${params}`, { signal: controller.signal }); if (!res.ok) throw new Error(`Wayback CDX returned ${res.status}`); const data: string[][] = await res.json(); // First row is header: ["original", "timestamp", "statuscode", "mimetype"] const rows = data.slice(1); const urls: WaybackUrl[] = rows.map((row) => ({ url: row[0] ?? "", timestamp: row[1] ?? "", statusCode: row[2] ?? "", mimeType: row[3] ?? "", })); return { domain, totalUrls: urls.length, urls }; } finally { clearTimeout(timeout); } } - src/wayback/index.ts:7-18 (schema)Type definitions for the result of wayback_urls tool.
interface WaybackUrl { url: string; timestamp: string; statusCode: string; mimeType: string; } interface WaybackUrlsResult { domain: string; totalUrls: number; urls: WaybackUrl[]; } - src/protocol/tools.ts:375-391 (registration)Registration of the wayback_urls tool in the protocol layer.
const waybackUrlsTool: ToolDef = { name: "wayback_urls", description: "Search Wayback Machine for archived URLs of a domain. Returns unique URLs with timestamps, status codes, and MIME types. Useful for finding old endpoints, hidden paths, and removed content.", schema: { domain: z.string().describe("Domain to search archived URLs for"), match_type: z.string().optional().describe("CDX match type (exact, prefix, host, domain)"), filter: z.string().optional().describe("CDX filter (e.g. 'statuscode:200', 'mimetype:text/html')"), limit: z.number().optional().describe("Maximum URLs to return (default: 1000)"), }, execute: async (args) => json(await waybackUrls( args.domain as string, args.match_type as string | undefined, args.filter as string | undefined, args.limit as number | undefined, )), };