lookup_snapshots
Query the Wayback Machine for snapshots of a URL, with options to filter by date range, HTTP status code, and collapse duplicates based on timestamp or digest.
Instructions
Return CDX snapshots for a URL, with optional date range and status-code filter.
The Wayback Machine often crawls the same URL many times per day; raw CDX results would return one row per crawl. collapse is a server-side de-duplication: adjacent rows that share the same value in the chosen field get folded into a single representative row.
By default we collapse on the first 8 digits of the timestamp ("timestamp:8"), the YYYYMMDD prefix, which yields one row per day. This is almost always what you want for "show me snapshots of this URL"; without it, the default limit of 50 can be used up by 50 captures from a single hour, and you learn nothing about the URL's history.
Override collapse when you need different granularity:
- "digest": collapse on content hash, so you only see captures where the page actually changed
- "timestamp:10": one row per hour (first 10 digits of timestamp)
- "" (empty string): disable collapsing entirely; return every capture
- any other CDX collapse spec is passed through verbatim
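To make the collapse behaviour concrete, here is a minimal local sketch of the folding described above. It is not the server's implementation: the `collapse_rows` helper and the sample captures are hypothetical, and keeping the first row of each run as the representative is an assumption.

```python
from itertools import groupby


def collapse_rows(rows, spec):
    """Fold adjacent rows that share the same collapse key.

    Hypothetical helper for illustration only; the real de-duplication
    happens server-side. Keeping the first row of each run is an assumption.
    """
    if spec == "":
        return list(rows)  # empty string: collapsing disabled

    # "timestamp:8" -> field "timestamp", width "8"; "digest" -> no width
    field, _, width = spec.partition(":")

    def key(row):
        value = row[field]
        return value[: int(width)] if width else value

    # groupby only merges *adjacent* rows with equal keys, like collapse does
    return [next(group) for _, group in groupby(rows, key=key)]


# Made-up captures, sorted by timestamp as CDX results are
captures = [
    {"timestamp": "20240101093000", "digest": "AAA"},
    {"timestamp": "20240101101500", "digest": "AAA"},
    {"timestamp": "20240101104500", "digest": "BBB"},
    {"timestamp": "20240102080000", "digest": "BBB"},
]

per_day = collapse_rows(captures, "timestamp:8")    # one row per day
per_hour = collapse_rows(captures, "timestamp:10")  # one row per hour
on_change = collapse_rows(captures, "digest")       # one row per content change
everything = collapse_rows(captures, "")            # every capture
```

With this sample, the per-day view keeps two rows, the per-hour view three, the digest view two, and the uncollapsed view all four.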
latest=True uses CDX's fastLatest path to return the N most recent captures cheaply (much faster than a full scan over the index). It cannot be combined with from_date/to_date.
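The sketch below shows one way the tool's inputs could map onto Wayback CDX query parameters. The `build_cdx_params` function is hypothetical, not the tool's actual code. It assumes that latest defaults to False, that N is taken from limit, and that the latest mode is expressed as fastLatest=true with a negative limit.

```python
def build_cdx_params(url, from_date=None, to_date=None, status_code=None,
                     limit=50, collapse="timestamp:8", latest=False):
    """Translate tool inputs into CDX query parameters (illustrative only)."""
    if latest and (from_date or to_date):
        # The tool description rules this combination out
        raise ValueError("latest cannot be combined with from_date/to_date")

    params = {"url": url, "output": "json"}
    if from_date:
        params["from"] = from_date
    if to_date:
        params["to"] = to_date
    if status_code is not None:
        params["filter"] = f"statuscode:{status_code}"
    if collapse:  # empty string disables collapsing, so the parameter is omitted
        params["collapse"] = collapse

    if latest:
        # Assumption: a negative limit plus fastLatest asks for the last N rows
        params["fastLatest"] = "true"
        params["limit"] = str(-limit)
    else:
        params["limit"] = str(limit)
    return params


default_query = build_cdx_params("example.com", status_code=200)
latest_query = build_cdx_params("example.com", limit=5, latest=True)
```

The resulting dictionary can be passed to any HTTP client as the query string; rejecting the invalid combination before the request is sent avoids a wasted round trip.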
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
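| url | Yes | URL whose snapshots to look up. | |
| from_date | No | Start of the date range. Cannot be combined with latest. | |
| to_date | No | End of the date range. Cannot be combined with latest. | |
| status_code | No | Return only captures with this HTTP status code. | |
| limit | No | Maximum number of rows to return. | 50 |
| collapse | No | CDX collapse spec used to fold adjacent duplicate rows; an empty string disables collapsing. | "timestamp:8" |
| latest | No | When true, return the N most recent captures via CDX's fastLatest path. | |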