get_snapshots
Retrieve archived web page snapshots from the Wayback Machine for a URL, optionally narrowed by date range and URL-matching mode, to access historical website content.
Instructions
Get a list of available Wayback Machine snapshots for a URL. Dates use the YYYYMMDD format, and `match_type` is one of `exact`, `prefix`, `host`, or `domain`.
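The date and `match_type` rules above can be checked before the tool is called. The following is a minimal client-side sketch, assuming the caller wants to reject malformed input early; `validate_snapshot_args` is a hypothetical helper, not part of wayback_mcp.

```python
from datetime import datetime
from typing import Optional

# Values accepted by match_type, per the tool description.
MATCH_TYPES = {"exact", "prefix", "host", "domain"}


def validate_snapshot_args(
    from_: Optional[str] = None,
    to: Optional[str] = None,
    match_type: str = "exact",
) -> None:
    """Raise ValueError if the arguments do not follow the documented formats.

    Hypothetical client-side helper, not part of wayback_mcp.
    """
    for name, value in (("from_", from_), ("to", to)):
        if value is None:
            continue
        # Require exactly eight digits...
        if len(value) != 8 or not value.isdigit():
            raise ValueError(f"{name} must be YYYYMMDD, got {value!r}")
        # ...that also form a real calendar date (rejects e.g. 20230231).
        datetime.strptime(value, "%Y%m%d")
    if match_type not in MATCH_TYPES:
        raise ValueError(f"match_type must be one of {sorted(MATCH_TYPES)}")


validate_snapshot_args("20200101", "20201231", "prefix")  # passes silently
```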
Input Schema
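| Name | Required | Description | Default |
|---|---|---|---|
| from_ | No | Start date (YYYYMMDD). Sent to the CDX API as `from`. | |
| limit | No | Maximum number of rows requested from the CDX API. | 100 |
| match_type | No | URL matching mode: `exact`, `prefix`, `host`, or `domain`. | exact |
| to | No | End date (YYYYMMDD). | |
| url | Yes | URL to look up snapshots for. | |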
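An MCP client invokes the tool through the protocol's `tools/call` method, passing arguments that follow the schema above. The sketch below builds such a JSON-RPC 2.0 message, assuming a client that constructs messages directly; the URL and dates are illustrative values. Note that the start-date key is `from_`, matching the schema, not `from`.

```python
import json

# Arguments follow the input schema; optional fields may be omitted.
arguments = {
    "url": "example.com",
    "from_": "20200101",  # trailing underscore, as in the schema
    "to": "20201231",
    "limit": 10,
    "match_type": "exact",
}

# JSON-RPC 2.0 message an MCP client sends to call a tool by name.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_snapshots", "arguments": arguments},
}

print(json.dumps(request, indent=2))
```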
Implementation Reference
- `wayback_mcp/server.py:68-74` (registration): Registers the `get_snapshots` tool with FastMCP using the `@app.tool` decorator, specifying its name and description.

  ```python
  @app.tool(
      name="get_snapshots",
      description=(
          "Get a list of available Wayback Machine snapshots for a URL. "
          "Dates use YYYYMMDD, match_type is one of: exact, prefix, host, domain."
      ),
  )
  ```
- `wayback_mcp/server.py:75-123` (handler): The main handler for the `get_snapshots` tool. It builds the CDX API parameters, fetches and parses the JSON response, converts each snapshot row into a structured dict, and returns a dict containing the requested `url`, the `snapshots` list, and their `count`. Malformed rows are skipped.

  ```python
  async def get_snapshots(
      url: str,
      from_: Optional[str] = None,
      to: Optional[str] = None,
      limit: int = 100,
      match_type: Literal["exact", "prefix", "host", "domain"] = "exact",
  ) -> Dict[str, Any]:
      """
      List snapshots using the CDX API.
      Returns a structured result with a normalized list.
      Parameter `from_` maps to `from` in the CDX API.
      """
      params = _build_cdx_params(url, from_, to, limit, match_type)
      raw = await _fetch_json(CDX_ENDPOINT, params)

      if not isinstance(raw, list) or not raw:
          return {"url": url, "snapshots": [], "count": 0}

      headers = raw[0]
      rows = raw[1:]
      # Expected headers from CDX: urlkey,timestamp,original,mimetype,statuscode,digest,length
      index_by_name = {name: idx for idx, name in enumerate(headers)}

      results: List[Dict[str, Any]] = []
      for row in rows:
          try:
              ts = row[index_by_name.get("timestamp", 1)]
              orig = row[index_by_name.get("original", 2)]
              mime = row[index_by_name.get("mimetype", 3)]
              status = row[index_by_name.get("statuscode", 4)]
              digest = row[index_by_name.get("digest", 5)]
              length = row[index_by_name.get("length", 6)]
              archived_url = f"{WAYBACK_ENDPOINT}/{ts}/{orig}"
              results.append(
                  {
                      "timestamp": ts,
                      "original_url": orig,
                      "mimetype": mime,
                      "statuscode": status,
                      "digest": digest,
                      "length": length,
                      "archived_url": archived_url,
                  }
              )
          except Exception:
              # Skip malformed rows
              continue

      return {"url": url, "snapshots": results, "count": len(results)}
  ```
- `wayback_mcp/server.py:23-43` (helper): Builds the query-parameter dict for the CDX API. Besides the caller's values, it always requests JSON output, keeps only captures with HTTP status 200, and collapses consecutive captures that share the same content digest. `from` and `to` are added only when provided.

  ```python
  def _build_cdx_params(
      url: str,
      from_date: Optional[str],
      to_date: Optional[str],
      limit: int,
      match_type: Literal["exact", "prefix", "host", "domain"],
  ) -> Dict[str, Any]:
      params: Dict[str, Any] = {
          "url": url,
          "output": "json",
          "limit": str(limit),
          "matchType": match_type,
          # Clean results a bit:
          "filter": "statuscode:200",
          "collapse": "digest",
      }
      if from_date:
          params["from"] = from_date
      if to_date:
          params["to"] = to_date
      return params
  ```
- `wayback_mcp/server.py:46-55` (helper): Fetches and decodes JSON from the CDX API endpoint using an `httpx.AsyncClient` with a custom User-Agent, a 20-second timeout, and redirect following. Non-success HTTP status codes raise an exception via `raise_for_status()`.

  ```python
  async def _fetch_json(url: str, params: Dict[str, Any]) -> Any:
      async with httpx.AsyncClient(
          headers={"User-Agent": USER_AGENT},
          timeout=httpx.Timeout(20.0),
          follow_redirects=True,
      ) as client:
          resp = await client.get(url, params=params)
          resp.raise_for_status()
          return resp.json()
  ```
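The row-parsing step in the `get_snapshots` handler can be tried offline. The sketch below re-implements it as a standalone function, `parse_cdx` (a hypothetical name), and runs it on a made-up response shaped like CDX JSON output: a header row followed by one row per capture. `WAYBACK_ENDPOINT` is assumed to be `https://web.archive.org/web`, since the constant's value is not part of this excerpt. Unlike the handler, it catches only `IndexError` and `TypeError` instead of every `Exception`.

```python
from typing import Any, Dict, List

# Assumption: the real WAYBACK_ENDPOINT constant is defined elsewhere in server.py.
WAYBACK_ENDPOINT = "https://web.archive.org/web"

# Fallback column positions, matching the defaults the handler passes to .get().
DEFAULT_POSITIONS = {
    "timestamp": 1,
    "original": 2,
    "mimetype": 3,
    "statuscode": 4,
    "digest": 5,
    "length": 6,
}


def parse_cdx(raw: Any) -> List[Dict[str, Any]]:
    """Convert a CDX JSON response (header row + data rows) into snapshot dicts."""
    if not isinstance(raw, list) or not raw:
        return []
    index_by_name = {name: idx for idx, name in enumerate(raw[0])}
    results: List[Dict[str, Any]] = []
    for row in raw[1:]:
        try:
            cols = {
                name: row[index_by_name.get(name, pos)]
                for name, pos in DEFAULT_POSITIONS.items()
            }
        except (IndexError, TypeError):
            continue  # skip rows that are too short or not lists
        results.append(
            {
                "timestamp": cols["timestamp"],
                "original_url": cols["original"],
                "mimetype": cols["mimetype"],
                "statuscode": cols["statuscode"],
                "digest": cols["digest"],
                "length": cols["length"],
                "archived_url": f"{WAYBACK_ENDPOINT}/{cols['timestamp']}/{cols['original']}",
            }
        )
    return results


# Made-up sample shaped like CDX JSON output; the digest is a placeholder.
raw = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20200115083000", "https://example.com/", "text/html", "200", "PLACEHOLDERDIGEST", "1256"],
    ["com,example)/", "20200601120000", "https://example.com/"],  # truncated row
]

snapshots = parse_cdx(raw)
print(len(snapshots))  # → 1 (the truncated row is skipped)
print(snapshots[0]["archived_url"])
# → https://web.archive.org/web/20200115083000/https://example.com/
```

Looking columns up by header name, with positional fallbacks, keeps the parser working even if the CDX server reorders its fields.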
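The two helpers can also be approximated with the standard library alone, which is handy for inspecting the exact query the tool sends. This is a synchronous sketch, not the server's code: `build_cdx_request` and `fetch_json` are hypothetical names, `CDX_ENDPOINT` is assumed to be the public CDX server address `https://web.archive.org/cdx/search/cdx`, and `USER_AGENT` is a placeholder.

```python
import json
from typing import Any, Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumptions: public CDX server address and a placeholder User-Agent string.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
USER_AGENT = "wayback-mcp-sketch/0.1"


def build_cdx_request(
    url: str,
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
    limit: int = 100,
    match_type: str = "exact",
) -> Request:
    """Build a GET request with the same query parameters as _build_cdx_params."""
    params: Dict[str, Any] = {
        "url": url,
        "output": "json",
        "limit": str(limit),
        "matchType": match_type,
        "filter": "statuscode:200",
        "collapse": "digest",
    }
    if from_date:
        params["from"] = from_date
    if to_date:
        params["to"] = to_date
    return Request(f"{CDX_ENDPOINT}?{urlencode(params)}", headers={"User-Agent": USER_AGENT})


def fetch_json(req: Request, timeout: float = 20.0) -> Any:
    """Send the request and decode the JSON body (performs network I/O)."""
    # urlopen follows redirects and raises urllib.error.HTTPError on 4xx/5xx.
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


req = build_cdx_request("example.com", from_date="20200101", limit=5)
print(req.full_url)
```

Calling `fetch_json(req)` performs the actual request. Like `_fetch_json`, it follows redirects and raises on an error status, here as `urllib.error.HTTPError` rather than `httpx.HTTPStatusError`.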