
Wayback Machine MCP Server

by sisilet

get_snapshots

Retrieve archived web page snapshots from the Wayback Machine by specifying URL, date range, and matching criteria to access historical website content.

Instructions

Get a list of available Wayback Machine snapshots for a URL. Dates use YYYYMMDD, match_type is one of: exact, prefix, host, domain.
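
As a rough illustration of the two input conventions named above, the sketch below formats a Python date as YYYYMMDD and checks a match_type value against the four allowed options. The helper names to_wayback_date and check_match_type are hypothetical and are not part of the server.

```python
from datetime import date

# The four match types listed in the tool description.
MATCH_TYPES = ("exact", "prefix", "host", "domain")


def to_wayback_date(d: date) -> str:
    """Format a date as YYYYMMDD (hypothetical helper, not part of the server)."""
    return d.strftime("%Y%m%d")


def check_match_type(value: str) -> str:
    """Return the value unchanged if it is allowed, otherwise raise ValueError."""
    if value not in MATCH_TYPES:
        raise ValueError(f"match_type must be one of {MATCH_TYPES}, got {value!r}")
    return value


start = to_wayback_date(date(2020, 1, 5))
mode = check_match_type("prefix")
```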

Input Schema

Name        Required  Description  Default
from_       No
limit       No
match_type  No                     exact
to          No
url         Yes
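
To make the schema concrete, here is a minimal sketch of an arguments object for this tool. It assumes a client that sends tool arguments as a JSON object keyed by the parameter names in the table; build_arguments is a hypothetical helper, and how the arguments are actually transmitted depends on the MCP client in use.

```python
import json
from typing import Any, Dict, Optional


def build_arguments(
    url: str,
    from_: Optional[str] = None,
    to: Optional[str] = None,
    limit: Optional[int] = None,
    match_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect tool arguments, leaving out optional fields that were not set."""
    candidates = {
        "url": url,  # the only required field
        "from_": from_,
        "to": to,
        "limit": limit,
        "match_type": match_type,
    }
    return {key: value for key, value in candidates.items() if value is not None}


args = build_arguments("example.com", from_="20200101", to="20201231", limit=10)
print(json.dumps(args, indent=2))
```

Omitted fields fall back to the server-side defaults, so a call with only url is also valid.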

Implementation Reference

  • Registers the get_snapshots tool with FastMCP using the @app.tool decorator, specifying its name and description.
    @app.tool(
        name="get_snapshots",
        description=(
            "Get a list of available Wayback Machine snapshots for a URL. "
            "Dates use YYYYMMDD, match_type is one of: exact, prefix, host, domain."
        ),
    )
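  • The main handler function for the get_snapshots tool. It builds the CDX API parameters, fetches and parses the JSON response, converts each snapshot row into a structured dict, and returns a dict containing the queried URL, the list of snapshots, and their count. Malformed rows are skipped, and an empty or non-list response yields an empty result.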
    async def get_snapshots(
        url: str,
        from_: Optional[str] = None,
        to: Optional[str] = None,
        limit: int = 100,
        match_type: Literal["exact", "prefix", "host", "domain"] = "exact",
    ) -> Dict[str, Any]:
        """
        List snapshots using the CDX API. Returns a structured result with a normalized list.

        Parameter `from_` maps to `from` in the CDX API.
        """
        params = _build_cdx_params(url, from_, to, limit, match_type)
        raw = await _fetch_json(CDX_ENDPOINT, params)

        if not isinstance(raw, list) or not raw:
            return {"url": url, "snapshots": [], "count": 0}

        headers = raw[0]
        rows = raw[1:]

        # Expected headers from CDX: urlkey,timestamp,original,mimetype,statuscode,digest,length
        index_by_name = {name: idx for idx, name in enumerate(headers)}
        results: List[Dict[str, Any]] = []
        for row in rows:
            try:
                ts = row[index_by_name.get("timestamp", 1)]
                orig = row[index_by_name.get("original", 2)]
                mime = row[index_by_name.get("mimetype", 3)]
                status = row[index_by_name.get("statuscode", 4)]
                digest = row[index_by_name.get("digest", 5)]
                length = row[index_by_name.get("length", 6)]
                archived_url = f"{WAYBACK_ENDPOINT}/{ts}/{orig}"
                results.append(
                    {
                        "timestamp": ts,
                        "original_url": orig,
                        "mimetype": mime,
                        "statuscode": status,
                        "digest": digest,
                        "length": length,
                        "archived_url": archived_url,
                    }
                )
            except Exception:
                # Skip malformed rows
                continue

        return {"url": url, "snapshots": results, "count": len(results)}
  • Helper function used by get_snapshots to construct the parameters dictionary for the CDX API query.
    def _build_cdx_params(
        url: str,
        from_date: Optional[str],
        to_date: Optional[str],
        limit: int,
        match_type: Literal["exact", "prefix", "host", "domain"],
    ) -> Dict[str, Any]:
        params: Dict[str, Any] = {
            "url": url,
            "output": "json",
            "limit": str(limit),
            "matchType": match_type,
            # Clean results a bit:
            "filter": "statuscode:200",
            "collapse": "digest",
        }
        if from_date:
            params["from"] = from_date
        if to_date:
            params["to"] = to_date
        return params
  • Helper function used by get_snapshots to fetch JSON data from the CDX API endpoint.
    async def _fetch_json(url: str, params: Dict[str, Any]) -> Any:
        async with httpx.AsyncClient(
            headers={"User-Agent": USER_AGENT},
            timeout=httpx.Timeout(20.0),
            follow_redirects=True,
        ) as client:
            resp = await client.get(url, params=params)
            resp.raise_for_status()
            return resp.json()
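
The header-indexed row parsing described above can be tried without any network access. The sketch below is a simplified stand-in for the handler, not the server's own code: parse_cdx_rows is a hypothetical name, the value of WAYBACK_ENDPOINT is an assumption (the listing does not show how the constant is defined), and the sample rows are made up to match the expected CDX header layout.

```python
from typing import Any, Dict, List

# Assumed value; the listing above does not show how WAYBACK_ENDPOINT is defined.
WAYBACK_ENDPOINT = "https://web.archive.org/web"

# Made-up sample in the CDX JSON shape: the first row is the header row,
# every following row is one capture.
SAMPLE_CDX_RESPONSE: List[List[str]] = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20200101000000", "http://example.com/", "text/html", "200", "AAAA", "1234"],
    ["com,example)/", "20210101000000", "http://example.com/", "text/html", "200", "BBBB", "1300"],
    ["com,example)/"],  # malformed row: too short, so it is skipped
]


def parse_cdx_rows(raw: Any) -> Dict[str, Any]:
    """Turn a CDX JSON response into a list of snapshot dicts (simplified)."""
    if not isinstance(raw, list) or not raw:
        return {"snapshots": [], "count": 0}

    headers, rows = raw[0], raw[1:]
    # Look columns up by header name instead of relying on fixed positions.
    index_by_name = {name: idx for idx, name in enumerate(headers)}

    snapshots: List[Dict[str, str]] = []
    for row in rows:
        try:
            ts = row[index_by_name["timestamp"]]
            orig = row[index_by_name["original"]]
        except (KeyError, IndexError):
            continue  # skip rows that lack the expected columns
        snapshots.append(
            {
                "timestamp": ts,
                "original_url": orig,
                "archived_url": f"{WAYBACK_ENDPOINT}/{ts}/{orig}",
            }
        )
    return {"snapshots": snapshots, "count": len(snapshots)}


result = parse_cdx_rows(SAMPLE_CDX_RESPONSE)
for snap in result["snapshots"]:
    print(snap["timestamp"], snap["archived_url"])
```

Unlike the handler, this version catches only KeyError and IndexError and keeps just three fields, which keeps the example short while showing the same skip-on-malformed-row behaviour.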

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sisilet/wayback-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.