Wayback Machine MCP Server

by sisilet

get_snapshots

Retrieve archived web page snapshots from the Wayback Machine by specifying URL, date range, and matching criteria to access historical website content.

Instructions

Get a list of available Wayback Machine snapshots for a URL. Dates use YYYYMMDD, match_type is one of: exact, prefix, host, domain.
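Because the dates are plain YYYYMMDD strings, a caller will often convert from a date object and may want to reject malformed input before sending it. The following is a minimal sketch; `to_wayback_date` and `is_wayback_date` are hypothetical helper names for illustration and are not part of the server.

```python
from datetime import date, datetime


def to_wayback_date(d: date) -> str:
    """Format a date as the YYYYMMDD string the tool description asks for."""
    return d.strftime("%Y%m%d")


def is_wayback_date(value: str) -> bool:
    """Return True if value is exactly eight ASCII digits forming a real calendar date."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        # strptime rejects impossible dates such as month 13 or day 40.
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True
```

Checking the length and digits first keeps inputs such as `2020-01-05` from slipping through before the calendar check runs.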

Input Schema

Name        Required  Description  Default
from_       No
limit       No
match_type  No                     exact
to          No
url         Yes
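The schema above can be turned into a small argument builder that requires `url`, checks `match_type` against the four allowed values, and leaves out optional fields that were not set. This is only a sketch; `build_arguments` and `MATCH_TYPES` are hypothetical names introduced here, not something the server exposes.

```python
from typing import Any, Dict, Optional

# Allowed values, taken from the tool description.
MATCH_TYPES = ("exact", "prefix", "host", "domain")


def build_arguments(
    url: str,
    from_: Optional[str] = None,
    to: Optional[str] = None,
    limit: Optional[int] = None,
    match_type: str = "exact",
) -> Dict[str, Any]:
    """Assemble a get_snapshots argument payload, omitting unset optional fields."""
    if not url:
        raise ValueError("url is required")
    if match_type not in MATCH_TYPES:
        raise ValueError(f"match_type must be one of {MATCH_TYPES}")

    args: Dict[str, Any] = {"url": url, "match_type": match_type}
    if from_ is not None:
        args["from_"] = from_
    if to is not None:
        args["to"] = to
    if limit is not None:
        args["limit"] = limit
    return args
```

For example, `build_arguments("example.com", from_="20200101", limit=10)` yields a payload with `url`, `match_type`, `from_`, and `limit`, and no `to` key.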

Implementation Reference

  • Registers the get_snapshots tool with FastMCP using the @app.tool decorator, specifying its name and description.
    @app.tool(
    	name="get_snapshots",
    	description=(
    		"Get a list of available Wayback Machine snapshots for a URL. "
    		"Dates use YYYYMMDD, match_type is one of: exact, prefix, host, domain."
    	),
    )
  • The main handler function for the get_snapshots tool. It builds the CDX API parameters, fetches and parses the JSON response, converts each snapshot row into a structured dict, and returns a result containing the queried URL, the list of snapshots, and their count.
    async def get_snapshots(
    	url: str,
    	from_: Optional[str] = None,
    	to: Optional[str] = None,
    	limit: int = 100,
    	match_type: Literal["exact", "prefix", "host", "domain"] = "exact",
    ) -> Dict[str, Any]:
    	"""
    	List snapshots using the CDX API. Returns a structured result with a normalized list.
    	Parameter `from_` maps to `from` in the CDX API.
    	"""
    	params = _build_cdx_params(url, from_, to, limit, match_type)
    	raw = await _fetch_json(CDX_ENDPOINT, params)
    
    	if not isinstance(raw, list) or not raw:
    		return {"url": url, "snapshots": [], "count": 0}
    
    	headers = raw[0]
    	rows = raw[1:]
    
    	# Expected headers from CDX: urlkey,timestamp,original,mimetype,statuscode,digest,length
    	index_by_name = {name: idx for idx, name in enumerate(headers)}
    
    	results: List[Dict[str, Any]] = []
    	for row in rows:
    		try:
    			ts = row[index_by_name.get("timestamp", 1)]
    			orig = row[index_by_name.get("original", 2)]
    			mime = row[index_by_name.get("mimetype", 3)]
    			status = row[index_by_name.get("statuscode", 4)]
    			digest = row[index_by_name.get("digest", 5)]
    			length = row[index_by_name.get("length", 6)]
    			archived_url = f"{WAYBACK_ENDPOINT}/{ts}/{orig}"
    			results.append(
    				{
    					"timestamp": ts,
    					"original_url": orig,
    					"mimetype": mime,
    					"statuscode": status,
    					"digest": digest,
    					"length": length,
    					"archived_url": archived_url,
    				}
    			)
    		except Exception:
    			# Skip malformed rows
    			continue
    
    	return {"url": url, "snapshots": results, "count": len(results)}
  • Helper function used by get_snapshots to construct the parameters dictionary for the CDX API query.
    def _build_cdx_params(
    	url: str,
    	from_date: Optional[str],
    	to_date: Optional[str],
    	limit: int,
    	match_type: Literal["exact", "prefix", "host", "domain"],
    ) -> Dict[str, Any]:
    	params: Dict[str, Any] = {
    		"url": url,
    		"output": "json",
    		"limit": str(limit),
    		"matchType": match_type,
    		# Clean results a bit:
    		"filter": "statuscode:200",
    		"collapse": "digest",
    	}
    	if from_date:
    		params["from"] = from_date
    	if to_date:
    		params["to"] = to_date
    	return params
  • Helper function used by get_snapshots to fetch JSON data from the CDX API endpoint.
    async def _fetch_json(url: str, params: Dict[str, Any]) -> Any:
    	async with httpx.AsyncClient(
    		headers={"User-Agent": USER_AGENT},
    		timeout=httpx.Timeout(20.0),
    		follow_redirects=True,
    	) as client:
    		resp = await client.get(url, params=params)
    		resp.raise_for_status()
    		return resp.json()
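To see what the row processing produces without touching the network, the sketch below runs a hand-written sample through a simplified normalizer. Several things here are assumptions: the sample rows are made-up placeholders in the CDX JSON shape (a header row followed by data rows), the value of `WAYBACK_ENDPOINT` is a guess because the excerpts above do not show its definition, and `normalize` is a hypothetical function that pairs headers with values via `dict(zip(...))` rather than the index lookup the handler uses, so its keys keep the CDX names (`original` instead of `original_url`).

```python
from typing import Any, Dict, List

# Assumed value; the excerpts do not show how WAYBACK_ENDPOINT is defined.
WAYBACK_ENDPOINT = "https://web.archive.org/web"

# Hand-written placeholder data in the CDX JSON shape, not a real API response.
SAMPLE_RESPONSE: List[List[str]] = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20200105120000", "http://example.com/", "text/html", "200", "PLACEHOLDERDIGEST", "1234"],
]


def normalize(raw: List[List[str]]) -> List[Dict[str, Any]]:
    """Pair each data row with the header names and add a replay URL."""
    if not raw:
        return []
    headers, rows = raw[0], raw[1:]
    snapshots: List[Dict[str, Any]] = []
    for row in rows:
        if len(row) != len(headers):
            # Skip malformed rows, mirroring the handler's intent.
            continue
        record: Dict[str, Any] = dict(zip(headers, row))
        record["archived_url"] = (
            f"{WAYBACK_ENDPOINT}/{record['timestamp']}/{record['original']}"
        )
        snapshots.append(record)
    return snapshots
```

With the sample above, `normalize(SAMPLE_RESPONSE)` returns a single record whose `archived_url` joins the assumed endpoint, the timestamp, and the original URL.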

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/sisilet/wayback-mcp'
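
The same request can be made from Python with only the standard library. This is a rough sketch: `server_url` and `fetch_server_info` are hypothetical helper names, and the code assumes the endpoint answers with a JSON body, which the page does not spell out.

```python
import json
from typing import Any
from urllib.request import Request, urlopen

# Base path taken from the curl example above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server_info(owner: str, name: str, timeout: float = 20.0) -> Any:
    """GET the server record; assumes the response body is JSON."""
    req = Request(server_url(owner, name), headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_server_info("sisilet", "wayback-mcp")` would issue the equivalent of the curl command; the shape of the returned data is not documented here, so inspect it before relying on specific keys.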

If you have feedback or need assistance with the MCP directory API, please join our Discord server.