What can you do with this server?

This server gives LLM clients structured access to the Internet Archive's Wayback Machine and archive.org APIs, enabling historical web research, content retrieval, and archive exploration.

* Check URL availability (check_availability): Determine whether a URL has been archived and retrieve the closest snapshot to a given timestamp.
* List snapshots (lookup_snapshots): Browse the full CDX snapshot history for a URL, with optional date-range filtering, HTTP status filtering, de-duplication (collapsing), and a fast "latest N captures" mode.
* Search Archive collections (search_archive): Search uploaded media items (books, audio, video, software, films) using Lucene query syntax, with filters for mediatype and publication year.
* Discover archived URLs (search_domain): Find all archived URLs under a domain or path prefix, which is useful for mapping a site's crawl history.
* Extract page text (get_snapshot_content): Fetch an archived web page and extract its readable text (stripping toolbars and boilerplate), returning the text, word count, snapshot URL, and timestamp.
* Retrieve item metadata (get_item_metadata): Get rich structured metadata for any Internet Archive item by its identifier (title, description, creator, file list, download counts, etc.).
* Guided prompt workflows: Built-in prompts for researching a topic, tracking site changes over time, auditing link rot, and configuring API authentication.
* Resource access: Access item metadata directly via the wayback://item/{identifier} URI template.

Technical features include built-in rate limiting, caching, structured error handling, and optional Internet Archive S3 authentication for higher rate limits.

Which integrations are available for this server?

Provides structured access to the Internet Archive's Wayback Machine, allowing users to check URL availability, list snapshots, search archived items, retrieve page content, and fetch item metadata.

mcp-server-wayback

by lakshyamehta03


Python

Remote

wayback-mcp

A Model Context Protocol server giving Claude and other LLM clients structured access to the Internet Archive's Wayback Machine.

PyPI Python 3.11+ MCP License: MIT

Overview

wayback-mcp is an async Python MCP server that exposes the Internet Archive's five core APIs (Availability, CDX, Advanced Search, Metadata, and Wayback content) as first-class tools, prompts, and resources for any MCP-compatible client. It handles rate limiting, retry/back-off, and response-shape normalisation so the model sees only structured Pydantic data.


Features

Six MCP tools covering availability checks, snapshot lookups, item search, domain-wide URL discovery, page-text extraction, and item metadata
Four guided prompts — research_topic, track_site_changes, audit_link_rot, setup_authentication
One MCP resource — wayback://item/{identifier} exposes IA item metadata as JSON
Async token-bucket rate limiter with per-endpoint buckets and Retry-After honoring
In-memory response cache with per-endpoint TTLs to keep token usage and IA load low
Internet Archive S3 authentication (optional) for higher rate-limit ceilings
Structured error model — expected failures return ToolError; unexpected ones raise
Tested against live IA APIs via an opt-in --integration pytest flag
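The README doesn't show the limiter's internals. Here is a minimal sketch of an async token bucket, assuming one bucket per endpoint group; the class name, the group names, and the rates and capacities are hypothetical, and Retry-After handling would sit on top of this.

```python
import asyncio
import time


class TokenBucket:
    """Async token bucket: up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so short bursts go through immediately
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last refill, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    async def acquire(self) -> None:
        # Holding the lock while sleeping makes waiters queue up instead of racing.
        async with self._lock:
            while True:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # Sleep roughly until one whole token has accumulated, then re-check.
                await asyncio.sleep((1 - self.tokens) / self.rate)


# Hypothetical endpoint groups and numbers, for illustration only.
BUCKETS = {
    "cdx": TokenBucket(rate=1.0, capacity=5),
    "search": TokenBucket(rate=2.0, capacity=10),
}
```

A tool would `await BUCKETS["cdx"].acquire()` before each CDX request. Keeping buckets per endpoint group means a burst of searches can't starve CDX lookups.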

Installation

As an MCP server

Interactive installer (recommended)

uvx mcp-server-wayback --install

You'll get a numbered menu of supported clients. Pick one and the installer writes the config for you; then restart that client. Run uvx mcp-server-wayback --list-clients to print the menu without launching the installer.

Non-interactive installers

Pass the client key explicitly (handy for scripts and dotfiles):

uvx mcp-server-wayback --install claude-desktop
uvx mcp-server-wayback --install claude-code-user        # ~/.claude.json
uvx mcp-server-wayback --install claude-code-project     # ./.mcp.json in cwd
uvx mcp-server-wayback --install cursor                  # ./.cursor/mcp.json
uvx mcp-server-wayback --install windsurf
uvx mcp-server-wayback --install zed                     # uses Zed's context_servers key
uvx mcp-server-wayback --install antigravity             # ~/.gemini/antigravity/mcp_config.json

For clients with their own MCP CLI:

claude mcp add wayback -- uvx mcp-server-wayback
codex mcp add wayback -- uvx mcp-server-wayback

To include Internet Archive API keys for higher rate limits at install time:

claude mcp add wayback \
  --env WAYBACK_MCP_IA_ACCESS_KEY=xxx \
  --env WAYBACK_MCP_IA_SECRET_KEY=xxx \
  -- uvx mcp-server-wayback

Need uvx? brew install uv on macOS, or pipx install uv. Python 3.11+ required.

Manual configuration

For clients that use a JSON config file, add this entry under the client's config key (see the table below):

{
  "wayback": {
    "command": "uvx",
    "args": ["mcp-server-wayback"],
    "env": {
      "WAYBACK_MCP_IA_ACCESS_KEY": "your-access-key",
      "WAYBACK_MCP_IA_SECRET_KEY": "your-secret-key"
    }
  }
}

The env block is optional — the server works anonymously without credentials. See Authentication for details.

Client	Config file	Config key
Claude Desktop	`~/Library/Application Support/Claude/claude_desktop_config.json` (macOS)	`mcpServers`
Claude Code	`.mcp.json` (project) / `~/.claude.json` (user)	`mcpServers`
Google Antigravity	`~/.gemini/antigravity/mcp_config.json`	`mcpServers`
Codex CLI	`~/.codex/config.toml`	`[mcp_servers.wayback]`
Cursor	`.cursor/mcp.json`	`mcpServers`
Windsurf	`~/.codeium/windsurf/mcp_config.json`	`mcpServers`
Cline	`.cline/mcp.json`	`mcpServers`
Zed	`~/.config/zed/settings.json`	`context_servers`
Gemini CLI	`~/.gemini/settings.json`	`mcpServers`

Project-scoped (workspace) config

Claude Code supports a per-workspace .mcp.json in the repo root. Useful for testing env-var changes without touching your global config:

claude mcp add wayback --scope project -- uvx mcp-server-wayback

Open Claude Code from that folder — it picks up .mcp.json automatically. Add it to .gitignore if it contains real keys.

Uninstalling

uvx mcp-server-wayback --uninstall                  # interactive picker
uvx mcp-server-wayback --uninstall claude-desktop   # or pass a client key
claude mcp remove wayback                           # Claude Code native CLI
codex mcp remove wayback                            # Codex CLI native CLI

Quick examples

What to ask the agent once the server is wired up:

Has openai.com been archived? Show me the closest snapshot.

Find archived snapshots of nytimes.com from 2001.

What did anthropic.com look like in early 2023?

Search the Internet Archive for documentaries about the moon landing.

Walk me through how anthropic.com's homepage has changed over the past year.

I have a list of URLs from a 2015 reading list — check which are still recoverable from the Wayback Machine.

Or use a slash command for a guided workflow: /wayback:research_topic, /wayback:track_site_changes, /wayback:audit_link_rot, /wayback:setup_authentication.

Tools

`check_availability`

Check whether a URL has been archived and return the closest snapshot.

Parameter	Required	Description
`url`	Yes	The URL to check
`timestamp`	No	Target timestamp (`YYYYMMDDhhmmss`). Returns the snapshot closest to this point in time. Omit for the most recent.

`lookup_snapshots`

List all CDX snapshots for a URL with optional date-range and HTTP-status filters.

Parameter	Required	Description
`url`	Yes	The URL to look up
`from_date`	No	Start of range (`YYYYMMDD`)
`to_date`	No	End of range (`YYYYMMDD`)
`status_code`	No	Filter by HTTP status, e.g. `"200"` to drop redirects and errors
`limit`	No	Maximum results (defaults to `CDX_MAX_RESULTS` = 50)

`search_archive`

Search Internet Archive collections using Lucene query syntax. Returns matching items with identifier, title, mediatype, year, creator, subject, and download count.

Parameter	Required	Description
`query`	Yes	Lucene query, e.g. `"apollo 11"` or `creator:"NASA"`
`mediatype`	No	Filter by type: `"texts"`, `"audio"`, `"movies"`, `"image"`, `"software"`, `"web"`
`year_from`	No	Earliest publication year
`year_to`	No	Latest publication year
`limit`	No	Maximum results (defaults to `SEARCH_MAX_RESULTS` = 50)

`search_domain`

Discover archived URLs under a domain or path prefix. Auto-detects whether to do a wildcard-domain or prefix match from the input shape.

Parameter	Required	Description
`domain`	Yes	Bare domain (`example.com`) for subdomain wildcard, or `example.com/blog` for path prefix
`from_date`	No	Start of range (`YYYYMMDD`)
`to_date`	No	End of range (`YYYYMMDD`)
`status_code`	No	Filter by HTTP status
`limit`	No	Maximum results
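The CDX server supports `matchType=domain` (host plus subdomains) and `matchType=prefix` (URL prefix), so "auto-detect from the input shape" can be as simple as checking for a path. This heuristic is an assumption, not necessarily the rule the server applies, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def cdx_match(target: str) -> tuple[str, str]:
    """Map search_domain input onto a CDX (url, matchType) pair."""
    # urlsplit only finds the host when a scheme or a leading "//" is present.
    parts = urlsplit(target if "://" in target else f"//{target}")
    host = parts.netloc.lower()
    path = parts.path.rstrip("/")
    if path:
        # "example.com/blog": every URL starting with this prefix.
        # A prefix match also catches siblings such as /blog-archive.
        return f"{host}{path}", "prefix"
    # Bare "example.com": the host plus all of its subdomains.
    return host, "domain"
```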

`get_snapshot_content`

Fetch an archived web page and extract its readable text. Strips the Wayback toolbar, navigation, and boilerplate so the model sees mostly the page's main content.

Parameter	Required	Description
`url`	Yes	The URL to fetch the archived content of
`timestamp`	No	Target snapshot timestamp (`YYYYMMDDhhmmss`). Omit for the latest.

Returns {text, word_count, snapshot_url, timestamp, sparse_content_warning}.
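A stdlib-only sketch of the idea follows; the server may well use a dedicated readability library instead. It relies on Wayback's `id_` URL flag, which serves the original capture without the toolbar or rewritten links. It then drops text inside script, style, and common layout tags. The skip-list, the 50-word `sparse_content_warning` threshold, and all names are hypothetical.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}
SPARSE_WORD_THRESHOLD = 50  # hypothetical cut-off for sparse_content_warning


def raw_snapshot_url(url: str, timestamp: str) -> str:
    # "id_" asks Wayback for the original bytes, without the injected toolbar.
    return f"https://web.archive.org/web/{timestamp}id_/{url}"


class _TextExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a tag whose text we discard
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> dict:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.chunks)
    words = len(text.split())
    return {
        "text": text,
        "word_count": words,
        "sparse_content_warning": words < SPARSE_WORD_THRESHOLD,
    }
```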

`get_item_metadata`

Return rich structured metadata for any Internet Archive item by its identifier.

Parameter	Required	Description
`identifier`	Yes	The IA item identifier, e.g. `"nasa_Apollo_11"`

Returns title, description, creator, subject, mediatype, year, downloads, full file list, and more.

Prompts

Prompt	What it does
`research_topic`	Multi-mediatype IA search → synthesised topic overview
`track_site_changes`	Sample snapshots over time → narrate how a page evolved
`audit_link_rot`	Bulk-check URLs and surface archived alternatives
`setup_authentication`	Walks the user through configuring IA S3 keys

Resources

URI template	Returns
`wayback://item/{identifier}`	Full Internet Archive item metadata as JSON

Authentication

The server works anonymously by default. Configure Internet Archive S3 keys to raise your rate-limit ceiling and reduce 429 (Too Many Requests) errors during heavy use:

Visit https://archive.org/account/s3.php (free archive.org account required)
Copy your access key and secret key
Add them to the env block of your MCP config (see Manual configuration) — or run the setup_authentication prompt for an interactive walkthrough

Keys are stored only in your local MCP config and the server subprocess's environment; apart from authenticating requests to the Internet Archive, they never leave your machine.
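The README doesn't say how the keys are attached to requests. The Internet Archive's S3-style API conventionally takes them as an `Authorization: LOW <access>:<secret>` header. The sketch below assumes the server does the same, reading the two environment variables from the config above; the function name is hypothetical.

```python
import os
from collections.abc import Mapping


def ia_auth_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build IA auth headers from the environment, or {} for anonymous mode."""
    access = env.get("WAYBACK_MCP_IA_ACCESS_KEY", "").strip()
    secret = env.get("WAYBACK_MCP_IA_SECRET_KEY", "").strip()
    if not (access and secret):
        return {}  # missing either key: stay anonymous with default rate limits
    return {"Authorization": f"LOW {access}:{secret}"}
```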

Technical details

Transport: stdio (MCP client integration)
Caching: in-memory with per-endpoint TTLs
- Metadata and snapshot content: 24 hours (snapshots are immutable once captured)
- CDX results: 1 hour (grows but never mutates)
- Search results: 15 minutes (relevance can shift)
Rate limiting: async token-bucket per endpoint group with automatic Retry-After handling for 429 responses
Validation: Pydantic 2 schemas for every input and output
Python 3.11+
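A minimal in-memory cache with per-group TTLs taken from the list above might look like the following. The group names are hypothetical. Entries are evicted lazily on read, there is no size bound, and a cached `None` is indistinguishable from a miss; the real implementation may differ on all three.

```python
import time
from collections.abc import Callable
from typing import Any

# TTLs from the list above, in seconds; the group names are hypothetical.
TTL_SECONDS = {
    "metadata": 24 * 3600,
    "content": 24 * 3600,
    "cdx": 3600,
    "search": 15 * 60,
}


class TTLCache:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._store: dict[tuple[str, str], tuple[float, Any]] = {}

    def get(self, group: str, key: str) -> Any:
        entry = self._store.get((group, key))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[(group, key)]  # expired: evict lazily on read
            return None
        return value

    def set(self, group: str, key: str, value: Any) -> None:
        self._store[(group, key)] = (self._clock() + TTL_SECONDS[group], value)
```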

Development

git clone https://github.com/lakshyamehta03/wayback-machine-mcp.git
cd wayback-machine-mcp
uv sync
uv run mcp-server-wayback      # run the server
uv run pytest                  # unit tests (httpx mocked via respx)
uv run pytest --integration    # also hit live Internet Archive APIs

CI runs the unit suite on every push and pull request via GitHub Actions.

Known issues

A couple of things to know:

Unparseable content types: some snapshots contain MIME types the server can't extract text from (binaries, certain media). When that happens, the server returns a structured error pointing you to the snapshot URL for manual review rather than failing silently.
Flaky upstream endpoints: the Internet Archive's APIs (especially CDX) occasionally time out, return 503s, or degrade under load. The server retries with back-off and trips a circuit breaker, but a request may still fail and need a retry. Configuring Internet Archive API keys meaningfully improves success rates.

License

MIT. The Wayback Machine logo is © Internet Archive and used here under fair use to identify the upstream service this project integrates with.

Acknowledgments

The Internet Archive for the Wayback Machine and the open APIs that make this server possible
Anthropic for the Model Context Protocol specification and SDK
