mcp-common-crawl
Built by Artur Ferreira @ The GEO Lab · @TheGEO_Lab · LinkedIn · Reddit
MCP server for Common Crawl CDX: backlink discovery, expired domain finder, competitor gap analysis. Free alternative to Ahrefs/Semrush backlink APIs ($100+/month).
Tools
| Tool | Description |
|------|-------------|
| | Find backlinks to any domain across 3 CC indexes |
| | Search for expired/parked domains in a niche via CC CDX |
| | Deep single-domain check: live/expired/parked status plus CC page count |
| | Find domains linking to competitors but not to you |
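The gap check described in the last row amounts to a set difference over referring domains. The sketch below is illustrative only: `normaliseDomain` and `linkGap` are hypothetical names, not the server's actual tool API, and the referrer lists are assumed to have been collected already.

```typescript
// Hypothetical helper: normalise a hostname so "www.Example.com" and
// "example.com" compare as the same referring domain.
function normaliseDomain(host: string): string {
  return host.trim().toLowerCase().replace(/^www\./, "");
}

// Returns domains that link to at least one competitor but not to your site.
// Inputs are plain lists of referring domains (assumed gathered elsewhere).
function linkGap(ownReferrers: string[], competitorReferrers: string[][]): string[] {
  const own = new Set(ownReferrers.map(normaliseDomain));
  const gap = new Set<string>();
  for (const list of competitorReferrers) {
    for (const host of list) {
      const domain = normaliseDomain(host);
      if (!own.has(domain)) gap.add(domain);
    }
  }
  // Sorted so the result is stable across runs.
  return Array.from(gap).sort();
}
```

For example, calling `linkGap` with your own referrers and one list per competitor returns the de-duplicated domains that appear only on the competitor side.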
Features
Production-tested: patterns used in production at TheGEOLab
Install
# Claude Code
claude mcp add common-crawl -- npx mcp-common-crawl
# Or in .mcp.json
{
  "mcpServers": {
    "common-crawl": {
      "command": "npx",
      "args": ["mcp-common-crawl"]
    }
  }
}
No API Keys Required
Common Crawl is a free, open web archive. No API keys, no rate limits, no paid tiers.
Usage
> find backlinks to thegeolab.net using Common Crawl
> search for expired domains in the "seo tools" niche
> check if example.com is expired or parked
> find link gap between my site and competitors
Important Notes
Uses native fetch() for CC CDX (axios returns 404 on CC CDX; known issue)
Queries the 3 most recent CC indexes for best coverage
Expired domain detection: ECONNREFUSED/ENOTFOUND = expired; parked-page pattern matching for parked domains
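A minimal sketch of how these notes could fit together, not the server's actual code. It assumes the public index list at `https://index.commoncrawl.org/collinfo.json` is ordered newest first, that an index with no captures answers 404, and that the probe result passed to `classifyDomain` was produced elsewhere. All function names and the parked-page patterns are hypothetical.

```typescript
// One entry of collinfo.json (only the fields used here).
interface CrawlIndex {
  id: string;        // e.g. "CC-MAIN-2024-10"
  "cdx-api": string; // e.g. "https://index.commoncrawl.org/CC-MAIN-2024-10-index"
}

// One CDX capture record (subset of the fields the index returns).
interface CdxRecord {
  url: string;
  timestamp: string;
  status?: string;
}

// Build a CDX query for every capture under a domain.
function buildCdxUrl(cdxApi: string, domain: string, limit = 50): string {
  const pattern = encodeURIComponent(`${domain}/*`);
  return `${cdxApi}?url=${pattern}&output=json&limit=${limit}`;
}

// With output=json the CDX server returns one JSON object per line.
function parseCdxLines(body: string): CdxRecord[] {
  return body
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as CdxRecord);
}

// Query the `count` most recent indexes with native fetch().
async function queryRecentIndexes(domain: string, count = 3): Promise<CdxRecord[]> {
  const infoRes = await fetch("https://index.commoncrawl.org/collinfo.json");
  const indexes = (await infoRes.json()) as CrawlIndex[];
  const records: CdxRecord[] = [];
  for (const index of indexes.slice(0, count)) {
    const res = await fetch(buildCdxUrl(index["cdx-api"], domain));
    if (res.status === 404) continue; // assumption: no captures in this index
    if (!res.ok) throw new Error(`CDX query failed: ${res.status}`);
    records.push(...parseCdxLines(await res.text()));
  }
  return records;
}

type DomainStatus = "expired" | "parked" | "live";

// Illustrative patterns only; a real list would need tuning.
const PARKED_PATTERNS = [/domain (is )?for sale/i, /buy this domain/i, /parked domain/i];

// Classify a probe result: connection-level errors mean expired,
// parked-page wording means parked, anything else counts as live.
function classifyDomain(probe: { errorCode?: string; html?: string }): DomainStatus {
  if (probe.errorCode === "ECONNREFUSED" || probe.errorCode === "ENOTFOUND") return "expired";
  const html = probe.html ?? "";
  if (PARKED_PATTERNS.some((p) => p.test(html))) return "parked";
  return "live";
}
```

Keeping URL building and line parsing as pure functions leaves only `queryRecentIndexes` dependent on the network, which makes the rest easy to check offline.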
Attributions & Licence
Built and maintained by Artur Ferreira @ TheGEOLab.
Email: artur@thegeolab.net
Best Practice Attribution
This MCP server was built following the open source Best Practice Approach: reading community work for inspiration, then writing original content, and crediting every source.
Based on:
Model Context Protocol specification by Anthropic
MCP SDK (MIT)
Data source:
Common Crawl: free, open web archive (non-profit)
Common Crawl CDX API: index search endpoint
Backlink analysis concepts inspired by:
Ahrefs: backlink discovery and competitor gap methodology
Semrush: backlink analytics and domain comparison
Majestic: historic backlink index concepts
Technical decisions:
Native fetch() used instead of axios for CC CDX queries (axios returns 404 on CC CDX from inside Express; a persistent debugging issue documented in geolab-backlinks)
All server code is original writing. No files were copied or adapted from any source. MIT licence.
Found this useful? ⭐ Star the repo and connect: thegeolab.net · @TheGEO_Lab · LinkedIn · Reddit
Related Repos
claude-code-mcps: All 5 MCP servers in one collection
mcp-seo-auditor: On-page SEO audit + JSON-LD validation
mcp-serp-intel: SERP weak spots, PAA trees, intent comparison
mcp-common-crawl: Free backlink discovery via Common Crawl
mcp-gsc-advanced: GSC cannibalization, rank changes
mcp-wordpress-setup: WordPress MCP server setup guide
Licence
MIT (see LICENSE)