Web scraping and crawling tools

Why this server?
This server is an excellent fit: it explicitly enables AI models to 'scrape and extract data from any website globally' and bypasses anti-bot systems, covering the core function of a web crawler.
Thordata MCP Server
Web Scraping, Browser Automation
xja1023789-collab
License: - | Quality: - | Maintenance: -
Enables AI models to scrape and extract data from any website globally using Thordata's 195+ country proxy network. Bypasses anti-bot systems and renders JavaScript content, outputting structured data in Markdown, HTML, or Links format.
Last updated 2025-09-23
Why this server?
This server directly addresses the request by providing comprehensive 'web scraping and crawling capabilities for LLM clients' using tools like Playwright and Puppeteer.
AnyCrawl MCP Server
Web Scraping, Browser Automation
any4ai
License: A | Quality: - | Maintenance: C
Enables web scraping and crawling capabilities for LLM clients, supporting single-page scraping, multi-page website crawling, and web search with multiple engines (Playwright, Cheerio, Puppeteer) and flexible output formats including markdown, HTML, text, and screenshots.
Last updated 2026-03-19
30
6
MIT
Why this server?
Focuses on AI-powered web scraping and content conversion into LLM-friendly Markdown, which aligns perfectly with the goal of building a web crawler tool.
ScrapeGraph MCP Server (official)
Web Scraping, RAG Systems, Browser Automation
ScrapeGraphAI
License: A | Quality: A | Maintenance: B
A production-ready Model Context Protocol server that enables language models to leverage AI-powered web scraping capabilities, offering tools for transforming webpages to markdown, extracting structured data, and executing AI-powered web searches.
Last updated 2026-07-17
8
89
MIT
Why this server?
Explicitly defined as a 'web scraping server' supporting multiple export formats, making it a direct solution for a web crawler requirement.
Web Scraper MCP Server
Web Scraping, Browser Automation
naku111
License: A | Quality: A | Maintenance: D
A TypeScript-based web scraping server built on the Model Context Protocol that offers multiple export formats, content extraction rules, and support for both static and dynamic (SPA) websites.
Last updated 2025-08-29
7
10
1
MIT
Why this server?
This server is designed for high-performance web crawling and information retrieval, integrating web content analysis for AI assistants.
Crawl4AI MCP Server
Browser Automation, Search, Web Scraping
weidwonder
License: A | Quality: - | Maintenance: F
Crawl4AI MCP Server is an intelligent information retrieval server offering robust search capabilities and LLM-optimized web content understanding, utilizing multi-engine search and intelligent content extraction to efficiently gather and comprehend internet information.
Last updated 2026-01-23
148
MIT
Why this server?
Uses the Playwright framework to enable headless 'browser automation and web page interactions,' a common technique used to scrape data from dynamic, JavaScript-heavy websites.
Playwright MCP
Browser Automation, Web Scraping
mattreya
License: A | Quality: - | Maintenance: D
Enables LLMs to perform browser automation and web page interactions using Playwright's accessibility tree instead of screenshots. Provides fast, deterministic web automation through structured data without requiring vision models.
Last updated 2025-09-22
6,254,424
Apache 2.0
Why this server?
Specifically geared towards handling and retrieving data from various web crawler outputs (WARC, wget, etc.), confirming its relevance to web scraping/crawling tasks.
mcp-server-webcrawl
RAG Systems, Search, Web Scraping
pragmar
License: F | Quality: - | Maintenance: C
Bridge the gap between your web crawl and AI language models. With mcp-server-webcrawl, your AI client filters and analyzes web content under your direction or autonomously, extracting insights from your web content. Supports WARC, wget, InterroBot, Katana, and SiteOne crawlers.
Last updated 2026-05-31
44
Python
Why this server?
Scraping often results in messy HTML; this tool specializes in cleaning and transforming raw webpage content into usable, LLM-optimized Markdown.
Mozilla Readability Parser MCP
Web Scraping, Browser Automation, Agent Orchestration
emzimmer
License: A | Quality: A | Maintenance: D
Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
Last updated 2025-01-28
1
74
17
MIT
Why this server?
Covers both browser automation and web reverse engineering, necessary skills for sophisticated web crawling and data extraction.
WebScout MCP
Browser Automation, Web Scraping, App Automation
pyscout
License: A | Quality: A | Maintenance: D
Enables reverse engineering of web applications and chat interfaces through browser automation, network traffic capture, and streaming API discovery. Provides comprehensive tools for analyzing network patterns, capturing streaming responses, and automating complex web interactions.
Last updated 2025-10-02
14
29
1
ISC