Web scraping tools and techniques

Glama

Why this server?
This server is a strong fit: its primary function is to 'scrape and extract data from any website' globally, and its description specifically mentions bypassing anti-bot systems and rendering JavaScript content, both of which directly address the user's need for web scraping.
Thordata MCP Server
xja1023789-collab
Security: - | License: - | Quality: -
Enables AI models to scrape and extract data from any website globally using Thordata's 195+ country proxy network. Bypasses anti-bot systems and renders JavaScript content, outputting structured data in Markdown, HTML, or Links format.
Last updated 3 months ago
MIT License
Why this server?
This tool explicitly enables 'scraping and extraction' of data from websites, covering single-page scraping and multi-page crawling with rendering capabilities, making it a strong match for web scraping needs.
AnyCrawl MCP Server
any4ai
Security: - | License: F | Quality: -
Enables web scraping and crawling capabilities for LLM clients, supporting single-page scraping, multi-page website crawling, and web search, with multiple scraping engines (Playwright, Cheerio, Puppeteer) and flexible output formats including markdown, HTML, text, and screenshots.
Last updated 3 months ago
11
4
Why this server?
This server focuses on 'browser automation and web content extraction' using Playwright, a core technology for performing reliable web scraping tasks.
Low Cost Browsing MCP Server
lcbro
Security: - | License: F | Quality: -
Enables browser automation, web content extraction, and LLM-powered data transformation using Playwright. Supports session management, authentication flows, and works with local LLMs (Ollama, JAN AI) or external providers to clean and structure extracted web data.
Last updated 3 months ago
2
5
Why this server?
This server uses 'Tavily's Search and Crawl APIs to gather and structure data,' which aligns directly with the goal of crawling the web and extracting information.
Deep Research MCP
ali-kh7
Security: A | License: - | Quality: A
A Model Context Protocol compliant server that facilitates comprehensive web research by utilizing Tavily's Search and Crawl APIs to gather and structure data for high-quality markdown document creation.
Last updated 2 hours ago
1
61
12
MIT License
Why this server?
A production-ready server that provides AI-powered 'web scraping capabilities,' transforming webpages to markdown and extracting structured data, which is highly relevant to the search query.
ScrapeGraph MCP Server (official)
ScrapeGraphAI
Security: A | License: A | Quality: A
A production-ready Model Context Protocol server that enables language models to leverage AI-powered web scraping capabilities, offering tools for transforming webpages to markdown, extracting structured data, and executing AI-powered web searches.
Last updated 20 days ago
5
45
MIT License
Why this server?
This server specializes in extracting and transforming 'webpage content into clean, LLM-optimized Markdown,' a crucial step in preparing scraped data for analysis.
Mozilla Readability Parser MCP
emzimmer
Security: A | License: A | Quality: A
Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
Last updated 10 months ago
1
35
14
MIT License
Why this server?
This server enables 'reverse engineering of web applications' and automated interactions through browser automation, both advanced techniques for in-depth web data harvesting.
WebScout MCP
pyscout
Security: A | License: F | Quality: A
Enables reverse engineering of web applications and chat interfaces through browser automation, network traffic capture, and streaming API discovery. Provides comprehensive tools for analyzing network patterns, capturing streaming responses, and automating complex web interactions.
Last updated 2 months ago
14
8
2
Why this server?
This server enables LLMs to perform 'browser automation and web page interactions' using Playwright, a tool frequently used for web scraping and data extraction from dynamic sites.
Playwright MCP
mattreya
Security: - | License: A | Quality: -
Enables LLMs to perform browser automation and web page interactions using Playwright's accessibility tree instead of screenshots. Provides fast, deterministic web automation through structured data without requiring vision models.
Last updated 3 months ago
959,799
Apache 2.0
Why this server?
A general-purpose tool for 'fetching content from URLs' (HTML, JSON, text), providing the basic functionality needed for web data retrieval.
URL Fetch MCP
aelaguiz
Security: A | License: A | Quality: A
A Model Context Protocol (MCP) server that enables Claude or other LLMs to fetch content from URLs, supporting HTML, JSON, text, and images with configurable request parameters.
Last updated 9 months ago
3
2
MIT License