Tools for extracting structured data from web pages for LLM use

Why this server?
Explicitly designed to scrape and extract structured data from any website globally, bypass anti-bot systems, render JavaScript content, and output the results in formats suitable for LLMs (Markdown, HTML, or Links).
Thordata MCP Server
Web Scraping, Browser Automation
xja1023789-collab
License: -, Quality: -, Maintenance: -
Enables AI models to scrape and extract data from any website globally using Thordata's 195+ country proxy network. Bypasses anti-bot systems and renders JavaScript content, outputting structured data in Markdown, HTML, or Links format.
Last updated 2025-09-23
Why this server?
Enables comprehensive web scraping and crawling capabilities for LLMs, supporting both single-page extraction and multi-page crawling, with the ability to handle JavaScript rendering and output structured data.
AnyCrawl MCP Server
Web Scraping, Browser Automation
any4ai
License: A, Quality: -, Maintenance: C
Enables web scraping and crawling capabilities for LLM clients, supporting single-page scraping, multi-page website crawling, and web search with multiple engines (Playwright, Cheerio, Puppeteer) and flexible output formats including markdown, HTML, text, and screenshots.
Last updated 2026-03-19
5
6
MIT
Why this server?
A powerful server focused on converting web content into structured data formats optimized for LLMs, facilitating deep web research and information structuring.
ScrapeGraph MCP Server (official)
Web Scraping, RAG Systems, Browser Automation
ScrapeGraphAI
License: A, Quality: A, Maintenance: B
A production-ready Model Context Protocol server that enables language models to leverage AI-powered web scraping capabilities, offering tools for transforming webpages to markdown, extracting structured data, and executing AI-powered web searches.
Last updated 2026-07-17
8
89
MIT
Why this server?
Dedicated tool to extract structured data (JSON) from unstructured web content using natural language prompts, ideal for turning complex web pages into usable data points for LLMs.
Ashra Structured Data
getrupt
License: -, Quality: -, Maintenance: C
Extract structured data from any website with a simple SDK call. No scraping code, no headless browsers - just prompt and get JSON.
Last updated 2025-03-26
62
Why this server?
Allows LLMs to interact with web pages and extract data using Playwright's accessibility tree, ensuring deterministic and structured output based on UI elements rather than unreliable screen captures.
Playwright MCP
Browser Automation, Web Scraping
mattreya
License: A, Quality: -, Maintenance: D
Enables LLMs to perform browser automation and web page interactions using Playwright's accessibility tree instead of screenshots. Provides fast, deterministic web automation through structured data without requiring vision models.
Last updated 2025-09-22
6,397,156
Apache 2.0
Why this server?
Uses specialized APIs (Tavily Search and Crawl) to perform complex web research, gathering and structuring data specifically for the purpose of creating high-quality, documented content for LLMs.
Deep Research MCP
Search, Web Scraping, RAG Systems
ali-kh7
License: -, Quality: B, Maintenance: -
A Model Context Protocol compliant server that facilitates comprehensive web research by utilizing Tavily's Search and Crawl APIs to gather and structure data for high-quality markdown document creation.
Last updated 2025-12-16
1
37
12
Why this server?
Designed for browser automation and web content extraction, it cleans and structures the extracted web data (including JavaScript content) for efficient use by local or external LLMs.
Low Cost Browsing MCP Server
Browser Automation, Web Scraping, RAG Systems
lcbro
License: F, Quality: -, Maintenance: D
Enables browser automation, web content extraction, and LLM-powered data transformation using Playwright. Supports session management, authentication flows, and works with local LLMs (Ollama, JAN AI) or external providers to clean and structure extracted web data.
Last updated 2025-09-14
36
6
Why this server?
Focuses on converting raw webpage HTML into clean, structured, and LLM-optimized Markdown by removing clutter (ads, navigation, headers), ensuring the LLM receives only core content.
Mozilla Readability Parser MCP
Web Scraping, Browser Automation, Agent Orchestration
emzimmer
License: A, Quality: A, Maintenance: D
Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
Last updated 2025-01-28
1
16
17
MIT
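
The clutter-removal idea described in this entry can be illustrated with a small sketch. This is not Mozilla's Readability algorithm (which scores candidate content nodes) and not this server's code; it is a much simpler tag-based stand-in that uses only the Python standard library. The names `CLUTTER_TAGS`, `ClutterStripper`, and `html_to_markdown` are hypothetical, and the choice of which tags count as clutter is an assumption.

```python
from html.parser import HTMLParser

# Assumption of this sketch: these tags (and everything inside them) are clutter.
CLUTTER_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
# Block-level tags we keep, mapped to the Markdown prefix they are emitted with.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class ClutterStripper(HTMLParser):
    """Collects the text of kept block tags, skipping clutter subtrees."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a clutter subtree
        self.current = None   # text pieces of the block being read
        self.prefix = ""
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_PREFIX:
            self.prefix = BLOCK_PREFIX[tag]
            self.current = []

    def handle_endtag(self, tag):
        if tag in CLUTTER_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_PREFIX and self.current is not None:
            # Collapse runs of whitespace before emitting the block.
            text = " ".join("".join(self.current).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.current = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(data)


def html_to_markdown(html):
    """Return a rough Markdown rendering of the non-clutter blocks in html."""
    parser = ClutterStripper()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


SAMPLE = """
<html><body>
<nav><a href="/">Home</a></nav>
<h1>Release notes</h1>
<p>Version 2 adds <b>streaming</b> support.</p>
<footer><p>Copyright notice</p></footer>
</body></html>
"""

if __name__ == "__main__":
    print(html_to_markdown(SAMPLE))
```

In this sketch the navigation link and the footer paragraph are dropped, while the heading and the body paragraph survive. Text that sits outside the kept block tags is also discarded, which is a deliberate simplification.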
Why this server?
Provides granular web scraping through CSS selectors, allowing the user (or LLM) to define exactly which structured elements (text, links, tables) should be extracted from a page.
MCP Web Scraper
Web Scraping, Browser Automation, Developer Tools
navin4078
License: F, Quality: -, Maintenance: D
A lightweight web scraping server that allows Claude Desktop users to extract various types of data from websites, including text, links, images, tables, headlines, and metadata using CSS selectors.
Last updated 2025-06-10
4
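
As a rough illustration of the selector-driven extraction this entry describes, the sketch below matches a deliberately tiny subset of CSS selectors (`tag`, `.class`, and `tag.class`) using only the Python standard library. It is not this server's implementation, and a real scraper would normally rely on a full selector engine. The names `parse_selector`, `SelectorExtractor`, and `select` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they cannot hold text.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def parse_selector(selector):
    """Split 'tag', '.class', or 'tag.class' into (tag, class); None means 'any'."""
    tag, _, cls = selector.partition(".")
    return tag or None, cls or None


class SelectorExtractor(HTMLParser):
    """Collects the text content of every element matching a simple selector."""

    def __init__(self, selector):
        super().__init__()
        self.tag, self.cls = parse_selector(selector)
        self.open_tag = None  # tag name of the match currently being read
        self.depth = 0        # nesting depth of same-named tags inside the match
        self.buffer = []
        self.results = []

    def _matches(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        tag_ok = self.tag is None or tag == self.tag
        cls_ok = self.cls is None or self.cls in classes
        return tag_ok and cls_ok

    def handle_starttag(self, tag, attrs):
        if self.open_tag is None:
            if tag not in VOID_TAGS and self._matches(tag, attrs):
                self.open_tag, self.depth, self.buffer = tag, 1, []
        elif tag == self.open_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.open_tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse whitespace and store the finished match.
                self.results.append(" ".join("".join(self.buffer).split()))
                self.open_tag = None

    def handle_data(self, data):
        if self.open_tag is not None:
            self.buffer.append(data)


def select(html, selector):
    """Return the text of each outermost element matching the selector."""
    extractor = SelectorExtractor(selector)
    extractor.feed(html)
    extractor.close()
    return extractor.results


PAGE = """
<ul>
  <li class="item featured">Alpha</li>
  <li class="item">Beta <a href="/b">link</a></li>
  <li>Gamma</li>
</ul>
<table><tr><td class="price">9.50</td></tr></table>
"""

if __name__ == "__main__":
    print(select(PAGE, "li.item"))
    print(select(PAGE, ".price"))
```

Only outermost matches are reported, so a matching element nested inside another match contributes its text to the outer result instead of producing a second entry.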