How to fetch or scrape data from a website for use in training an LLM

Why this server?
Provides functionality to fetch and transform web content in various formats (HTML, JSON, plain text, and Markdown) through simple API calls. Useful for fetching data from websites to feed into an LLM.
Fetch MCP Server
Categories: Web Scraping, Browser Automation
Author: phpmac
Grades: license A, quality -, maintenance C
Last updated 2025-07-29
License: MIT
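Servers like this wrap a fetch-then-convert step. As a rough illustration of that step, here is a minimal Python standard-library sketch, assuming the page is UTF-8 encoded. `fetch_text` and `html_to_text` are hypothetical names; this is not the Fetch MCP Server's implementation or API.

```python
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Convert an HTML string to whitespace-normalized plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(" ".join(parser.chunks).split())


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a page (assumed UTF-8) and return its plain text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return html_to_text(html)
```

For example, `html_to_text("<p>Hello  <b>world</b></p>")` returns `"Hello world"`. Joining text chunks with spaces can split a word that is broken by an inline tag (such as `<b>foo</b>bar`), which is one reason dedicated HTML-to-Markdown converters exist.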
Why this server?
Offers comprehensive web content retrieval options (full webpage, filtered content, Markdown conversion), a custom User-Agent, support for multiple HTTP methods, and LLM-controlled request headers, letting you retrieve exactly the web data your LLM needs.
mcp-server-requests
Categories: Browser Automation, Web Scraping, Search
Author: coucya
Grades: license A, quality A, maintenance D
Web Content Retrieval (full webpage, filtered content, or Markdown-converted), Custom User-Agent, Multi-HTTP Method Support (GET/POST/PUT/DELETE/PATCH), LLM-Controlled Request Headers, LLM-Accessible Response Headers, and more.
Last updated 2025-11-21
License: MIT
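mcp-server-requests exposes options like these as tools. To show what a custom User-Agent and a non-GET method look like at the HTTP level, here is a minimal standard-library sketch, plus a robots.txt check that scrapers collecting training data commonly add. The bot name `example-dataset-bot/0.1` and both function names are hypothetical; this is not the server's API.

```python
import urllib.request
import urllib.robotparser

# Hypothetical bot name; use one that identifies you and how to reach you.
USER_AGENT = "example-dataset-bot/0.1"


def build_request(url, method="GET", headers=None, body=None, user_agent=USER_AGENT):
    """Build (but do not send) a request with a custom User-Agent and HTTP method."""
    all_headers = {"User-Agent": user_agent}
    all_headers.update(headers or {})  # caller-supplied headers override defaults
    return urllib.request.Request(url, data=body, headers=all_headers, method=method)


def allowed_by_robots(robots_txt, url, user_agent=USER_AGENT):
    """Check a URL against an already-downloaded robots.txt body."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sending is a separate step, for example:
#   with urllib.request.urlopen(build_request(url), timeout=10) as resp:
#       data = resp.read()
```

Keeping request construction separate from sending makes it easy to inspect or log exactly which method and headers go out before any network call is made.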
Why this server?
Extracts and transforms webpage content into clean, LLM-optimized Markdown, removing ads and unnecessary elements. This server prepares web content effectively for use in LLMs.
Mozilla Readability Parser MCP
Categories: Web Scraping, Browser Automation, Agent Orchestration
Author: emzimmer
Grades: license A, quality A, maintenance D
Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
Last updated 2025-01-28
License: MIT
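For comparison, a far cruder way to strip page chrome is sketched below with Python's `html.parser`: it drops elements that usually hold navigation and boilerplate (`nav`, `header`, `footer`, `aside`, `form`, plus `script` and `style`) and keeps the `<title>`. This is not Mozilla's Readability algorithm, and `extract_main` is a hypothetical name; the tag list is an assumption that will misfire on pages that put real content inside those elements.

```python
from html.parser import HTMLParser


class _MainContentParser(HTMLParser):
    # Tags whose contents are treated as page chrome and dropped (an assumption).
    BOILERPLATE = {"script", "style", "nav", "header", "footer", "aside", "form"}

    def __init__(self):
        super().__init__()
        self._skip = 0  # nesting depth inside boilerplate elements
        self._in_title = False
        self.title = ""
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BOILERPLATE:
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.BOILERPLATE and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text:
            return
        if self._in_title:
            self.title += text
        elif not self._skip:
            self.blocks.append(text)


def extract_main(html: str) -> dict:
    """Return the page title and the text left after dropping boilerplate tags."""
    parser = _MainContentParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": "\n".join(parser.blocks)}
```

A scoring approach like Readability's is more robust because it does not depend on sites using semantic tags consistently.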
Why this server?
This service extracts and transcribes audio from videos on 1000+ streaming websites, including YouTube, Bilibili, TikTok, and Twitter, and supports multiple transcription providers. The resulting transcripts are a source of text data for an LLM.
MCP Video Digest
Categories: Multimedia Processing, Audio Processing, Web Scraping
Author: R-lz
Grades: license A, quality -, maintenance D
A service that extracts and transcribes audio content from videos across 1000+ streaming websites including YouTube, Bilibili, TikTok, and Twitter, supporting multiple transcription providers like Deepgram, Gladia, Speechmatics, and AssemblyAI.
Last updated 2025-04-03
License: MIT
Why this server?
Facilitates searching and accessing programming resources on platforms such as Stack Overflow, MDN, GitHub, npm, and PyPI, helping LLMs find code examples and documentation that can serve as a data source or training material.
Code Research MCP Server
Categories: Search, Documentation Access, Developer Tools
Author: jamesjohnsdev
Grades: license A, quality A, maintenance D
Last updated 2025-02-14
License: AGPL 3.0
Why this server?
This server lets you search and retrieve content from any wiki built on MediaWiki; Wikipedia and Fandom are supported. The retrieved content can then be used as LLM training data.
mediawiki-mcp-server
Categories: Search, Documentation Access, Open Data
Author: shiquda
Grades: license F, quality B, maintenance C
An MCP server that allows you to search and retrieve content on any wiki site using MediaWiki with LLMs 🤖. wikipedia.org, fandom.com, wiki.gg, and other sites running MediaWiki are supported!
Last updated 2025-07-13
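MediaWiki sites expose an Action API at `api.php`. As a hedged sketch of querying it directly, the code below requests the plain-text extract of one page. It assumes the wiki has the TextExtracts extension installed (it provides `prop=extracts` on Wikipedia, but not every MediaWiki site has it) and uses hypothetical function names and contact details; it is not mediawiki-mcp-server's code. Wikimedia asks API clients to send a descriptive User-Agent.

```python
import json
import urllib.parse
import urllib.request

WIKI_API = "https://en.wikipedia.org/w/api.php"
# Hypothetical contact details; replace with your own.
USER_AGENT = "example-dataset-bot/0.1 (contact@example.com)"


def extract_url(title, api=WIKI_API):
    """Build an Action API URL requesting the plain-text extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "explaintext": 1,     # plain text rather than limited HTML
        "titles": title,      # full-page extracts: one title per request
        "format": "json",
        "formatversion": 2,   # 'pages' comes back as a list
    }
    return api + "?" + urllib.parse.urlencode(params)


def parse_extracts(payload):
    """Map page title -> extract text from a formatversion=2 JSON response."""
    return {page["title"]: page.get("extract", "") for page in payload["query"]["pages"]}


def fetch_extract(title, api=WIKI_API):
    """Send the request and return {title: text}; missing pages map to ''."""
    req = urllib.request.Request(extract_url(title, api), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return parse_extracts(json.load(resp))
```

Splitting URL building and response parsing from the network call keeps both pieces testable offline.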
Why this server?
An MCP server paired with a Firefox extension that enables LLM clients to control the user's browser, supporting tab management, history search, and content reading.
Browser Control MCP
Categories: Browser Automation, Agent Orchestration
Author: eyalzh
Grades: license A, quality -, maintenance D
Last updated 2026-03-31
License: MIT
Why this server?
The HTTP-4-MCP configuration tool lets you convert an HTTP API into an MCP tool without writing code. Through a simple interface, you can quickly configure an MCP server.
http-4-mcp
Categories: API Testing, App Automation
Author: Tght1211
Grades: license A, quality -, maintenance D
Last updated 2025-05-04
License: Mulan Permissive Software License, Version 2
Why this server?
A free, open-source service that transforms GitHub projects into MCP endpoints, enabling AI assistants to access and understand project documentation without any setup.
GitMCP
Categories: Developer Tools, Documentation Access
Author: idosal
Grades: license A, quality C, maintenance D
Last updated 2026-05-08
License: Apache 2.0
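Whichever server does the fetching, the retrieved text usually has to be collected into a corpus before it can be used for training. A minimal sketch, assuming each record is a dict with hypothetical `url` and `text` keys, writes JSON Lines and drops empty records and exact duplicates by hashing whitespace-normalized text. It catches only exact duplicates, not near-duplicates.

```python
import hashlib
import json


def to_jsonl_lines(records):
    """Yield one JSON line per record, skipping empty and exact-duplicate texts."""
    seen = set()
    for rec in records:
        text = " ".join(rec["text"].split())  # collapse whitespace before hashing
        if not text:
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # identical to an earlier record after normalization
        seen.add(digest)
        yield json.dumps({"url": rec["url"], "text": text}, ensure_ascii=False)


def write_jsonl(records, path):
    """Write the deduplicated records to a UTF-8 JSON Lines file."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_jsonl_lines(records):
            f.write(line + "\n")
```

Keeping the generator separate from the file writer makes the deduplication logic easy to test without touching the filesystem.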