Search for:

How to fetch or scrape data from a website for use in training an LLM

  • Why this server?

    Provides functionality to fetch and transform web content in various formats (HTML, JSON, plain text, and Markdown) through simple API calls. Useful for fetching data from websites to feed into an LLM.

    security: -, license: F, quality: -
    Provides functionality to fetch and transform web content in various formats (HTML, JSON, plain text, and Markdown) through simple API calls.
    Last updated -
    137,083
    TypeScript
  • Why this server?

    Offers comprehensive web content retrieval options (full webpage, filtered content, Markdown conversion), a custom User-Agent, support for multiple HTTP methods, and LLM-controlled request headers, so you can retrieve precisely the web data your LLM needs.

    security: -, license: A, quality: -
    Web Content Retrieval (full webpage, filtered content, or Markdown-converted), Custom User-Agent, Multi-HTTP Method Support (GET/POST/PUT/DELETE/PATCH), LLM-Controlled Request Headers, LLM-Accessible Response Headers, and more.
    Last updated -
    1
    Python
    MIT License
  • Why this server?

    Extracts and transforms webpage content into clean, LLM-optimized Markdown, removing ads and unnecessary elements. This server prepares web content effectively for use in LLMs.

    security: A, license: A, quality: A
    Extracts and transforms webpage content into clean, LLM-optimized Markdown. Returns article title, main content, excerpt, byline and site name. Uses Mozilla's Readability algorithm to remove ads, navigation, footers and non-essential elements while preserving the core content structure.
    Last updated -
    1
    4
    11
    MIT License
  • Why this server?

    This service extracts and transcribes audio content from videos across 1000+ streaming websites including YouTube, Bilibili, TikTok, and Twitter, supporting multiple transcription providers. The resulting transcripts can serve as text data for an LLM.

    security: -, license: A, quality: -
    A service that extracts and transcribes audio content from videos across 1000+ streaming websites including YouTube, Bilibili, TikTok, and Twitter, supporting multiple transcription providers like Deepgram, Gladia, Speechmatics, and AssemblyAI.
    Last updated -
    5
    Python
    MIT License
    • Linux
    • Apple
  • Why this server?

    Facilitates searching and accessing programming resources across platforms like Stack Overflow, MDN, GitHub, npm, and PyPI, helping LLMs find code examples and documentation that can be used as a data source or training material.

    security: A, license: A, quality: A
    Facilitates searching and accessing programming resources across platforms like Stack Overflow, MDN, GitHub, npm, and PyPI, aiding LLMs in finding code examples and documentation.
    Last updated -
    6
    25
    JavaScript
    AGPL 3.0
    • Apple
  • Why this server?

    This server allows you to search and retrieve content on any wiki site that uses MediaWiki. Wikipedia and Fandom are supported. The content can then be used to train an LLM.

    security: A, license: F, quality: A
    An MCP server that allows you to search and retrieve content on any wiki site using MediaWiki with LLMs 🤖. wikipedia.org, fandom.com, wiki.gg and more sites using MediaWiki are supported!
    Last updated -
    2
    1
    Python
  • Why this server?

    An MCP server paired with a Firefox extension that enables LLM clients to control the user's browser, supporting tab management, history search, and content reading.

    security: -, license: A, quality: -
    Last updated -
    17
    TypeScript
    MIT License
  • Why this server?

    The HTTP-4-MCP configuration tool lets you easily convert an HTTP API into an MCP tool without writing code. Through simple interface operations, you can quickly configure an MCP server.

    security: -, license: A, quality: -
    Last updated -
    6
    JavaScript
    Mulan Permissive Software License, Version 2
    • Linux
    • Apple
  • Why this server?

    A free, open-source service that transforms GitHub projects into MCP endpoints, enabling AI assistants to access and understand project documentation without any setup.

    security: -, license: A, quality: -
    Last updated -
    1,882
    TypeScript
    Apache 2.0
    • Apple
    • Linux
  • Why this server?

    Enables integration with Google Drive for listing, reading, and searching over files, supporting various file types with automatic export for Google Workspace files.

    security: -, license: A, quality: -
    Last updated -
    1,495
    9
    JavaScript
    MIT License
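
Several of the servers listed above fetch a page and reduce it to clean text before handing it to an LLM. As a rough illustration of that idea (not the implementation of any listed server), the following Python sketch uses only the standard library. The names `TextExtractor`, `html_to_text`, `fetch_text`, and the `example-fetcher/0.1` User-Agent string are hypothetical, and the tag filter is a naive stand-in for a real content-extraction algorithm such as Readability.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Assumption: these tags rarely hold article content, so their text is dropped.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects visible text while ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def fetch_text(url: str, user_agent: str = "example-fetcher/0.1",
               timeout: float = 10.0) -> str:
    """Download a page with a custom User-Agent and return its visible text."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Calling `fetch_text("https://example.com/")` would return the page's text with script, style, and navigation content removed; `html_to_text` can also be applied directly to HTML that has already been downloaded.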