Search for:

A tool for extracting text from a webpage after crawling it

  • Why this server?

    This server fetches web content, which is the first step in extracting text from a webpage. It supports various HTTP methods and content formats.

    security: -, license: A, quality: -
    An MCP server that enables fetching web content using the Node.js undici library, supporting various HTTP methods, content formats, and request configurations.
    66
    8
    TypeScript
    MIT License
    • Apple
    • Linux
  • Why this server?

    This server acts as a web browser for LLMs, crawling webpages much like web search in ChatGPT, which makes it suitable for the crawling step.

    security: A, license: A, quality: A
    Implementation of an MCP server for the RAG Web Browser Actor. This Actor serves as a web browser for large language models (LLMs) and RAG pipelines, similar to a web search in ChatGPT.
    1
    330
    77
    JavaScript
    Apache 2.0
    • Apple
  • Why this server?

    This MCP server offers unified access to multiple search engines and content processing services, which is useful for both crawling and processing webpage content.

    security: A, license: A, quality: A
    🔍 A Model Context Protocol (MCP) server providing unified access to multiple search engines (Tavily, Brave, Kagi), AI tools (Perplexity, FastGPT), and content processing services (Jina AI, Kagi). Combines search, AI responses, content processing, and enhancement features through a single interface.
    15
    47
    30
    TypeScript
    MIT License
    • Linux
  • Why this server?

    Enables retrieval and processing of web page content for LLMs by converting HTML to markdown, with support for content truncation and pagination, making it suitable for extracting text from web pages after crawling.

    security: -, license: A, quality: -
    Enables retrieval and processing of web page content for LLMs by converting HTML to markdown, with support for content truncation and pagination.
    1
    1
    Python
    MIT License
  • Why this server?

    Provides functionality to fetch web content in various formats, including HTML, JSON, plain text, and Markdown. Useful for both crawling and initial text extraction.

    security: A, license: F, quality: A
    Provides functionality to fetch web content in various formats, including HTML, JSON, plain text, and Markdown.
    4
    137,083
    150
    TypeScript
  • Why this server?

    This server enables LLMs to retrieve and process content from web pages, converting HTML to markdown for easier consumption. Useful for text extraction.

    security: A, license: A, quality: A
    This server enables LLMs to retrieve and process content from web pages, converting HTML to markdown for easier consumption.
    1
    37,968
    JavaScript
    MIT License
  • Why this server?

    Extracts webpage content, removes ads and non-essential elements, and transforms it into clean, LLM-optimized Markdown, helping with the extraction of text after crawling.

    security: -, license: A, quality: -
    A Python implementation of an MCP server that extracts webpage content, removes ads and non-essential elements, and transforms it into clean, LLM-optimized Markdown.
    1
    Python
    MIT License
    • Linux
    • Apple
  • Why this server?

    This server enables users to download entire websites and their assets for offline access, which is effectively crawling. Text extraction tools can then be applied to the downloaded pages.

    security: A, license: A, quality: A
    This server enables users to download entire websites and their assets for offline access, supporting configurable depth and concurrency settings.
    1
    3
    Python
    MIT License
  • Why this server?

    It crawls websites.

  • Why this server?

    A server that provides AgentQL's data extraction capabilities, enabling AI agents to get structured data from the unstructured web.

    security: A, license: A, quality: A
    A server that provides AgentQL's data extraction capabilities, enabling AI agents to get structured data from the unstructured web.
    1
    183
    28
    JavaScript
    MIT License
    • Apple
    • Linux
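
Several of the servers listed above describe the same two-step pattern: fetch a page, then reduce its HTML to text with support for truncation and pagination. The sketch below illustrates only that second step, using Python's standard library. It is not the implementation of any listed server; the names `TextExtractor`, `extract_text`, `paginate`, `start_index`, and `max_length` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def paginate(text, start_index=0, max_length=5000):
    """Return one page of text and the index of the next page (None at the end)."""
    page = text[start_index:start_index + max_length]
    next_index = start_index + len(page)
    return page, (next_index if next_index < len(text) else None)
```

A caller would request the first page with `paginate(text)` and keep passing the returned index back as `start_index` until it is `None`. Returning an explicit next index, rather than a page number, lets the caller change `max_length` between requests without losing its place.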