Why this server?
This server is an excellent fit as it explicitly mentions 'web searching and webpage scraping using pure crawler technology', 'batch webpage scraping', and 'content extraction', which aligns directly with the information gathering and harvesting capabilities of 'theHarvester'.
License: A · Quality: B · Maintenance: C
Enables web searching and webpage scraping using pure crawler technology without requiring official APIs. Supports Bing web and news search, batch webpage scraping, and content extraction through Puppeteer automation.
41 · MIT

Why this server?
This server provides 'web scraping and crawling capabilities', supporting 'single-page scraping', 'multi-page website crawling', 'web search', and 'content extraction', making it highly relevant to the data collection function of 'theHarvester'.
License: A · Quality: - · Maintenance: B
Enables web scraping and crawling capabilities for LLM clients, supporting single-page scraping, multi-page website crawling, and web search with multiple engines (Playwright, Cheerio, Puppeteer) and flexible output formats including markdown, HTML, text, and screenshots.
66 · MIT

Why this server?
As a 'web scraping server that allows... to extract various types of data from websites', this server directly matches the core functionality of 'theHarvester' for data acquisition.
License: F · Quality: - · Maintenance: C
A lightweight web scraping server that allows Claude Desktop users to extract various types of data from websites, including text, links, images, tables, headlines, and metadata using CSS selectors.
4

Why this server?
This server offers 'advanced web scraping' and 'smart content extraction', which are key aspects of the data harvesting performed by 'theHarvester'.
License: A · Quality: - · Maintenance: C
Provides advanced web scraping with HTTP client, smart content extraction to Markdown, browser automation via Playwright, screenshot/PDF generation, and Docker sandbox execution environments.
1 · MIT

Why this server?
Explicitly named 'Scraper MCP' and described as a 'context-optimized web scraping server', it directly correlates with the scraping and data gathering activities performed by 'theHarvester'.
License: A · Quality: - · Maintenance: C
A context-optimized web scraping server that converts HTML to markdown/text and applies CSS selectors server-side, reducing token usage by 70-90% while providing AI tools with clean, filtered web content.
6 · MIT

Why this server?
This server is a 'web scraping server' offering 'content extraction rules' for both static and dynamic websites, closely matching the functionality of 'theHarvester'.
License: A · Quality: A · Maintenance: C
A TypeScript-based web scraping server built on the Model Context Protocol that offers multiple export formats, content extraction rules, and support for both static and dynamic (SPA) websites.
741 · MIT

Why this server?
Described as a 'scraper tool' that fetches and processes web content with 'efficient content extraction', this server's capabilities are very similar to the data harvesting done by 'theHarvester'.

Oxylabs MCP Server (official)
License: A · Quality: C · Maintenance: -
A scraper tool that leverages the Oxylabs Web Scraper API to fetch and process web content with flexible options for parsing and rendering pages, enabling efficient content extraction from complex websites.
294

Why this server?
Although this server is specific to QQ channels, its description highlights 'automated collection and downloading of multimedia content' and 'comprehensive media harvesting', so it matches the 'harvesting' aspect of the query directly.
License: A · Quality: - · Maintenance: C
Enables automated collection and downloading of multimedia content (images, GIFs, videos) from QQ channels. Features efficient video scraping, incremental updates, and intelligent fallback mechanisms for comprehensive media harvesting.
MIT

Why this server?
This server is a 'powerful tool for fetching and extracting text content from web pages and APIs', supporting 'web scraping', which is a primary method used by 'theHarvester'.
License: A · Quality: C · Maintenance: C
A powerful tool for fetching and extracting text content from web pages and APIs, supporting web scraping, REST API requests, and Google Custom Search integration.
510 · MIT