Spider MCP - Web Search Crawler Service
A web search MCP service based on pure crawler technology, built with Node.js.
Features
- ❌ No Official API Required: Completely based on crawler technology, no dependency on third-party official APIs
- 🔍 Intelligent Search: Supports Bing web and news search
- 📰 News Search: Built-in news search with time filtering
- 🕷️ Pure Crawler: No official API dependency, uses Puppeteer for web scraping
- 🚀 High Performance: Supports batch web scraping
- 📊 Health Monitoring: Complete health check and metrics monitoring
- 📝 Structured Logging: Uses Winston for structured logs
- 🔒 Anti-Detection: Supports User-Agent rotation and other anti-bot measures
- 🔗 Smart URL Cleaning: Automatically cleans promotional parameters while preserving essential information
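The URL cleaning feature can be illustrated with a minimal sketch. This is not the project's actual implementation: the function name `cleanUrl` and the parameter list below are assumptions chosen for illustration, and the real service may strip a different set of parameters.

```javascript
// Hypothetical list of tracking parameters; the project's real list is not shown in this README.
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "msclkid", "ref_src"]);

// Remove promotional query parameters while keeping everything else in the URL.
function cleanUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first: deleting while iterating searchParams can skip entries.
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    if (lower.startsWith("utm_") || TRACKING_PARAMS.has(lower)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

console.log(cleanUrl("https://example.com/article?id=42&utm_source=news&fbclid=abc"));
// → https://example.com/article?id=42
```

Parameters such as `id` survive because only keys on the deny-list are removed, which is one way to preserve essential information.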
Tech Stack
- Node.js (>= 18.0.0)
- Express.js - Web framework
- Puppeteer - Browser automation
- Cheerio - HTML parsing
- Axios - HTTP client
- Winston - Logging
- @modelcontextprotocol/sdk - MCP protocol support
Quick Start
1. Install dependencies
or use pnpm
2. Download Puppeteer browser
3. Environment configuration
Copy and configure the environment variables file:
Edit the .env file according to your needs.
4. Start the service
Development mode:
Production mode:
The service will start at http://localhost:3000.
MCP Tools
web_search
Unified search tool supporting both web and news search:
- Web Search: searchType: "web"
- News Search: searchType: "news" with time filtering
- Note: searchType is a required parameter and must be explicitly specified
Usage Examples:
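As a minimal sketch, a web_search call can be expressed as an MCP `tools/call` request. Only `searchType` and its two values are confirmed by this README; the `query` argument name and the helper `buildWebSearchRequest` are assumptions for illustration, so check the tool's actual input schema before relying on them.

```javascript
const SEARCH_TYPES = ["web", "news"];

// Build a JSON-RPC 2.0 request that invokes the web_search tool.
function buildWebSearchRequest(id, searchType, query) {
  // searchType is required and must be given explicitly.
  if (!SEARCH_TYPES.includes(searchType)) {
    throw new Error(`searchType must be one of: ${SEARCH_TYPES.join(", ")}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "web_search",
      // "query" is an assumed parameter name, not taken from this README.
      arguments: { searchType, query },
    },
  };
}

console.log(JSON.stringify(buildWebSearchRequest(1, "news", "open source crawlers"), null, 2));
```

Validating `searchType` on the client side surfaces a missing or misspelled value before the request reaches the server.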
Other Tools
- get_webpage_content: Get webpage content and convert to specified format
- get_webpage_source: Get raw HTML source code of webpage
- batch_webpage_scrape: Batch scrape multiple webpages
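Batch scraping usually needs a cap on how many pages are processed at once, since each Puppeteer page consumes memory. The sketch below shows one common way to bound concurrency; it is not the service's actual code, and `mapWithConcurrency` and the stand-in worker are hypothetical names.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight.
// Failures are recorded per item so one bad URL does not abort the batch.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function run() {
    while (next < items.length) {
      const index = next++; // claim the next item; safe because JS is single-threaded
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (err) {
        results[index] = { ok: false, error: String(err) };
      }
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}

// Stand-in worker; a real one would load the page in a browser and extract content.
const fakeScrape = async (url) => `content of ${url}`;

mapWithConcurrency(["https://a.example", "https://b.example"], 2, fakeScrape)
  .then((results) => console.log(results));
```

Results keep the input order regardless of which page finishes first, which makes it easy to match each result to its URL.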
MCP Configuration
Chatbox Configuration
Create an mcp-config.json file in Chatbox:
Other MCP Clients
Important Notes
- Anti-bot Measures: This service uses several techniques to avoid detection, but you must still comply with each site's robots.txt and terms of use
- Rate Limiting: Limit request frequency to avoid putting pressure on target websites
- Legal Compliance: Ensure compliance with local laws and website terms of use when using this service
- Resource Consumption: Puppeteer launches a Chrome browser, so monitor memory and CPU usage
- URL Cleaning: Promotional parameters are cleaned automatically, which may break links that depend on them
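The rate limiting advice above can be sketched as a simple minimum-interval throttle on the caller's side. This is an illustrative assumption, not a feature documented for this service; `createThrottle` and `politeFetch` are hypothetical helpers.

```javascript
// Returns a function that reports how long to wait before the next request
// so that requests are spaced at least `minIntervalMs` apart.
function createThrottle(minIntervalMs, now = Date.now) {
  let nextAllowed = 0;
  return function reserve() {
    const t = now();
    const wait = Math.max(0, nextAllowed - t);
    nextAllowed = Math.max(t, nextAllowed) + minIntervalMs;
    return wait;
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Wait for the reserved slot, then send the request.
async function politeFetch(reserve, url) {
  await sleep(reserve());
  return fetch(url); // global fetch is available in Node.js >= 18
}

// With a frozen clock, three back-to-back reservations are spaced 500 ms apart.
const reserve = createThrottle(500, () => 1000);
console.log(reserve(), reserve(), reserve());
// → 0 500 1000
```

Passing the clock in as a parameter keeps the throttle easy to test without real delays.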
Development
Project Structure
License
MIT License
Contributing
Issues and Pull Requests are welcome!