Spider MCP - Web Search Crawler Service
A web search MCP service based on pure crawler technology, built with Node.js.
Features
No Official API Required: Completely based on crawler technology, no dependency on third-party official APIs
Intelligent Search: Supports Bing web and news search
News Search: Built-in news search with time filtering
Pure Crawler: No official API dependency, uses Puppeteer for web scraping
High Performance: Supports batch web scraping
Health Monitoring: Complete health check and metrics monitoring
Structured Logging: Uses Winston for structured logs
Anti-Detection: Supports User-Agent rotation and other anti-bot measures
Smart URL Cleaning: Automatically cleans promotional parameters while preserving essential information
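To make the URL cleaning feature more concrete, here is a minimal sketch of how promotional parameters could be stripped with the standard `URL` API. The helper name `cleanUrl` and the parameter list are assumptions for illustration; the actual rules used by Spider MCP are not documented here and may differ.

```javascript
// Hypothetical helper, not taken from the Spider MCP source.
// The set of tracking parameters below is an assumed example list.
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "msclkid", "ref_src"]);

function cleanUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first so the parameter list is not mutated while iterating.
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    if (lower.startsWith("utm_") || TRACKING_PARAMS.has(lower)) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}
```

Calling `cleanUrl("https://example.com/article?id=42&utm_source=news")` keeps the `id` parameter and drops `utm_source`. A deny-list like this leaves unknown parameters untouched, which reduces the risk of breaking links that rely on them.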
Tech Stack
Node.js (>= 18.0.0)
Express.js - Web framework
Puppeteer - Browser automation
Cheerio - HTML parsing
Axios - HTTP client
Winston - Logging
@modelcontextprotocol/sdk - MCP protocol support
Quick Start
1. Install dependencies with your package manager of choice, for example pnpm
2. Download Puppeteer browser
3. Environment configuration
Copy and configure the environment variables file:
Edit the .env file according to your needs.
4. Start the service
Development mode:
Production mode:
The service will start at http://localhost:3000.
MCP Tools
web_search
Unified search tool supporting both web and news search:
Web Search: searchType: "web"
News Search: searchType: "news", with time filtering
Note: searchType is a required parameter and must be explicitly specified
Usage Examples:
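As a rough illustration of the rule that searchType must always be specified, the sketch below builds an MCP `tools/call` request for web_search. Only `searchType` and the tool name come from this README; the argument names `query` and `timeFilter`, the value `"past_week"`, and the helper `buildWebSearchCall` are assumptions and may not match the real tool schema.

```javascript
// Sketch only: "query" and "timeFilter" are assumed argument names,
// not confirmed by the Spider MCP documentation.
const SEARCH_TYPES = ["web", "news"];

function buildWebSearchCall(id, searchType, query, timeFilter) {
  if (!SEARCH_TYPES.includes(searchType)) {
    throw new Error('searchType is required and must be "web" or "news"');
  }
  const args = { searchType, query };
  // Time filtering is described for news search only.
  if (searchType === "news" && timeFilter !== undefined) {
    args.timeFilter = timeFilter;
  }
  // MCP tool invocations are JSON-RPC 2.0 requests with method "tools/call".
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "web_search", arguments: args },
  };
}

// "past_week" is a placeholder value for the assumed timeFilter argument.
const request = buildWebSearchCall(1, "news", "nodejs release", "past_week");
console.log(JSON.stringify(request, null, 2));
```

Validating searchType on the client side surfaces a missing value before the request reaches the server.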
Other Tools
get_webpage_content: Get webpage content and convert to specified format
get_webpage_source: Get raw HTML source code of webpage
batch_webpage_scrape: Batch scrape multiple webpages
MCP Configuration
Chatbox Configuration
Create an mcp-config.json file in Chatbox:
Other MCP Clients
Important Notes
Anti-bot Measures: This service uses several techniques to avoid detection, but you must still comply with each site's robots.txt and terms of use
Rate Limiting: Keep request frequency moderate to avoid putting pressure on target websites
Legal Compliance: Ensure that your use of this service complies with local laws and website terms of use
Resource Consumption: Puppeteer launches a Chrome browser, so monitor memory and CPU usage
URL Cleaning: Promotional parameters are removed automatically, which may break links that depend on them
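One way to follow the rate limiting advice above is to space requests by a minimum interval on the caller's side. The sketch below is an illustration under that assumption, not part of Spider MCP; `createScheduler`, `throttled`, and the interval values are hypothetical.

```javascript
// Illustrative client-side throttle; names and intervals are assumptions.
function createScheduler(minIntervalMs) {
  let nextSlot = 0; // earliest timestamp at which the next task may start
  // Returns how many milliseconds the caller should wait before starting.
  return function reserve(now = Date.now()) {
    const startAt = Math.max(now, nextSlot);
    nextSlot = startAt + minIntervalMs;
    return startAt - now;
  };
}

// Runs a task after waiting for the slot reserved by the scheduler.
async function throttled(reserve, task) {
  const wait = reserve();
  if (wait > 0) {
    await new Promise((resolve) => setTimeout(resolve, wait));
  }
  return task();
}
```

For example, with `const reserve = createScheduler(2000)`, each `throttled(reserve, () => callTool(...))` call starts at least two seconds after the previous one, where `callTool` stands in for whatever client function sends the request.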
Development
Project Structure
License
MIT License
Contributing
Issues and Pull Requests are welcome!