Spider MCP

by Bosegluon2
# Spider MCP - Web Search Crawler Service

A web search MCP service based on pure crawler technology, built with Node.js.

## Features

- ❌ **No Official API Required**: Completely based on crawler technology, no dependency on third-party official APIs
- 🔍 **Intelligent Search**: Supports Bing web and news search
- 📰 **News Search**: Built-in news search with time filtering
- 🕷️ **Pure Crawler**: No official API dependency, uses Puppeteer for web scraping
- 🚀 **High Performance**: Supports batch web scraping
- 📊 **Health Monitoring**: Complete health check and metrics monitoring
- 📝 **Structured Logging**: Uses Winston for structured logs
- 🔒 **Anti-Detection**: Supports User-Agent rotation and other anti-bot measures
- 🔗 **Smart URL Cleaning**: Automatically cleans promotional parameters while preserving essential information

## Tech Stack

- **Node.js** (>= 18.0.0)
- **Express.js** - Web framework
- **Puppeteer** - Browser automation
- **Cheerio** - HTML parsing
- **Axios** - HTTP client
- **Winston** - Logging
- **@modelcontextprotocol/sdk** - MCP protocol support

## Quick Start

### 1. Install dependencies

```bash
npm install
```

or use `pnpm`:

```bash
pnpm install
```

### 2. Download Puppeteer browser

```bash
npx puppeteer browsers install chrome
```

### 3. Environment configuration

Copy and configure the environment variables file:

```bash
cp .env.example .env
```

Edit the `.env` file according to your needs.

### 4. Start the service

Development mode:

```bash
npm run dev
```

Production mode:

```bash
npm start
```

The service will start at `http://localhost:3000`.

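
The Smart URL Cleaning feature listed under Features is not specified in detail in this README. As a rough illustration of the idea only, the sketch below strips a few common tracking parameters and leaves everything else untouched. The function name `cleanUrl` and the parameter lists are assumptions made for this example, not the project's actual implementation.

```javascript
// Hypothetical tracking parameters; the project's real list is not documented here.
const TRACKING_PARAMS = new Set(["fbclid", "gclid", "msclkid"]);
const TRACKING_PREFIXES = ["utm_"];

// Returns the URL with tracking parameters removed and all other parameters kept.
function cleanUrl(rawUrl) {
  const url = new URL(rawUrl);
  // Copy the keys first so deleting while iterating is safe.
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    const isTracking =
      TRACKING_PARAMS.has(lower) ||
      TRACKING_PREFIXES.some((prefix) => lower.startsWith(prefix));
    if (isTracking) {
      url.searchParams.delete(key);
    }
  }
  return url.toString();
}

console.log(cleanUrl("https://example.com/article?id=42&utm_source=newsletter&fbclid=abc"));
// prints "https://example.com/article?id=42"
```

A deny-list like this keeps unknown parameters by default, which is the safer choice when a parameter such as `id` may be essential to the link.
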
## MCP Tools

### web_search

Unified search tool supporting both web and news search:

- **Web Search**: `searchType: "web"`
- **News Search**: `searchType: "news"` with time filtering
- **Note**: `searchType` is a required parameter and must be explicitly specified

#### Usage Examples:

```
# Web search
Use web_search tool to search "Node.js tutorial" with searchType set to web, return 10 results

# News search
Use web_search tool to search "tech news" with searchType set to news, return 5 results from past 24 hours
```

### Other Tools

- `get_webpage_content`: Get webpage content and convert it to a specified format
- `get_webpage_source`: Get the raw HTML source code of a webpage
- `batch_webpage_scrape`: Batch scrape multiple webpages

## MCP Configuration

### Chatbox Configuration

Create an `mcp-config.json` file in Chatbox:

```json
{
  "mcpServers": {
    "spider-mcp": {
      "command": "node",
      "args": ["src/mcp/server.js"],
      "env": {
        "NODE_ENV": "production"
      },
      "description": "Spider MCP - Web search and webpage scraping tools",
      "capabilities": {
        "tools": {}
      }
    }
  }
}
```

### Other MCP Clients

```json
{
  "mcpServers": {
    "spider-mcp": {
      "command": "node",
      "args": ["path/to/spider-mcp/src/mcp/server.js"]
    }
  }
}
```

## Important Notes

1. **Anti-bot Measures**: This service uses various techniques to avoid detection, but you must still comply with each site's robots.txt and terms of use
2. **Rate Limiting**: Keep request frequency reasonable to avoid putting pressure on target websites
3. **Legal Compliance**: Ensure compliance with local laws and website terms of use when using this service
4. **Resource Consumption**: Puppeteer launches a Chrome browser, so watch memory and CPU usage
5. **URL Cleaning**: Promotional parameters are cleaned automatically, which may affect some special link functionality

## Development

### Project Structure

```tree
spider-mcp/
├── src/
│   ├── index.js              # Main entry file
│   ├── mcp/
│   │   └── server.js         # MCP server
│   ├── routes/               # Route definitions
│   │   ├── search.js         # Search routes
│   │   └── health.js         # Health check routes
│   ├── services/             # Business logic
│   │   └── searchService.js  # Search service
│   └── utils/                # Utility functions
│       └── logger.js         # Logging utility
├── logs/                     # Log files directory
├── tests/                    # Test files
├── package.json              # Project configuration
├── .env.example              # Environment variables example
├── mcp-config.json           # MCP configuration example
└── README.md                 # Project documentation
```

## License

MIT License

## Contributing

Issues and Pull Requests are welcome!
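
For clients that talk to the server programmatically rather than through a chat prompt, a `web_search` call as described under MCP Tools is sent as an MCP `tools/call` request. The sketch below builds such a request. Only the tool name and the `searchType` parameter come from this README; the helper name `buildWebSearchCall` and the argument names `query` and `maxResults` are assumptions for illustration and may not match the server's actual input schema.

```javascript
// Builds a JSON-RPC 2.0 "tools/call" request in the shape used by MCP.
// `query` and `maxResults` are hypothetical argument names; only `searchType`
// is documented in this README.
function buildWebSearchCall(id, query, searchType, maxResults) {
  // The README states that searchType is required and is either "web" or "news".
  if (searchType !== "web" && searchType !== "news") {
    throw new Error('searchType is required and must be "web" or "news"');
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "web_search",
      arguments: { query, searchType, maxResults },
    },
  };
}

const request = buildWebSearchCall(1, "Node.js tutorial", "web", 10);
console.log(JSON.stringify(request, null, 2));
```

Validating `searchType` on the client side surfaces the "required parameter" rule early, before the request reaches the server.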

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Bosegluon2/spider-mcp'
