
Spider MCP

by Bosegluon2
# Spider MCP - Web Search Crawler Service

A web search MCP service based on pure crawler technology, built with Node.js.

## Features

- ❌ **No Official API Required**: Completely based on crawler technology, no dependency on third-party official APIs
- 🔍 **Intelligent Search**: Supports Bing web and news search
- 📰 **News Search**: Built-in news search with time filtering
- 🕷️ **Pure Crawler**: No official API dependency, uses Puppeteer for web scraping
- 🚀 **High Performance**: Supports batch web scraping
- 📊 **Health Monitoring**: Complete health check and metrics monitoring
- 📝 **Structured Logging**: Uses Winston for structured logs
- 🔒 **Anti-Detection**: Supports User-Agent rotation and other anti-bot measures
- 🔗 **Smart URL Cleaning**: Automatically cleans promotional parameters while preserving essential information

## Tech Stack

- **Node.js** (>= 18.0.0)
- **Express.js** - Web framework
- **Puppeteer** - Browser automation
- **Cheerio** - HTML parsing
- **Axios** - HTTP client
- **Winston** - Logging
- **@modelcontextprotocol/sdk** - MCP protocol support

## Quick Start

### 1. Install dependencies

```bash
npm install
```

or use `pnpm`

```bash
pnpm install
```

### 2. Download Puppeteer browser

```bash
npx puppeteer browsers install chrome
```

### 3. Environment configuration

Copy and configure the environment variables file:

```bash
cp .env.example .env
```

Edit the `.env` file according to your needs.

### 4. Start the service

Development mode:

```bash
npm run dev
```

Production mode:

```bash
npm start
```

The service will start at `http://localhost:3000`.
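
To make the "Smart URL Cleaning" feature listed above more concrete, here is a minimal sketch of how promotional parameters might be stripped from a result URL. It is not taken from the Spider MCP source: the function name `cleanUrl`, the `TRACKING_PARAMS` set, and the choice of which parameters count as promotional are all assumptions for illustration. It uses only the `URL` class built into Node.js.

```javascript
// Hypothetical sketch, not the project's actual implementation.
// TRACKING_PARAMS and cleanUrl are illustrative names.
const TRACKING_PARAMS = new Set(['fbclid', 'gclid', 'msclkid']);

function cleanUrl(rawUrl) {
  const url = new URL(rawUrl);

  // Copy the keys first so that deleting while iterating is safe.
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    // Assumption: utm_* and a few well-known click IDs are "promotional".
    if (lower.startsWith('utm_') || TRACKING_PARAMS.has(lower)) {
      url.searchParams.delete(key);
    }
  }

  return url.toString();
}

console.log(cleanUrl('https://example.com/article?id=42&utm_source=news&fbclid=abc'));
// prints https://example.com/article?id=42
```

Parameters that are not on the list, such as `id` here, are left untouched, which is one way to preserve essential information while dropping tracking data.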

## MCP Tools

### web_search

Unified search tool supporting both web and news search:

- **Web Search**: `searchType: "web"`
- **News Search**: `searchType: "news"` with time filtering
- **Note**: `searchType` is a required parameter and must be explicitly specified

#### Usage Examples:

```
# Web search
Use web_search tool to search "Node.js tutorial" with searchType set to web, return 10 results

# News search
Use web_search tool to search "tech news" with searchType set to news, return 5 results from past 24 hours
```

### Other Tools

- `get_webpage_content`: Get webpage content and convert it to a specified format
- `get_webpage_source`: Get the raw HTML source code of a webpage
- `batch_webpage_scrape`: Scrape multiple webpages in one batch

## MCP Configuration

### Chatbox Configuration

Create an `mcp-config.json` file in Chatbox:

```json
{
  "mcpServers": {
    "spider-mcp": {
      "command": "node",
      "args": ["src/mcp/server.js"],
      "env": {
        "NODE_ENV": "production"
      },
      "description": "Spider MCP - Web search and webpage scraping tools",
      "capabilities": {
        "tools": {}
      }
    }
  }
}
```

### Other MCP Clients

```json
{
  "mcpServers": {
    "spider-mcp": {
      "command": "node",
      "args": ["path/to/spider-mcp/src/mcp/server.js"]
    }
  }
}
```

## Important Notes

1. **Anti-bot Measures**: This service uses various techniques to avoid detection, but you must still comply with robots.txt and each site's terms of use
2. **Rate Limiting**: Keep request frequency reasonable to avoid putting pressure on target websites
3. **Legal Compliance**: Ensure compliance with local laws and website terms of use when using this service
4. **Resource Consumption**: Puppeteer launches a Chrome browser, so keep an eye on memory and CPU usage
5. **URL Cleaning**: Promotional parameters are cleaned automatically, which may affect the functionality of some special links

## Development

### Project Structure

```tree
spider-mcp/
├── src/
│   ├── index.js              # Main entry file
│   ├── mcp/
│   │   └── server.js         # MCP server
│   ├── routes/               # Route definitions
│   │   ├── search.js         # Search routes
│   │   └── health.js         # Health check routes
│   ├── services/             # Business logic
│   │   └── searchService.js  # Search service
│   └── utils/                # Utility functions
│       └── logger.js         # Logging utility
├── logs/                     # Log files directory
├── tests/                    # Test files
├── package.json              # Project configuration
├── .env.example              # Environment variables example
├── mcp-config.json           # MCP configuration example
└── README.md                 # Project documentation
```

## License

MIT License

## Contributing

Issues and Pull Requests are welcome!
