README.mdβ’14.7 kB
# π Search Fusion MCP Server
[](https://opensource.org/licenses/MIT)
[](https://www.python.org/downloads/)
[](https://github.com/jlowin/fastmcp)
[](https://github.com/sailaoda/search-fusion-mcp/releases)
[](https://github.com/sailaoda/search-fusion-mcp)
**π [δΈζζζ‘£](README_zh.md)**
A **High-Availability Multi-Engine Search Aggregation MCP Server** providing intelligent failover, unified API, and LLM-optimized content processing. Search Fusion integrates multiple search engines with smart priority-based routing and automatic failover mechanisms.
> **π What's New in v3.0.0:** Major concurrency upgrade! Enhanced multi-threading support with thread-safe operations, intelligent connection pooling, and semaphore-based request limiting. Now supports 50+ concurrent searches without race conditions or data corruption!
## β¨ Features
### π Multi-Engine Integration
- **Google Search** - Premium performance with API key
- **Serper Search** - Google search alternative with advanced features
- **Jina AI Search** - AI-powered search with intelligent content processing
- **DuckDuckGo** - Free search, no API key required
- **Exa Search** - AI-powered semantic search
- **Bing Search** - Microsoft search API
- **Baidu Search** - Chinese search engine
### π Advanced Features
- **Intelligent Failover** - Automatic engine switching on failures or rate limits
- **Priority-Based Routing** - Smart engine selection based on availability and performance
- **Unified Response Format** - Consistent JSON structure across all engines
- **Rate Limiting Protection** - Built-in cooldown mechanisms
- **π High Concurrency Support** - Thread-safe operations with connection pooling
- **β‘ Performance Optimization** - Async operations with semaphore-based concurrency control
- **LLM-Optimized Content** - Advanced web content fetching with pagination support
- **Wikipedia Integration** - Dedicated Wikipedia search tool
- **Wayback Machine** - Historical webpage archive search
- **Environment Variable Configuration** - Pure MCP configuration without config files
- **π Enhanced Proxy Auto-Detection** - Intelligent proxy detection with zero configuration
### π Monitoring & Analytics
- Real-time engine status monitoring
- Success rate tracking
- Error handling and recovery
- Performance metrics
### β‘ Concurrency & Performance
- **Thread-Safe Operations** - All engine statistics and state updates are protected by async locks
- **Connection Pooling** - Shared HTTP client with configurable connection limits (max 100 connections)
- **Semaphore Control** - Concurrent request limiting (max 30 simultaneous searches)
- **Timeout Protection** - 60-second search timeout prevents request accumulation
- **Resource Management** - Efficient memory usage with automatic connection cleanup
- **Race Condition Prevention** - Double-checked locking for SearchManager initialization
## ποΈ Architecture
```
Search Fusion MCP Server
βββ π§ Configuration Manager # MCP environment variable handling
βββ π Search Manager # Multi-engine orchestration with concurrency control
βββ β‘ Concurrency Layer # Thread-safe operations & performance optimization
β βββ AsyncLock Protection # Thread-safe state updates
β βββ HTTP Connection Pool # Shared client with connection limits
β βββ Semaphore Control # Concurrent request limiting (max 30)
β βββ Timeout Management # 60s timeout protection
βββ π Engine Implementations # Individual search engines
β βββ GoogleSearch # Google Custom Search
β βββ SerperSearch # Serper API
β βββ JinaSearch # Jina AI Search
β βββ DuckDuckGoSearch # DuckDuckGo
β βββ ExaSearch # Exa AI
β βββ BingSearch # Bing API
β βββ BaiduSearch # Baidu API
βββ π οΈ Advanced Fetcher # Multi-method web scraping
βββ π‘ MCP Server # FastMCP integration
```
## π Quick Start
### Installation
#### Option 1: Install from PyPI (Recommended)
```bash
pip install search-fusion-mcp
```
#### Option 2: Install from Source
```bash
git clone https://github.com/sailaoda/search-fusion-mcp.git
cd search-fusion-mcp
pip install -e .
```
## π Enhanced Proxy Auto-Detection (New in v2.0!)
Search Fusion now features **intelligent proxy auto-detection** inspired by [concurrent-browser-mcp](https://github.com/sailaoda/concurrent-browser-mcp), providing seamless proxy support with **zero configuration**!
### β¨ Three-Layer Detection Strategy
1. **Environment Variables** - Highest priority, checks `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`
2. **Port Scanning** - Scans common proxy ports using socket connection testing
3. **System Proxy** - Detects OS-level proxy settings (macOS supported)
### π Supported Proxy Ports (Priority Order)
- **7890** - Clash default port
- **1087** - V2Ray common port
- **8080** - Generic HTTP proxy port
- **3128** - Squid proxy default port
- **8888** - Other proxy software port
- **10809** - V2Ray SOCKS port
- **20171** - Additional proxy port
### π Zero Configuration Usage
**Just run directly** - proxy will be auto-detected:
```bash
search-fusion-mcp
```
**Manual override** (if needed):
```bash
env HTTP_PROXY="http://your-proxy:port" search-fusion-mcp
```
### π Detection Process
```
π Checking environment variables...
π Scanning proxy ports: [7890, 1087, 8080, ...]
β
Local proxy port detected: 7890
π Auto-detected proxy: http://127.0.0.1:7890
```
### π Comparison with concurrent-browser-mcp
| Feature | Search-Fusion | concurrent-browser-mcp |
|---------|---------------|------------------------|
| **Detection Method** | β
Env vars β Port scan β System proxy | β
Same strategy |
| **Port List** | β
7 common ports | β
7 common ports |
| **Connection Test** | β
Socket testing | β
Socket testing |
| **Timeout** | β
3 seconds | β
3 seconds |
| **macOS Support** | β
networksetup | β
networksetup |
| **Language** | Python | TypeScript |
### MCP Integration
#### Environment Variable Configuration
Search Fusion uses **pure MCP environment variable configuration** without requiring config files.
**MCP Client Configuration (PyPI Installation):**
```json
{
"mcp": {
"mcpServers": {
"search-fusion": {
"command": "search-fusion-mcp",
"env": {
"GOOGLE_API_KEY": "your_google_api_key",
"GOOGLE_CSE_ID": "your_google_cse_id",
"SERPER_API_KEY": "your_serper_api_key",
"JINA_API_KEY": "your_jina_api_key",
"EXA_API_KEY": "your_exa_api_key",
"BING_API_KEY": "your_bing_api_key",
"BAIDU_API_KEY": "your_baidu_api_key",
"BAIDU_SECRET_KEY": "your_baidu_secret_key"
}
}
}
}
}
```
**MCP Client Configuration (Source Installation):**
```json
{
"mcp": {
"mcpServers": {
"search-fusion": {
"command": "python",
"args": ["-m", "src.main"],
"cwd": "/path/to/your/search-fusion-mcp",
"env": {
"GOOGLE_API_KEY": "your_google_api_key",
"GOOGLE_CSE_ID": "your_google_cse_id",
"SERPER_API_KEY": "your_serper_api_key",
"JINA_API_KEY": "your_jina_api_key",
"EXA_API_KEY": "your_exa_api_key",
"BING_API_KEY": "your_bing_api_key",
"BAIDU_API_KEY": "your_baidu_api_key",
"BAIDU_SECRET_KEY": "your_baidu_secret_key"
}
}
}
}
}
```
#### Supported Environment Variables
| Search Engine | Environment Variable | Required | Description | Get API Key |
|--------------|---------------------|----------|-------------|-------------|
| Google | `GOOGLE_API_KEY`<br>`GOOGLE_CSE_ID` | Both needed | Google Custom Search API | [Get API Key](https://developers.google.com/custom-search/v1/introduction) |
| Serper | `SERPER_API_KEY` | API key | Serper Google Search API | [Get API Key](https://serper.dev/) |
| Jina AI | `JINA_API_KEY` | API key | Jina AI Search API | [Get API Key](https://jina.ai/) |
| Bing | `BING_API_KEY` | API key | Microsoft Bing Search API | [Get API Key](https://www.microsoft.com/en-us/bing/apis/bing-web-search-api) |
| Baidu | `BAIDU_API_KEY`<br>`BAIDU_SECRET_KEY` | Both needed | Baidu Search API | [Get API Key](https://ai.baidu.com/) |
| Exa | `EXA_API_KEY` | API key | Exa AI Search API | [Get API Key](https://exa.ai/) |
| DuckDuckGo | None required | - | Free search, no API key needed | - |
**Alternative Variable Names:**
```bash
# Google
GOOGLE_SEARCH_API_KEY # Alternative to GOOGLE_API_KEY
GOOGLE_SEARCH_CSE_ID # Alternative to GOOGLE_CSE_ID
# Serper
SERPER_SEARCH_API_KEY # Alternative to SERPER_API_KEY
# Others follow similar pattern...
```
### Engine Priority
Search engines are prioritized automatically:
1. **Google Search** (Priority 1) - Premium performance with API key
2. **Serper Search** (Priority 1) - Google alternative with advanced features
3. **Jina AI Search** (Priority 1.5) - AI-powered search with optional API key for advanced features
4. **DuckDuckGo** (Priority 2) - Free, no API key required
5. **Exa Search** (Priority 2) - AI-powered search with API key
6. **Bing Search** (Priority 3) - Microsoft search API
7. **Baidu Search** (Priority 3) - Chinese search engine
## π οΈ MCP Tools

### 1. `search`
Perform web searches with intelligent engine selection and failover.
**Parameters:**
- `query` (required): Search query terms
- `num_results` (default: 10): Number of results to return
- `engine` (default: "auto"): Engine preference
- `"auto"`: Automatic engine selection (recommended)
- `"google"`: Prefer Google Search
- `"serper"`: Prefer Serper Search
- `"jina"`: Prefer Jina AI Search
- `"duckduckgo"`: Prefer DuckDuckGo
- `"exa"`: Prefer Exa Search
- `"bing"`: Prefer Bing Search
- `"baidu"`: Prefer Baidu Search
### 2. `fetch_url`
Fetch and process web content with intelligent pagination and multi-method fallback.
**Parameters:**
- `url` (required): Web URL to fetch
- `use_jina` (default: true): Whether to prioritize Jina Reader for LLM-optimized content
- `with_image_alt` (default: false): Whether to generate alt text for images
- `max_length` (default: 50000): Maximum content length per page (auto-paginate if exceeded)
- `page_number` (default: 1): Retrieve specific page from previously fetched content
**Features:**
- **Intelligent Multi-Method Fallback**: Tries Jina Reader β Serper Scrape β Direct HTTP
- **Automatic Pagination**: Splits large content into manageable pages
- **Concurrent-Safe Caching**: Unique page IDs prevent conflicts in high-concurrency scenarios
- **LLM-Optimized Content**: Clean markdown format optimized for AI processing
### 3. `get_available_engines`
Get current status and availability of all search engines.
### 4. `search_wikipedia`
Search Wikipedia articles for entities, people, places, concepts, etc.
**Parameters:**
- `entity` (required): Entity to search for
- `first_sentences` (default: 10): Number of sentences to return (0 for full content)
### 5. `search_archived_webpage`
Search archived versions of websites using Wayback Machine.
**Parameters:**
- `url` (required): Website URL to search
- `year` (optional): Target year
- `month` (optional): Target month
- `day` (optional): Target day
## π API Examples
### Basic Search
```python
# Automatic engine selection
result = await search("artificial intelligence trends 2024")
# Prefer specific engine
result = await search("machine learning", engine="google")
```
### Advanced Web Fetching
```python
# Fetch with intelligent pagination
result = await fetch_url("https://example.com/long-article")
# If content is paginated, get additional pages
if result.get("is_paginated"):
page_2 = await get_page(result["page_id"], 2)
```
### Wikipedia Search
```python
# Get Wikipedia summary
result = await search_wikipedia("Python programming language")
# Get full article
result = await search_wikipedia("Quantum computing", first_sentences=0)
```
## π§ͺ Development
### Development Setup
```bash
git clone https://github.com/sailaoda/search-fusion-mcp.git
cd search-fusion-mcp
pip install -r requirements.txt
pip install -e .
```
## π§ Configuration Guide
For detailed configuration instructions, see [MCP_CONFIG_GUIDE.md](MCP_CONFIG_GUIDE.md).
## π Performance
- **Latency**: Sub-second response times with caching
- **Availability**: 99.9% uptime with intelligent failover
- **Throughput**: Handles concurrent requests efficiently
- **Scalability**: Efficient resource utilization and concurrent processing
### π Concurrency Benchmarks
**Tested Performance (v3.0.0+):**
- β
**50+ concurrent searches** - No race conditions or data corruption
- β
**Thread-safe statistics** - Accurate request counting and error tracking
- β‘ **Connection pooling** - Efficient HTTP resource management
- π‘οΈ **Timeout protection** - 60s per request prevents system overload
- π **Real-time monitoring** - Live engine status during high load
**Recommended Limits:**
- **Concurrent searches**: 10 (configurable via semaphore)
- **Connection pool**: 100 max connections, 20 keep-alive
- **Request timeout**: 60 seconds
- **Memory usage**: ~50MB baseline + ~2MB per concurrent request
## π€ Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request
## π License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## π¨ Rate Limiting & Best Practices
- **Google Search**: 100 queries/day (free tier)
- **Serper API**: Varies by plan
- **Jina AI**: Rate limits apply based on subscription
- **DuckDuckGo**: No official limits, but use responsibly
- **Other engines**: Check respective API documentation
Always implement appropriate delays and respect rate limits to ensure sustainable usage.
## π Support
- π [Documentation](https://github.com/sailaoda/search-fusion-mcp)
- π [Issue Tracker](https://github.com/sailaoda/search-fusion-mcp/issues)
- π¬ [Discussions](https://github.com/sailaoda/search-fusion-mcp/discussions)
---
**Made with β€οΈ for the MCP community**