# WebSearch MCP Server
High-performance Model Context Protocol (MCP) server for web search and content extraction with intelligent fallback system.
## Features
- **Fast**: Async implementation with parallel execution
- **Multi-Engine**: Google, Bing, DuckDuckGo, Startpage, Brave Search
- **Intelligent Fallbacks**: Google → Startpage, Bing → DuckDuckGo, Brave (standalone)
- **Content Extraction**: Clean text extraction from web pages
- **Smart Caching**: LRU cache with compression and deduplication
- **API Integration**: Google Custom Search, Brave Search APIs with quota management
- **Resilient**: Automatic failover and comprehensive error handling
## Installation
### Production Use (Recommended)
```bash
# Create virtual environment
python -m venv ~/.websearch/venv
source ~/.websearch/venv/bin/activate
# Install from GitHub
pip install git+https://github.com/vishalkg/web-search.git
```
### Development
```bash
git clone https://github.com/vishalkg/web-search.git
cd web-search
pip install -e .
```
## Configuration
### Q CLI
```bash
# Add to Q CLI (after installation)
q mcp add --name websearch --command ~/.websearch/venv/bin/websearch-server
# Test
q chat "search for python tutorials"
```
### Claude Desktop
Register the server with the `claude` CLI (user scope):
```bash
claude mcp add websearch ~/.websearch/venv/bin/websearch-server -s user
```
## File Structure (Installation Independent)
The server automatically creates and manages files in a unified user directory:
```
~/.websearch/                    # Single websearch directory
├── venv/                        # Virtual environment (recommended)
├── config/
│   └── .env                     # Configuration file
├── data/
│   ├── search-metrics.jsonl     # Search analytics
│   └── quota/                   # API quota tracking
│       ├── google_quota.json
│       └── brave_quota.json
├── logs/
│   └── web-search.log           # Application logs
└── cache/                       # Optional caching
```
### Environment Variable Overrides
- `WEBSEARCH_HOME`: Base directory (default: `~/.websearch`)
- `WEBSEARCH_CONFIG_DIR`: Config directory override
- `WEBSEARCH_LOG_DIR`: Log directory override
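
The override rules above could be resolved roughly as follows. This is a minimal sketch, not the server's actual code: the function name `resolve_dirs` and the returned dictionary keys are hypothetical, and it assumes that an unset override falls back to the matching subdirectory of `WEBSEARCH_HOME`.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Resolve websearch directories from environment variables (illustrative only)."""
    env = os.environ if env is None else env
    # Base directory: WEBSEARCH_HOME, defaulting to ~/.websearch as documented.
    home = Path(env.get("WEBSEARCH_HOME", "~/.websearch")).expanduser()
    return {
        "home": home,
        # Assumption: specific overrides win, otherwise use subdirectories of home.
        "config": Path(env.get("WEBSEARCH_CONFIG_DIR", home / "config")).expanduser(),
        "logs": Path(env.get("WEBSEARCH_LOG_DIR", home / "logs")).expanduser(),
    }
```

Passing a plain dictionary instead of `os.environ` makes the resolution easy to check without touching the real environment.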
## Usage
The server provides two main tools with multiple search modes:
### Search Web
```python
# Standard 5-engine search (backward compatible)
search_web("quantum computing applications", num_results=10)
# New 3-engine fallback search (optimized)
search_web_fallback("machine learning tutorials", num_results=5)
```
**Search Engines:**
- **Google Custom Search API** (with Startpage fallback)
- **Bing** (with DuckDuckGo fallback)
- **Brave Search API** (standalone)
- **DuckDuckGo** (scraping)
- **Startpage** (scraping)
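
To illustrate the primary/fallback pairing listed above, here is a minimal sketch of how such pairs might be run concurrently with `asyncio`. The helper names (`search_with_fallback`, `fallback_search`) and the two stub engines are hypothetical stand-ins, not the functions in `websearch/core/`; real engines would perform HTTP requests.

```python
import asyncio


async def search_with_fallback(primary, fallback, query, num_results):
    """Try the primary engine; use the fallback if it raises or returns nothing."""
    try:
        results = await primary(query, num_results)
        if results:
            return results
    except Exception:
        pass  # Any primary failure (quota, network, parsing) triggers the fallback.
    if fallback is None:
        return []  # Standalone engine: nothing to fall back to.
    try:
        return await fallback(query, num_results)
    except Exception:
        return []


async def fallback_search(engine_pairs, query, num_results=5):
    """Run every (primary, fallback) pair concurrently and flatten the results."""
    tasks = [search_with_fallback(p, f, query, num_results) for p, f in engine_pairs]
    groups = await asyncio.gather(*tasks)  # Preserves the order of engine_pairs.
    return [item for group in groups for item in group]


# Hypothetical stand-in engines so the sketch runs without network access.
async def failing_engine(query, num_results):
    raise RuntimeError("quota exhausted")


async def stub_engine(query, num_results):
    return [f"{query} result {i}" for i in range(num_results)]
```

With these stubs, `asyncio.run(fallback_search([(failing_engine, stub_engine), (stub_engine, None)], "q", 2))` exercises both the fallback path and the standalone path.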
### Fetch Page Content
```python
# Extract clean text from URLs
fetch_page_content("https://example.com")
fetch_page_content(["https://site1.com", "https://site2.com"]) # Batch processing
```
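
For readers curious what "clean text" extraction can look like, the following is a rough standard-library sketch. It is an assumption about the general technique, not the server's implementation: `TextExtractor`, `extract_text`, and `normalize_urls` are hypothetical names, and the real tool may use a different parser and also handles fetching.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the visible text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def normalize_urls(urls):
    """Accept a single URL or a list of URLs, mirroring the two call forms above."""
    return [urls] if isinstance(urls, str) else list(urls)
```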
## Architecture
```
websearch/
├── core/
│   ├── search.py                # Sync search orchestration
│   ├── async_search.py          # Async search orchestration
│   ├── fallback_search.py       # 3-engine fallback system
│   ├── async_fallback_search.py # Async fallback system
│   ├── ranking.py               # Quality-first result ranking
│   └── common.py                # Shared utilities
├── engines/
│   ├── google_api.py            # Google Custom Search API
│   ├── brave_api.py             # Brave Search API
│   ├── bing.py                  # Bing scraping
│   ├── duckduckgo.py            # DuckDuckGo scraping
│   └── startpage.py             # Startpage scraping
├── utils/
│   ├── unified_quota.py         # Unified API quota management
│   ├── deduplication.py         # Result deduplication
│   ├── advanced_cache.py        # Enhanced caching system
│   └── http.py                  # HTTP utilities
└── server.py                    # FastMCP server
```
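
Because several engines often return the same page, a deduplication step sits between the engines and the ranking. The sketch below shows one plausible approach, URL normalization; it is not taken from `utils/deduplication.py`. The function names and the assumption that each result is a dictionary with a `"url"` key are illustrative.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key: lowercase host without 'www.', path without a trailing slash."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key  # The scheme is deliberately ignored, so http and https match.


def deduplicate(results):
    """Keep the first result for each normalized URL, preserving input order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```

Keeping the first occurrence means that the order in which engine results are concatenated decides which duplicate survives.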
## Advanced Configuration
### Environment Variables
```bash
# API Configuration
export GOOGLE_CSE_API_KEY=your_google_api_key
export GOOGLE_CSE_ID=your_google_cse_id
export BRAVE_SEARCH_API_KEY=your_brave_api_key
# Quota Management (Optional)
export GOOGLE_DAILY_QUOTA=100 # Default: 100 requests/day
export BRAVE_MONTHLY_QUOTA=2000 # Default: 2000 requests/month
# Performance Tuning
export WEBSEARCH_CACHE_SIZE=1000
export WEBSEARCH_TIMEOUT=10
export WEBSEARCH_LOG_LEVEL=INFO
```
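
As an illustration of how these variables might be consumed, here is a small sketch that reads them into a typed settings object. The `Settings` class and `load_settings` function are hypothetical. The quota defaults come from the comments above; the cache size, timeout, and log level defaults simply reuse the example values and are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: Optional[str]
    google_cse_id: Optional[str]
    brave_api_key: Optional[str]
    google_daily_quota: int
    brave_monthly_quota: int
    cache_size: int
    timeout: float
    log_level: str

    @property
    def google_enabled(self):
        # Google Custom Search needs both the API key and the CSE ID.
        return bool(self.google_api_key and self.google_cse_id)


def load_settings(env=None):
    """Build Settings from environment variables (illustrative only)."""
    env = os.environ if env is None else env
    return Settings(
        google_api_key=env.get("GOOGLE_CSE_API_KEY"),
        google_cse_id=env.get("GOOGLE_CSE_ID"),
        brave_api_key=env.get("BRAVE_SEARCH_API_KEY"),
        google_daily_quota=int(env.get("GOOGLE_DAILY_QUOTA", "100")),
        brave_monthly_quota=int(env.get("BRAVE_MONTHLY_QUOTA", "2000")),
        cache_size=int(env.get("WEBSEARCH_CACHE_SIZE", "1000")),  # assumed default
        timeout=float(env.get("WEBSEARCH_TIMEOUT", "10")),  # assumed default
        log_level=env.get("WEBSEARCH_LOG_LEVEL", "INFO").upper(),  # assumed default
    )
```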
### How to Get API Keys
#### Google Custom Search API
1. **API Key**: Go to https://developers.google.com/custom-search/v1/introduction and click "Get a Key"
2. **CSE ID**: Go to https://cse.google.com/cse/ and follow prompts to create a search engine
#### Brave Search API
1. Go to [Brave Search API](https://api.search.brave.com/)
2. Sign up for a free account
3. Go to your dashboard
4. Copy the API key and set it as `BRAVE_SEARCH_API_KEY`
5. Free tier: 2000 requests/month
### Quota Management
- **Unified System**: Single quota manager for all APIs
- **Google**: Daily quota (default 100 requests/day)
- **Brave**: Monthly quota (default 2000 requests/month)
- **Storage**: Quota files stored in the `~/.websearch/data/quota/` directory
- **Auto-reset**: Quotas automatically reset at period boundaries
- **Fallback**: Automatic fallback to scraping when quotas exhausted
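
The period-based reset described above can be sketched as follows. This is an assumption about how such a tracker could work, not the contents of `utils/unified_quota.py`: the `QuotaTracker` class, the `period_key` helper, and the JSON layout (`period`, `used`) are all hypothetical.

```python
import json
from datetime import date
from pathlib import Path


def period_key(period, today):
    """Return a string identifying the current quota period."""
    return today.isoformat() if period == "daily" else today.strftime("%Y-%m")


class QuotaTracker:
    def __init__(self, path, limit, period):
        self.path = Path(path)
        self.limit = limit
        self.period = period  # "daily" or "monthly"

    def _load(self, today):
        key = period_key(self.period, today)
        try:
            state = json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            state = {}
        # A stored key from an earlier period means the quota has reset.
        if state.get("period") != key:
            state = {"period": key, "used": 0}
        return state

    def try_consume(self, today=None):
        """Record one request and return True, or return False if the quota is exhausted."""
        state = self._load(today or date.today())
        if state["used"] >= self.limit:
            return False  # Caller would fall back to scraping here.
        state["used"] += 1
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(state))
        return True
```

A caller would check `try_consume()` before each API request and switch to the scraping engine when it returns `False`.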
### Search Modes
- **Standard Mode**: Uses all 5 engines for maximum coverage
- **Fallback Mode**: Uses 3 engines with intelligent fallbacks for efficiency
- **API-First Mode**: Prioritizes API calls over scraping when keys available
## Troubleshooting
| Issue | Solution |
|-------|----------|
| No results | Check internet connection and logs |
| API quota exhausted | System automatically falls back to scraping |
| Google API errors | Verify `GOOGLE_CSE_API_KEY` and `GOOGLE_CSE_ID` |
| Brave API errors | Check `BRAVE_SEARCH_API_KEY` and quota status |
| Permission denied | `chmod +x start.sh` |
| Import errors | Ensure Python 3.12+ and dependencies installed |
| Circular import warnings | Fixed in v2.0+ (10.00/10 pylint score) |
### Debug Mode
```bash
# Enable detailed logging
export WEBSEARCH_LOG_LEVEL=DEBUG
python -m websearch.server
```
### API Status Check
```bash
# Test API connectivity
cd debug/
python test_brave_api.py # Test Brave API
python test_fallback.py # Test fallback system
```
## Performance & Monitoring
### Metrics
- **Pylint Score**: 10.00/10
- **Search Speed**: ~2-3 seconds for 5-engine search
- **Fallback Speed**: ~1-2 seconds for 3-engine search
- **Cache Hit Rate**: ~85% for repeated queries
- **API Quota Efficiency**: Automatic fallback prevents service interruption
### Monitoring
Logs are written to `~/.websearch/logs/web-search.log` (or to `WEBSEARCH_LOG_DIR` if set) in a structured format:
```bash
tail -f ~/.websearch/logs/web-search.log | grep "search completed"
```
## Security
- **No hardcoded secrets**: All API keys via environment variables
- **Clean git history**: Secrets scrubbed from all commits
- **Input validation**: Comprehensive sanitization of search queries
- **Rate limiting**: Built-in quota management for API calls
- **Secure defaults**: HTTPS-only requests, timeout protection
## Performance Tips
1. **Use fallback mode** for faster searches when you don't need maximum coverage
2. **Set API keys** to reduce reliance on scraping (faster + more reliable)
3. **Enable caching** for repeated queries (enabled by default)
4. **Tune batch sizes** for content extraction based on your needs
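
To make the caching tip more concrete, here is a minimal sketch of an LRU cache that stores compressed entries. It illustrates the general technique only and is not the code in `utils/advanced_cache.py`: the class name `CompressedLRUCache` is hypothetical, and it assumes cached values are JSON-serializable.

```python
import json
import zlib
from collections import OrderedDict


class CompressedLRUCache:
    """LRU cache that keeps values as zlib-compressed JSON (illustrative only)."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key, default=None):
        blob = self._data.get(key)
        if blob is None:
            return default
        self._data.move_to_end(key)  # Mark as most recently used.
        return json.loads(zlib.decompress(blob).decode("utf-8"))

    def put(self, key, value):
        self._data[key] = zlib.compress(json.dumps(value).encode("utf-8"))
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # Evict the least recently used entry.

    def __len__(self):
        return len(self._data)
```

Compression trades a little CPU time per lookup for a smaller memory footprint, which tends to pay off when cached search results contain long snippets.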
## Contributing
1. Fork the repository
2. Create feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests (`pytest`)
4. Commit changes (`git commit -m 'Add amazing feature'`)
5. Push to branch (`git push origin feature/amazing-feature`)
6. Open Pull Request
## License
MIT License - see [LICENSE](LICENSE) file for details.