WebSearch MCP Server

High-performance Model Context Protocol (MCP) server for web search and content extraction, with an intelligent fallback system.

✨ Features

🚀 Fast: Async implementation with parallel execution
🔍 Multi-Engine: Google, Bing, DuckDuckGo, Startpage, Brave Search
🛡️ Intelligent Fallbacks: Google→Startpage, Bing→DuckDuckGo, Brave (standalone)
📄 Content Extraction: Clean text extraction from web pages
💾 Smart Caching: LRU cache with compression and deduplication
🔑 API Integration: Google Custom Search, Brave Search APIs with quota management
⚡ Resilient: Automatic failover and comprehensive error handling

📦 Installation

Production Use (Recommended)

    # Create virtual environment
    python -m venv ~/.websearch/venv
    source ~/.websearch/venv/bin/activate

    # Install from GitHub
    pip install git+https://github.com/vishalkg/web-search.git

Development

    git clone https://github.com/vishalkg/web-search.git
    cd web-search
    pip install -e .

⚙️ Configuration

Q CLI

    # Add to Q CLI (after installation)
    q mcp add --name websearch --command ~/.websearch/venv/bin/websearch-server

    # Test
    q chat "search for python tutorials"

Claude Desktop

Register the server with the claude CLI at user scope:

claude mcp add websearch ~/.websearch/venv/bin/websearch-server -s user

🗂️ File Structure (Installation Independent)

The server automatically creates and manages files in a unified user directory:

    ~/.websearch/                     # Single websearch directory
    ├── venv/                         # Virtual environment (recommended)
    ├── config/
    │   └── .env                      # Configuration file
    ├── data/
    │   ├── search-metrics.jsonl      # Search analytics
    │   └── quota/                    # API quota tracking
    │       ├── google_quota.json
    │       └── brave_quota.json
    ├── logs/
    │   └── web-search.log            # Application logs
    └── cache/                        # Optional caching

Environment Variable Overrides

WEBSEARCH_HOME: Base directory (default: ~/.websearch)
WEBSEARCH_CONFIG_DIR: Config directory override
WEBSEARCH_LOG_DIR: Log directory override
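
The override rules above could be resolved roughly as follows. This is a minimal sketch, not the server's actual code: `resolve_dirs` is a hypothetical helper, and the `config/` and `logs/` defaults under the base directory are assumed from the directory tree shown earlier.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Resolve websearch directories from environment overrides (hypothetical helper)."""
    env = os.environ if env is None else env
    # WEBSEARCH_HOME replaces the base directory; the default is ~/.websearch
    home = Path(env.get("WEBSEARCH_HOME", "~/.websearch")).expanduser()
    # Per-directory overrides take precedence over the base directory when set
    config_dir = Path(env.get("WEBSEARCH_CONFIG_DIR", home / "config")).expanduser()
    log_dir = Path(env.get("WEBSEARCH_LOG_DIR", home / "logs")).expanduser()
    return {"home": home, "config": config_dir, "logs": log_dir}
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check without touching the real environment.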

🔧 Usage

The server provides two main tools with multiple search modes:

Search Web

    # Standard 5-engine search (backward compatible)
    search_web("quantum computing applications", num_results=10)

    # New 3-engine fallback search (optimized)
    search_web_fallback("machine learning tutorials", num_results=5)

Search Engines:

Google Custom Search API (with Startpage fallback)
Bing (with DuckDuckGo fallback)
Brave Search API (standalone)
DuckDuckGo (scraping)
Startpage (scraping)
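
The primary/fallback pairing listed above can be sketched with asyncio. This is an illustration under stated assumptions, not the project's implementation: the `engines` mapping of names to async callables, the engine names, and both function names are hypothetical, and treating an empty result list as a failure is a design choice made for this sketch.

```python
import asyncio

# Primary -> fallback pairs as listed above; Brave has no fallback.
FALLBACK_CHAINS = [
    ("google_api", "startpage"),
    ("bing", "duckduckgo"),
    ("brave_api", None),
]


async def search_with_fallback(engines, primary, fallback, query, num_results):
    """Try the primary engine; on an error or empty result, try the fallback."""
    for name in (primary, fallback):
        if name is None:
            continue
        try:
            results = await engines[name](query, num_results)
            if results:
                return results
        except Exception:
            # A real server would log the failure before moving on
            continue
    return []


async def search_all(engines, query, num_results=5):
    """Run all chains concurrently and concatenate their results in chain order."""
    tasks = [
        search_with_fallback(engines, primary, fallback, query, num_results)
        for primary, fallback in FALLBACK_CHAINS
    ]
    chunks = await asyncio.gather(*tasks)
    return [item for chunk in chunks for item in chunk]
```

Because `asyncio.gather` returns results in the order the tasks were passed, the merged list keeps a stable chain order even though the chains run in parallel.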

Fetch Page Content

    # Extract clean text from URLs
    fetch_page_content("https://example.com")
    fetch_page_content(["https://site1.com", "https://site2.com"])  # Batch processing
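
To give a sense of what "clean text" extraction involves, here is a minimal sketch using only the standard library. It is not how `fetch_page_content` is actually implemented (the source does not say); `TextExtractor` and `extract_text` are hypothetical names, and the HTTP fetch step is left out so the example runs offline.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script, style, and noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


def extract_text(html):
    """Return the visible text of an HTML document as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A production extractor would typically also drop navigation and footer boilerplate; this sketch only removes markup and non-visible elements.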

🏗️ Architecture

    websearch/
    ├── core/
    │   ├── search.py                 # Sync search orchestration
    │   ├── async_search.py           # Async search orchestration
    │   ├── fallback_search.py        # 3-engine fallback system
    │   ├── async_fallback_search.py  # Async fallback system
    │   ├── ranking.py                # Quality-first result ranking
    │   └── common.py                 # Shared utilities
    ├── engines/
    │   ├── google_api.py             # Google Custom Search API
    │   ├── brave_api.py              # Brave Search API
    │   ├── bing.py                   # Bing scraping
    │   ├── duckduckgo.py             # DuckDuckGo scraping
    │   └── startpage.py              # Startpage scraping
    ├── utils/
    │   ├── unified_quota.py          # Unified API quota management
    │   ├── deduplication.py          # Result deduplication
    │   ├── advanced_cache.py         # Enhanced caching system
    │   └── http.py                   # HTTP utilities
    └── server.py                     # FastMCP server
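
When several engines return the same page, results have to be merged. One common approach, shown here as a sketch and not as the contents of `deduplication.py`, is to normalize each URL and keep the first result per normalized key. The normalization rules (dropping `www.`, fragments, trailing slashes, and `utm_` parameters) and the `url` field name are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters with these prefixes are treated as tracking noise (assumption)
TRACKING_PREFIXES = ("utm_",)


def normalize_url(url):
    """Reduce a URL to a canonical form for duplicate detection."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES)
    ]
    # The fragment is dropped because it does not change the page fetched
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(query), ""))


def deduplicate(results):
    """Keep the first result for each normalized URL, preserving order."""
    seen = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique
```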

🔧 Advanced Configuration

Environment Variables

    # API Configuration
    export GOOGLE_CSE_API_KEY=your_google_api_key
    export GOOGLE_CSE_ID=your_google_cse_id
    export BRAVE_SEARCH_API_KEY=your_brave_api_key

    # Quota Management (Optional)
    export GOOGLE_DAILY_QUOTA=100      # Default: 100 requests/day
    export BRAVE_MONTHLY_QUOTA=2000    # Default: 2000 requests/month

    # Performance Tuning
    export WEBSEARCH_CACHE_SIZE=1000
    export WEBSEARCH_TIMEOUT=10
    export WEBSEARCH_LOG_LEVEL=INFO
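
For readers wiring these variables into their own tooling, the following sketch shows one way to load them with defaults. `Settings` and `load_settings` are hypothetical names. The quota defaults come from the comments above; the cache size, timeout, and log level defaults are assumed from the example values, and the timeout unit is not stated in the source.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    google_api_key: Optional[str]
    google_cse_id: Optional[str]
    brave_api_key: Optional[str]
    google_daily_quota: int
    brave_monthly_quota: int
    cache_size: int
    timeout: int
    log_level: str


def load_settings(env=None):
    """Read configuration from environment variables, applying defaults."""
    env = os.environ if env is None else env
    return Settings(
        # API keys are optional; a missing key means that API is unavailable
        google_api_key=env.get("GOOGLE_CSE_API_KEY"),
        google_cse_id=env.get("GOOGLE_CSE_ID"),
        brave_api_key=env.get("BRAVE_SEARCH_API_KEY"),
        google_daily_quota=int(env.get("GOOGLE_DAILY_QUOTA", "100")),
        brave_monthly_quota=int(env.get("BRAVE_MONTHLY_QUOTA", "2000")),
        # Assumed defaults, taken from the example values above
        cache_size=int(env.get("WEBSEARCH_CACHE_SIZE", "1000")),
        timeout=int(env.get("WEBSEARCH_TIMEOUT", "10")),
        log_level=env.get("WEBSEARCH_LOG_LEVEL", "INFO").upper(),
    )
```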

How to Get API Keys

Google Custom Search API

API Key: Go to https://developers.google.com/custom-search/v1/introduction and click "Get a Key"
CSE ID: Go to https://cse.google.com/cse/ and follow prompts to create a search engine

Brave Search API

Go to Brave Search API
Sign up for a free account
Go to your dashboard
Copy the API key and set it as BRAVE_SEARCH_API_KEY
Free tier: 2000 requests/month

Quota Management

Unified System: Single quota manager for all APIs
Google: Daily quota (default 100 requests/day)
Brave: Monthly quota (default 2000 requests/month)
Storage: Quota files stored in the ~/.websearch/data/quota/ directory
Auto-reset: Quotas automatically reset at period boundaries
Fallback: Automatic fallback to scraping when quotas exhausted
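
The period-boundary reset described above can be sketched as a small file-backed counter. This is an illustration only: `QuotaTracker`, `period_key`, and the JSON layout (`period`, `used`) are hypothetical and may not match the format of `google_quota.json` or `brave_quota.json`.

```python
import json
from datetime import date
from pathlib import Path


def period_key(period, today):
    """Return an identifier that changes at each period boundary."""
    if period == "daily":
        return today.isoformat()        # e.g. 2024-05-17
    if period == "monthly":
        return today.strftime("%Y-%m")  # e.g. 2024-05
    raise ValueError(f"unknown period: {period}")


class QuotaTracker:
    """File-backed request counter that resets when the period rolls over."""

    def __init__(self, path, limit, period):
        self.path = Path(path)
        self.limit = limit
        self.period = period

    def _load(self, today):
        key = period_key(self.period, today)
        try:
            state = json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            state = {}
        if state.get("period") != key:
            # New day or month: start counting from zero again
            state = {"period": key, "used": 0}
        return state

    def try_consume(self, today=None):
        """Record one request; return False if the quota is exhausted."""
        today = today or date.today()
        state = self._load(today)
        if state["used"] >= self.limit:
            return False
        state["used"] += 1
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(state))
        return True
```

A caller would check `try_consume()` before each API request and switch to the scraping engine when it returns False.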

Search Modes

Standard Mode: Uses all 5 engines for maximum coverage
Fallback Mode: Uses 3 engines with intelligent fallbacks for efficiency
API-First Mode: Prioritizes API calls over scraping when keys available
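
One way to picture the three modes is as different engine plans. The sketch below is an interpretation, not the server's logic: `plan_engines`, the mode strings, and the engine names are hypothetical, and the API-first behaviour (skip APIs without keys, then fall through to scrapers) is an assumption about what "prioritizes API calls" means.

```python
API_ENGINES = ("google_api", "brave_api")
SCRAPERS = ("bing", "duckduckgo", "startpage")


def plan_engines(mode, configured_apis):
    """Return the ordered engine names to query for a given mode."""
    if mode == "standard":
        # All five engines for maximum coverage
        return list(API_ENGINES + SCRAPERS)
    if mode == "fallback":
        # Primaries only; each primary's fallback is tried on failure
        return ["google_api", "bing", "brave_api"]
    if mode == "api_first":
        # APIs with configured keys come first, scrapers afterwards
        usable = [name for name in API_ENGINES if name in configured_apis]
        return usable + list(SCRAPERS)
    raise ValueError(f"unknown mode: {mode}")
```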

🐛 Troubleshooting

Issue	Solution
No results	Check internet connection and logs
API quota exhausted	System automatically falls back to scraping
Google API errors	Verify `GOOGLE_CSE_API_KEY` and `GOOGLE_CSE_ID`
Brave API errors	Check `BRAVE_SEARCH_API_KEY` and quota status
Permission denied	`chmod +x start.sh`
Import errors	Ensure Python 3.12+ and dependencies installed
Circular import warnings	Fixed in v2.0+ (10.00/10 pylint score)

Debug Mode

    # Enable detailed logging
    export WEBSEARCH_LOG_LEVEL=DEBUG
    python -m websearch.server

API Status Check

    # Test API connectivity
    cd debug/
    python test_brave_api.py    # Test Brave API
    python test_fallback.py     # Test fallback system

📈 Performance & Monitoring

Metrics

Pylint Score: 10.00/10
Search Speed: ~2-3 seconds for 5-engine search
Fallback Speed: ~1-2 seconds for 3-engine search
Cache Hit Rate: ~85% for repeated queries
API Quota Efficiency: Automatic fallback prevents service interruption

Monitoring

Logs are written to ~/.websearch/logs/web-search.log (or to the directory set by WEBSEARCH_LOG_DIR) in a structured format:

    tail -f ~/.websearch/logs/web-search.log | grep "search completed"

🔒 Security

No hardcoded secrets: All API keys via environment variables
Clean git history: Secrets scrubbed from all commits
Input validation: Comprehensive sanitization of search queries
Rate limiting: Built-in quota management for API calls
Secure defaults: HTTPS-only requests, timeout protection

🚀 Performance Tips

Use fallback mode for faster searches when you don't need maximum coverage
Set API keys to reduce reliance on scraping (faster + more reliable)
Enable caching for repeated queries (enabled by default)
Tune batch sizes for content extraction based on your needs
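
To illustrate why caching helps with repeated queries, here is a minimal sketch of an LRU cache that stores compressed entries. It is not the code in `advanced_cache.py`: `CompressedLRUCache` is a hypothetical class, and the choice of JSON plus zlib for serialization is an assumption. The default size mirrors the WEBSEARCH_CACHE_SIZE example value.

```python
import json
import zlib
from collections import OrderedDict


class CompressedLRUCache:
    """LRU cache that keeps JSON-serializable values zlib-compressed in memory."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._store = OrderedDict()

    def get(self, key):
        blob = self._store.get(key)
        if blob is None:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return json.loads(zlib.decompress(blob))

    def put(self, key, value):
        self._store[key] = zlib.compress(json.dumps(value).encode("utf-8"))
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
```

Compression trades a little CPU time per lookup for a smaller memory footprint, which tends to pay off when cached result lists contain long text snippets.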

🤝 Contributing

Fork the repository
Create feature branch (git checkout -b feature/amazing-feature)
Run tests (pytest)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open Pull Request

📄 License

MIT License - see LICENSE file for details.
