# WebSearch MCP Server

[![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Pylint Score](https://img.shields.io/badge/pylint-10.00/10-brightgreen)](https://pylint.org/)

High-performance Model Context Protocol (MCP) server for web search and content extraction with an intelligent fallback system.

## ✨ Features

- **🚀 Fast**: Async implementation with parallel execution
- **🔍 Multi-Engine**: Google, Bing, DuckDuckGo, Startpage, Brave Search
- **🛡️ Intelligent Fallbacks**: Google→Startpage, Bing→DuckDuckGo, Brave (standalone)
- **📄 Content Extraction**: Clean text extraction from web pages
- **💾 Smart Caching**: LRU cache with compression and deduplication
- **🔑 API Integration**: Google Custom Search and Brave Search APIs with quota management
- **⚡ Resilient**: Automatic failover and comprehensive error handling

## 📦 Installation

### Production Use (Recommended)

```bash
# Create virtual environment
python -m venv ~/.websearch/venv
source ~/.websearch/venv/bin/activate

# Install from GitHub
pip install git+https://github.com/vishalkg/web-search.git
```

### Development

```bash
git clone https://github.com/vishalkg/web-search.git
cd web-search
pip install -e .
```

## ⚙️ Configuration

### Q CLI

```bash
# Add to Q CLI (after installation)
q mcp add --name websearch --command ~/.websearch/venv/bin/websearch-server

# Test
q chat "search for python tutorials"
```

### Claude Desktop

Register the server via the Claude CLI:

```bash
claude mcp add websearch ~/.websearch/venv/bin/websearch-server -s user
```

## 🗂️ File Structure (Installation Independent)

The server automatically creates and manages files in a unified user directory:

```
~/.websearch/                    # Single websearch directory
├── venv/                        # Virtual environment (recommended)
├── config/
│   └── .env                     # Configuration file
├── data/
│   ├── search-metrics.jsonl     # Search analytics
│   └── quota/                   # API quota tracking
│       ├── google_quota.json
│       └── brave_quota.json
├── logs/
│   └── web-search.log           # Application logs
└── cache/                       # Optional caching
```

### Environment Variable Overrides

- `WEBSEARCH_HOME`: Base directory (default: `~/.websearch`)
- `WEBSEARCH_CONFIG_DIR`: Config directory override
- `WEBSEARCH_LOG_DIR`: Log directory override

## 🔧 Usage

The server provides two main tools with multiple search modes:

### Search Web

```python
# Standard 5-engine search (backward compatible)
search_web("quantum computing applications", num_results=10)

# New 3-engine fallback search (optimized)
search_web_fallback("machine learning tutorials", num_results=5)
```

**Search Engines:**

- **Google Custom Search API** (with Startpage fallback)
- **Bing** (with DuckDuckGo fallback)
- **Brave Search API** (standalone)
- **DuckDuckGo** (scraping)
- **Startpage** (scraping)

### Fetch Page Content

```python
# Extract clean text from URLs
fetch_page_content("https://example.com")
fetch_page_content(["https://site1.com", "https://site2.com"])  # Batch processing
```

## 🏗️ Architecture

```
websearch/
├── core/
│   ├── search.py                # Sync search orchestration
│   ├── async_search.py          # Async search orchestration
│   ├── fallback_search.py       # 3-engine fallback system
│   ├── async_fallback_search.py # Async fallback system
│   ├── ranking.py               # Quality-first result ranking
│   └── common.py                # Shared utilities
├── engines/
│   ├── google_api.py            # Google Custom Search API
│   ├── brave_api.py             # Brave Search API
│   ├── bing.py                  # Bing scraping
│   ├── duckduckgo.py            # DuckDuckGo scraping
│   └── startpage.py             # Startpage scraping
├── utils/
│   ├── unified_quota.py         # Unified API quota management
│   ├── deduplication.py         # Result deduplication
│   ├── advanced_cache.py        # Enhanced caching system
│   └── http.py                  # HTTP utilities
└── server.py                    # FastMCP server
```

## 🔧 Advanced Configuration

### Environment Variables

```bash
# API Configuration
export GOOGLE_CSE_API_KEY=your_google_api_key
export GOOGLE_CSE_ID=your_google_cse_id
export BRAVE_SEARCH_API_KEY=your_brave_api_key

# Quota Management (Optional)
export GOOGLE_DAILY_QUOTA=100    # Default: 100 requests/day
export BRAVE_MONTHLY_QUOTA=2000  # Default: 2000 requests/month

# Performance Tuning
export WEBSEARCH_CACHE_SIZE=1000
export WEBSEARCH_TIMEOUT=10
export WEBSEARCH_LOG_LEVEL=INFO
```

### How to Get API Keys

#### Google Custom Search API

1. **API Key**: Go to https://developers.google.com/custom-search/v1/introduction and click "Get a Key"
2. **CSE ID**: Go to https://cse.google.com/cse/ and follow the prompts to create a search engine

#### Brave Search API

1. Go to [Brave Search API](https://api.search.brave.com/)
2. Sign up for a free account
3. Go to your dashboard
4. Copy the API key into `BRAVE_SEARCH_API_KEY`
5. Free tier: 2000 requests/month
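The daily and monthly quota defaults can be enforced with a simple period-keyed counter persisted to disk. The following is a minimal sketch of that idea, not the project's actual `unified_quota` implementation; the function name and file format are hypothetical:

```python
import json
import time
from pathlib import Path


def consume_quota(path, limit, period="day"):
    """Increment a persisted request counter; return False once the limit is hit.

    The counter resets automatically when the period key (current day or
    month) changes, which matches the auto-reset behavior described below.
    """
    key = time.strftime("%Y-%m-%d" if period == "day" else "%Y-%m")
    p = Path(path)
    state = json.loads(p.read_text()) if p.exists() else {}
    if state.get("period") != key:   # new day/month: reset the counter
        state = {"period": key, "used": 0}
    if state["used"] >= limit:
        return False                 # quota exhausted: caller falls back to scraping
    state["used"] += 1
    p.write_text(json.dumps(state))
    return True
```

In this sketch, each Google API call would first check `consume_quota(google_quota_file, 100, "day")` and each Brave call `consume_quota(brave_quota_file, 2000, "month")`, falling back to scraping whenever the check returns `False`.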
### Quota Management

- **Unified System**: Single quota manager for all APIs
- **Google**: Daily quota (default 100 requests/day)
- **Brave**: Monthly quota (default 2000 requests/month)
- **Storage**: Quota files stored in the `~/.websearch/` directory
- **Auto-reset**: Quotas automatically reset at period boundaries
- **Fallback**: Automatic fallback to scraping when quotas are exhausted

### Search Modes

- **Standard Mode**: Uses all 5 engines for maximum coverage
- **Fallback Mode**: Uses 3 engines with intelligent fallbacks for efficiency
- **API-First Mode**: Prioritizes API calls over scraping when keys are available

## 🐛 Troubleshooting

| Issue | Solution |
|-------|----------|
| No results | Check internet connection and logs |
| API quota exhausted | System automatically falls back to scraping |
| Google API errors | Verify `GOOGLE_CSE_API_KEY` and `GOOGLE_CSE_ID` |
| Brave API errors | Check `BRAVE_SEARCH_API_KEY` and quota status |
| Permission denied | `chmod +x start.sh` |
| Import errors | Ensure Python 3.12+ and dependencies are installed |
| Circular import warnings | Fixed in v2.0+ (10.00/10 pylint score) |

### Debug Mode

```bash
# Enable detailed logging
export WEBSEARCH_LOG_LEVEL=DEBUG
python -m websearch.server
```

### API Status Check

```bash
# Test API connectivity
cd debug/
python test_brave_api.py  # Test Brave API
python test_fallback.py   # Test fallback system
```

## 📈 Performance & Monitoring

### Metrics

- **Pylint Score**: 10.00/10
- **Search Speed**: ~2-3 seconds for a 5-engine search
- **Fallback Speed**: ~1-2 seconds for a 3-engine search
- **Cache Hit Rate**: ~85% for repeated queries
- **API Quota Efficiency**: Automatic fallback prevents service interruption

### Monitoring

Logs are written to `~/.websearch/logs/web-search.log` in a structured format:

```bash
tail -f ~/.websearch/logs/web-search.log | grep "search completed"
```

## 🔒 Security

- **No hardcoded secrets**: All API keys are supplied via environment variables
- **Clean git history**: Secrets scrubbed from all commits
- **Input validation**: Comprehensive sanitization of search queries
- **Rate limiting**: Built-in quota management for API calls
- **Secure defaults**: HTTPS-only requests, timeout protection

## 🚀 Performance Tips

1. **Use fallback mode** for faster searches when you don't need maximum coverage
2. **Set API keys** to reduce reliance on scraping (faster and more reliable)
3. **Enable caching** for repeated queries (enabled by default)
4. **Tune batch sizes** for content extraction based on your needs

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Run tests (`pytest`)
4. Commit changes (`git commit -m 'Add amazing feature'`)
5. Push to the branch (`git push origin feature/amazing-feature`)
6. Open a Pull Request

## 📄 License

MIT License - see the [LICENSE](LICENSE) file for details.
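The engine pairings listed under Features (Google→Startpage, Bing→DuckDuckGo, Brave standalone) amount to a try-the-primary-then-its-backup loop. A minimal sketch of that control flow, for illustration only; the engine callables, dict shape, and function name here are hypothetical and not this project's API:

```python
# Fallback pairing from the Features section: each primary engine may name
# a backup that is tried only when the primary fails or returns nothing.
FALLBACKS = {"google": "startpage", "bing": "duckduckgo", "brave": None}


def search_with_fallback(primary, query, engines):
    """Run `primary`; on an exception or empty results, try its designated backup."""
    for name in (primary, FALLBACKS.get(primary)):
        if name is None:
            continue
        try:
            results = engines[name](query)
        except Exception:
            continue        # engine failed: move on to the backup, if any
        if results:
            return results
    return []               # both primary and backup came up empty
```

With this shape, `search_with_fallback("google", query, engines)` only reaches the Startpage engine when the Google engine raises or returns no results, which is the failover behavior the README describes.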
