
Web Search MCP Server

by vishalkg

WebSearch MCP Server

Python 3.12+ | License: MIT | Pylint Score: 10.00/10

High-performance Model Context Protocol (MCP) server for web search and content extraction, with an intelligent fallback system.

✨ Features

  • 🚀 Fast: Async implementation with parallel execution

  • 🔍 Multi-Engine: Google, Bing, DuckDuckGo, Startpage, Brave Search

  • 🛡️ Intelligent Fallbacks: Google→Startpage, Bing→DuckDuckGo, Brave (standalone)

  • 📄 Content Extraction: Clean text extraction from web pages

  • 💾 Smart Caching: LRU cache with compression and deduplication

  • 🔑 API Integration: Google Custom Search, Brave Search APIs with quota management

  • ⚡ Resilient: Automatic failover and comprehensive error handling

📦 Installation

Production Use (Recommended)

    # Create virtual environment
    python -m venv ~/.websearch/venv
    source ~/.websearch/venv/bin/activate

    # Install from GitHub
    pip install git+https://github.com/vishalkg/web-search.git

Development

    git clone https://github.com/vishalkg/web-search.git
    cd web-search
    pip install -e .

⚙️ Configuration

Q CLI

    # Add to Q CLI (after installation)
    q mcp add --name websearch --command ~/.websearch/venv/bin/websearch-server

    # Test
    q chat "search for python tutorials"

Claude Desktop

Register the server from the command line (user scope):

claude mcp add websearch ~/.websearch/venv/bin/websearch-server -s user

🗂️ File Structure (Installation Independent)

The server automatically creates and manages files in a unified user directory:

    ~/.websearch/                    # Single websearch directory
    ├── venv/                        # Virtual environment (recommended)
    ├── config/
    │   └── .env                     # Configuration file
    ├── data/
    │   ├── search-metrics.jsonl     # Search analytics
    │   └── quota/                   # API quota tracking
    │       ├── google_quota.json
    │       └── brave_quota.json
    ├── logs/
    │   └── web-search.log           # Application logs
    └── cache/                       # Optional caching

Environment Variable Overrides

  • WEBSEARCH_HOME: Base directory (default: ~/.websearch)

  • WEBSEARCH_CONFIG_DIR: Config directory override

  • WEBSEARCH_LOG_DIR: Log directory override
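The override rules above can be sketched roughly as follows. This is an illustrative sketch, not the server's actual code: the function name `resolve_dirs` is hypothetical, and it assumes the config and log defaults are the `config/` and `logs/` subdirectories of the base directory, as the file tree suggests.

```python
import os
from pathlib import Path


def resolve_dirs(env=None):
    """Resolve websearch directories from environment overrides (hypothetical helper)."""
    env = os.environ if env is None else env
    home = Path(env.get("WEBSEARCH_HOME", "~/.websearch")).expanduser()
    # Assumption: an override replaces the whole path; otherwise the
    # directory lives under the base directory, as in the tree above.
    config_dir = Path(env.get("WEBSEARCH_CONFIG_DIR", home / "config")).expanduser()
    log_dir = Path(env.get("WEBSEARCH_LOG_DIR", home / "logs")).expanduser()
    return {"home": home, "config": config_dir, "logs": log_dir}


# Override the base directory and the log directory only.
dirs = resolve_dirs({"WEBSEARCH_HOME": "/tmp/ws", "WEBSEARCH_LOG_DIR": "/var/log/ws"})
print(dirs["config"])  # derived from WEBSEARCH_HOME
print(dirs["logs"])    # taken from WEBSEARCH_LOG_DIR
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to test without touching the real environment.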

🔧 Usage

The server provides two main tools with multiple search modes:

Search Web

    # Standard 5-engine search (backward compatible)
    search_web("quantum computing applications", num_results=10)

    # New 3-engine fallback search (optimized)
    search_web_fallback("machine learning tutorials", num_results=5)

Search Engines:

  • Google Custom Search API (with Startpage fallback)

  • Bing (with DuckDuckGo fallback)

  • Brave Search API (standalone)

  • DuckDuckGo (scraping)

  • Startpage (scraping)
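The primary/fallback pairing listed above could be organized along these lines. This is a minimal sketch under stated assumptions: the engine functions are stubs invented for the demonstration, and falling back on either an exception or an empty result is an assumed policy, not something the project documents.

```python
import asyncio


async def search_with_fallback(query, primary, fallback=None):
    """Try the primary engine; on an error or empty result, try the fallback."""
    try:
        results = await primary(query)
        if results:
            return results
    except Exception:
        pass  # a real server would log the failure here
    return await fallback(query) if fallback else []


async def fallback_search(query, pairs):
    """Run every (primary, fallback) pair concurrently and flatten the results."""
    groups = await asyncio.gather(
        *(search_with_fallback(query, primary, fallback) for primary, fallback in pairs)
    )
    return [item for group in groups for item in group]


# Stub engines for demonstration only; real engines would call APIs or scrape.
async def google(query):
    raise RuntimeError("quota exhausted")


async def startpage(query):
    return [f"startpage:{query}"]


async def bing(query):
    return [f"bing:{query}"]


async def duckduckgo(query):
    return [f"duckduckgo:{query}"]


async def brave(query):
    return [f"brave:{query}"]


PAIRS = [(google, startpage), (bing, duckduckgo), (brave, None)]
results = asyncio.run(fallback_search("mcp", PAIRS))
print(results)
```

In this run the stubbed Google engine fails, so its slot is filled by Startpage, while Bing and Brave answer directly and DuckDuckGo is never called.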

Fetch Page Content

    # Extract clean text from URLs
    fetch_page_content("https://example.com")

    # Batch processing
    fetch_page_content(["https://site1.com", "https://site2.com"])
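To illustrate what "clean text" extraction can mean, here is a small standard-library sketch that strips markup and skips script and style contents. It is not the server's extractor; `extract_text` and `normalize_urls` are hypothetical names, and the second helper only mirrors the fact that the tool accepts either one URL or a list.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def normalize_urls(urls):
    """Accept a single URL or an iterable of URLs and return a list."""
    return [urls] if isinstance(urls, str) else list(urls)


page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p>"
    "<script>var x=1;</script></body></html>"
)
print(extract_text(page))
```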

🏗️ Architecture

    websearch/
    ├── core/
    │   ├── search.py                  # Sync search orchestration
    │   ├── async_search.py            # Async search orchestration
    │   ├── fallback_search.py         # 3-engine fallback system
    │   ├── async_fallback_search.py   # Async fallback system
    │   ├── ranking.py                 # Quality-first result ranking
    │   └── common.py                  # Shared utilities
    ├── engines/
    │   ├── google_api.py              # Google Custom Search API
    │   ├── brave_api.py               # Brave Search API
    │   ├── bing.py                    # Bing scraping
    │   ├── duckduckgo.py              # DuckDuckGo scraping
    │   └── startpage.py               # Startpage scraping
    ├── utils/
    │   ├── unified_quota.py           # Unified API quota management
    │   ├── deduplication.py           # Result deduplication
    │   ├── advanced_cache.py          # Enhanced caching system
    │   └── http.py                    # HTTP utilities
    └── server.py                      # FastMCP server

🔧 Advanced Configuration

Environment Variables

    # API Configuration
    export GOOGLE_CSE_API_KEY=your_google_api_key
    export GOOGLE_CSE_ID=your_google_cse_id
    export BRAVE_SEARCH_API_KEY=your_brave_api_key

    # Quota Management (Optional)
    export GOOGLE_DAILY_QUOTA=100      # Default: 100 requests/day
    export BRAVE_MONTHLY_QUOTA=2000    # Default: 2000 requests/month

    # Performance Tuning
    export WEBSEARCH_CACHE_SIZE=1000
    export WEBSEARCH_TIMEOUT=10
    export WEBSEARCH_LOG_LEVEL=INFO

How to Get API Keys

Google Custom Search API

  1. API Key: Go to https://developers.google.com/custom-search/v1/introduction and click "Get a Key"

  2. CSE ID: Go to https://cse.google.com/cse/ and follow prompts to create a search engine

Brave Search API

  1. Go to Brave Search API

  2. Sign up for a free account

  3. Go to your dashboard

  4. Copy the API key and set it as BRAVE_SEARCH_API_KEY

  5. Free tier: 2000 requests/month

Quota Management

  • Unified System: Single quota manager for all APIs

  • Google: Daily quota (default 100 requests/day)

  • Brave: Monthly quota (default 2000 requests/month)

  • Storage: Quota files are stored under ~/.websearch/data/quota/

  • Auto-reset: Quotas automatically reset at period boundaries

  • Fallback: Automatic fallback to scraping when quotas exhausted
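The period-based reset described above can be sketched with a small JSON counter file. This is an illustration only: the on-disk format, the `try_consume` function, and the `period_key` helper are assumptions, and the real quota files may be structured differently.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def period_key(period, today):
    """Identify the current quota period: a day for 'daily', otherwise a month."""
    return today.isoformat() if period == "daily" else today.strftime("%Y-%m")


def try_consume(path, limit, period, today=None):
    """Record one request if quota remains; return False when it is exhausted."""
    today = today or date.today()
    key = period_key(period, today)
    state = {"period": key, "used": 0}
    if path.exists():
        saved = json.loads(path.read_text())
        if saved.get("period") == key:  # same period: keep the running count
            state = saved
    if state["used"] >= limit:
        return False  # caller would fall back to scraping here
    state["used"] += 1
    path.write_text(json.dumps(state))
    return True


quota_file = Path(tempfile.mkdtemp()) / "google_quota.json"
same_day = [try_consume(quota_file, 2, "daily", date(2025, 1, 1)) for _ in range(3)]
next_day = try_consume(quota_file, 2, "daily", date(2025, 1, 2))
print(same_day, next_day)
```

Because the stored period no longer matches on the following day, the counter starts from zero again without any scheduled reset job.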

Search Modes

  • Standard Mode: Uses all 5 engines for maximum coverage

  • Fallback Mode: Uses 3 engines with intelligent fallbacks for efficiency

  • API-First Mode: Prioritizes API calls over scraping when keys available
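Whichever mode is used, results from several engines tend to overlap, so they need to be merged. The sketch below shows one plausible approach based on URL normalization; the result shape (a dict with a `url` key) and both function names are assumptions, not the project's actual deduplication logic.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Canonicalize a URL: lowercase scheme and host, drop fragment and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(*engine_results):
    """Merge per-engine result lists, keeping the first occurrence of each URL."""
    seen = set()
    merged = []
    for results in engine_results:
        for item in results:
            key = normalize_url(item["url"])
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


google_hits = [{"url": "https://Example.com/a/", "title": "A"}]
bing_hits = [
    {"url": "https://example.com/a#intro", "title": "A (duplicate)"},
    {"url": "https://example.com/b", "title": "B"},
]
merged = merge_results(google_hits, bing_hits)
print([item["title"] for item in merged])
```

Listing the engines in priority order means the preferred engine's copy of a duplicate is the one that survives.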

🐛 Troubleshooting

| Issue | Solution |
| --- | --- |
| No results | Check internet connection and logs |
| API quota exhausted | System automatically falls back to scraping |
| Google API errors | Verify GOOGLE_CSE_API_KEY and GOOGLE_CSE_ID |
| Brave API errors | Check BRAVE_SEARCH_API_KEY and quota status |
| Permission denied | chmod +x start.sh |
| Import errors | Ensure Python 3.12+ and dependencies installed |
| Circular import warnings | Fixed in v2.0+ (10.00/10 pylint score) |

Debug Mode

    # Enable detailed logging
    export WEBSEARCH_LOG_LEVEL=DEBUG
    python -m websearch.server

API Status Check

    # Test API connectivity
    cd debug/
    python test_brave_api.py    # Test Brave API
    python test_fallback.py     # Test fallback system

📈 Performance & Monitoring

Metrics

  • Pylint Score: 10.00/10 (perfect code quality)

  • Search Speed: ~2-3 seconds for 5-engine search

  • Fallback Speed: ~1-2 seconds for 3-engine search

  • Cache Hit Rate: ~85% for repeated queries

  • API Quota Efficiency: Automatic fallback prevents service interruption

Monitoring

Logs are written to ~/.websearch/logs/web-search.log (or the directory set by WEBSEARCH_LOG_DIR) in a structured format:

    tail -f ~/.websearch/logs/web-search.log | grep "search completed"

🔒 Security

  • No hardcoded secrets: All API keys via environment variables

  • Clean git history: Secrets scrubbed from all commits

  • Input validation: Comprehensive sanitization of search queries

  • Rate limiting: Built-in quota management for API calls

  • Secure defaults: HTTPS-only requests, timeout protection
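As a rough illustration of the input validation and HTTPS-only points above, the following sketch shows one way such checks might look. The function names and the 256-character cap are hypothetical; the project's actual rules are not documented here.

```python
from urllib.parse import urlsplit

MAX_QUERY_LENGTH = 256  # hypothetical limit, chosen for the example


def sanitize_query(query):
    """Drop control characters, collapse whitespace, and cap the length."""
    cleaned = "".join(ch for ch in query if ch.isprintable() or ch.isspace())
    cleaned = " ".join(cleaned.split())
    if not cleaned:
        raise ValueError("empty query")
    return cleaned[:MAX_QUERY_LENGTH]


def require_https(url):
    """Return the URL unchanged if it uses HTTPS; otherwise raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"refusing non-HTTPS URL: {url!r}")
    return url


print(sanitize_query("  python\x00   tutorials\n"))
print(require_https("https://example.com/page"))
```

Rejecting bad input with an exception, rather than silently rewriting a URL, keeps the failure visible to the calling tool.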

🚀 Performance Tips

  1. Use fallback mode for faster searches when you don't need maximum coverage

  2. Set API keys to reduce reliance on scraping (faster + more reliable)

  3. Enable caching for repeated queries (enabled by default)

  4. Tune batch sizes for content extraction based on your needs
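To make the caching tip concrete, here is a minimal sketch of an LRU cache that stores compressed values, in the spirit of the "LRU cache with compression" feature. The class name and its interface are assumptions; the project's own cache may work differently.

```python
import zlib
from collections import OrderedDict


class CompressedLRUCache:
    """Bounded cache that evicts the least recently used entry (illustrative)."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        blob = self._data.get(key)
        if blob is None:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return zlib.decompress(blob).decode("utf-8")

    def put(self, key, text):
        self._data[key] = zlib.compress(text.encode("utf-8"))
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used


cache = CompressedLRUCache(max_entries=2)
cache.put("q1", "results for q1")
cache.put("q2", "results for q2")
cache.get("q1")                    # touch q1 so q2 becomes the oldest entry
cache.put("q3", "results for q3")  # evicts q2
print(cache.get("q1"), cache.get("q2"), cache.get("q3"))
```

Compression trades a little CPU time per lookup for a smaller memory footprint, which matters most when cached page text is large.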

🤝 Contributing

  1. Fork the repository

  2. Create feature branch (git checkout -b feature/amazing-feature)

  3. Run tests (pytest)

  4. Commit changes (git commit -m 'Add amazing feature')

  5. Push to branch (git push origin feature/amazing-feature)

  6. Open Pull Request

📄 License

MIT License - see LICENSE file for details.
