Web Search MCP Server

by lalit9168

A production-ready Model Context Protocol (MCP) Server that acts as a universal web search and content retrieval tool for AI agents.

Features

  • 🔍 Universal search — any topic, any language query

  • 🌐 Multi-provider — Tavily, Brave, Bing, SerpAPI, Google CSE (pluggable)

  • 📄 Content extraction — HTTP + BeautifulSoup (primary), Playwright (fallback for SPAs)

  • Async-first — parallel page fetching, connection pooling

  • 🗃️ Caching — in-memory TTL cache to save API quota

  • 🔄 Retry logic — exponential back-off via tenacity

  • 📊 Structured JSON — Pydantic v2 models, MCP-compliant output

  • 🪵 Structured logging — JSON or text format
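
The retry bullet above mentions exponential back-off via tenacity. To illustrate what such a schedule looks like, here is a stdlib-only sketch. It does not use tenacity and is not the project's code; the function names, the 0.5 s base delay, and the 8 s cap are assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Delay before each retry: base * 2**n, capped. No delay follows the final attempt."""
    return [min(cap, base * 2 ** n) for n in range(max(attempts - 1, 0))]


def call_with_retry(fn: Callable[[], T], attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn up to `attempts` times, sleeping between failures (hypothetical helper)."""
    delays = backoff_delays(attempts)
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
    raise RuntimeError("attempts must be at least 1")
```

With the default of three attempts, a call that keeps failing waits 0.5 s, then 1.0 s, and then re-raises the last error.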



Project Structure

web_search_mcp/
│
├── server.py        ← FastMCP server + tool registration
├── tools.py         ← Tool orchestration (search → extract → rank)
├── search.py        ← Pluggable search providers
├── extractor.py     ← HTML content extraction (BS4 + Playwright)
├── browser.py       ← Playwright browser manager
├── models.py        ← Pydantic data models
├── config.py        ← Settings (pydantic-settings + .env)
├── logger.py        ← Structured logging
├── utils.py         ← Shared helpers
├── requirements.txt
├── .env             ← Configuration (fill in your API keys)
└── README.md

Quick Start

1. Prerequisites

  • Python 3.11 or higher

  • pip

2. Install Dependencies

pip install -r requirements.txt

3. Install Playwright Browser

playwright install chromium

This downloads the Chromium binary (~130 MB). Required for JavaScript-heavy page extraction.

4. Configure API Keys

Edit .env and add at least one search provider key:

SEARCH_PROVIDER=tavily
TAVILY_API_KEY=your_key_here

Getting a free Tavily key (recommended):

  1. Visit app.tavily.com

  2. Sign up for a free account

  3. Copy your API key → paste into .env
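
Before starting the server, it can help to confirm that the selected provider has its key set. The sketch below is a stand-alone check, not part of the project (config.py uses pydantic-settings instead). The env var names come from the Search Providers table in this README; the provider identifiers other than tavily (brave, bing, serpapi, google_cse) are assumed spellings of the SEARCH_PROVIDER value.

```python
# Env var(s) each provider needs. Only "tavily" is a confirmed SEARCH_PROVIDER value;
# the other identifiers are assumptions for illustration.
REQUIRED_KEYS = {
    "tavily": ["TAVILY_API_KEY"],
    "brave": ["BRAVE_API_KEY"],
    "bing": ["BING_API_KEY"],
    "serpapi": ["SERPAPI_API_KEY"],
    "google_cse": ["GOOGLE_CSE_API_KEY", "GOOGLE_CSE_ID"],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the env vars the selected provider still needs."""
    provider = env.get("SEARCH_PROVIDER", "tavily").lower()
    if provider not in REQUIRED_KEYS:
        raise ValueError(f"unknown provider: {provider!r}")
    return [key for key in REQUIRED_KEYS[provider] if not env.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when the configuration is complete.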

5. Run the Server

python server.py

The server starts in STDIO mode (default), ready to connect with any MCP client.


MCP Tools

web_search

Search the web and return ranked snippets (no page visits).

Input:

{
  "query": "Latest AI trends in healthcare",
  "max_results": 10
}

Output:

{
  "query": "Latest AI trends in healthcare",
  "total_results": 10,
  "search_provider": "tavily",
  "results": [
    {
      "title": "AI in Healthcare 2025",
      "url": "https://example.com/ai-health",
      "domain": "example.com",
      "snippet": "Short summary of the article...",
      "content": "Same as snippet for web_search",
      "published_date": "2025-06-15",
      "relevance_score": 0.92
    }
  ],
  "cached": false,
  "execution_time_ms": 312.5
}
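
A client will typically want only the strongest hits from this payload. The following sketch shows one way to do that, assuming the response arrives as a JSON string in the shape above. The helper name and the 0.5 score threshold are hypothetical.

```python
import json


def top_results(payload: str, min_score: float = 0.5) -> list[tuple[float, str, str]]:
    """Return (score, title, url) for results at or above min_score, best first."""
    data = json.loads(payload)
    rows = [
        (r.get("relevance_score", 0.0), r["title"], r["url"])
        for r in data.get("results", [])
    ]
    return sorted((row for row in rows if row[0] >= min_score), reverse=True)
```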

webpage_content

Extract full readable content from a specific URL.

Input:

{
  "url": "https://example.com/article",
  "use_browser": false
}

Set use_browser: true to force Playwright rendering for JavaScript-heavy pages.


search_and_extract

End-to-end: search → visit pages → extract content → rank results.

Input:

{
  "query": "Latest UK visa requirements 2025",
  "max_results": 5,
  "use_browser_fallback": true
}

Returns full page content for each result including title, author, publish date, and extracted text.


Search Providers

| Provider   | Env Key                            | Free Tier   | Notes                      |
|------------|------------------------------------|-------------|----------------------------|
| Tavily     | TAVILY_API_KEY                     | 1,000/month | Best snippets, recommended |
| Brave      | BRAVE_API_KEY                      | 2,000/month | Privacy-focused            |
| Bing       | BING_API_KEY                       | 1,000/month | Azure Cognitive Services   |
| SerpAPI    | SERPAPI_API_KEY                    | 100/month   | Proxies Google             |
| Google CSE | GOOGLE_CSE_API_KEY + GOOGLE_CSE_ID | 100/day     | Custom Search Engine       |

Switch provider by changing SEARCH_PROVIDER in .env.
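
Switching providers through a single setting suggests a registry keyed by provider name. The sketch below shows that pattern in general terms. It is not the code in search.py: the class names, the registry dict, and the offline EchoProvider are all hypothetical.

```python
from abc import ABC, abstractmethod


class SearchProvider(ABC):
    """Hypothetical base class; the real interface in search.py may differ."""

    @abstractmethod
    def search(self, query: str, max_results: int) -> list[dict]:
        ...


class EchoProvider(SearchProvider):
    """Stand-in provider that fabricates results locally so the sketch runs offline."""

    def search(self, query: str, max_results: int) -> list[dict]:
        return [
            {"title": f"{query} #{i}", "url": f"https://example.com/{i}"}
            for i in range(1, max_results + 1)
        ]


# Name -> provider class. A real registry would list one entry per backend.
PROVIDERS: dict[str, type[SearchProvider]] = {"echo": EchoProvider}


def get_provider(name: str) -> SearchProvider:
    """Look up a provider by its SEARCH_PROVIDER-style name (case-insensitive)."""
    try:
        return PROVIDERS[name.lower()]()
    except KeyError:
        raise ValueError(f"unknown provider {name!r}; known: {sorted(PROVIDERS)}") from None
```

With this layout, adding a backend means writing one class and registering it; the calling code does not change.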


Configuration Reference

| Setting             | Default | Description                                |
|---------------------|---------|--------------------------------------------|
| SEARCH_PROVIDER     | tavily  | Active search backend                      |
| MAX_RESULTS         | 10      | Default result count                       |
| REQUEST_TIMEOUT     | 30      | HTTP timeout (seconds)                     |
| CONCURRENCY_LIMIT   | 5       | Parallel page extractions                  |
| CACHE_TTL           | 300     | Cache time-to-live (seconds, 0 = disabled) |
| CACHE_MAX_SIZE      | 256     | Max cache entries                          |
| PLAYWRIGHT_HEADLESS | true    | Headless browser mode                      |
| PLAYWRIGHT_TIMEOUT  | 30000   | Browser nav timeout (ms)                   |
| RETRY_ATTEMPTS      | 3       | Max HTTP retry attempts                    |
| LOG_LEVEL           | INFO    | Logging verbosity                          |
| LOG_FORMAT          | json    | json or text                               |
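
To make the CACHE_TTL and CACHE_MAX_SIZE semantics concrete, here is a minimal in-memory TTL cache sketch. It is an illustration only and not the project's cache. The class name, the injectable clock, and the choice to evict the oldest insertion when full are assumptions.

```python
import time
from collections import OrderedDict
from typing import Any, Callable


class TTLCache:
    """Entries expire after `ttl` seconds; the oldest entry is evicted when full."""

    def __init__(self, ttl: float = 300, max_size: int = 256,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl, self.max_size, self.clock = ttl, max_size, clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        if self.ttl <= 0:
            return None  # ttl of 0 means caching is disabled
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        if self.ttl <= 0:
            return
        self._data.pop(key, None)  # re-inserting moves the key to the newest slot
        self._data[key] = (self.clock() + self.ttl, value)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the oldest insertion
```

A cache key would normally combine the tool name, the query, and max_results so that different requests do not collide.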


Connecting with MCP Clients

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["C:/path/to/websearchMcp/server.py"],
      "env": {
        "TAVILY_API_KEY": "your_key_here"
      }
    }
  }
}

Custom MCP Client (Python)

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch server.py as a subprocess and talk to it over STDIO.
server_params = StdioServerParameters(
    command="python",
    args=["server.py"],
)


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # MCP handshake must complete first
            result = await session.call_tool(
                "search_and_extract",
                {"query": "Latest AI trends", "max_results": 3},
            )
            print(result)


if __name__ == "__main__":
    asyncio.run(main())

Architecture

User Query
    │
    ▼
FastMCP Server (server.py)
    │ validates input (Pydantic)
    ▼
Tool Orchestrator (tools.py)
    │ checks cache → calls provider
    ▼
Search Provider (search.py)
    │ Tavily / Brave / Bing / SerpAPI / Google
    ▼
Raw Search Results
    │
    ▼
Content Extractor (extractor.py)
    │ HTTP + BS4 → Playwright fallback
    ▼
Cleaned & Ranked Results
    │
    ▼
Structured JSON Response
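
The "HTTP + BS4 → Playwright fallback" step in the diagram can be sketched as follows. This is a stdlib-only illustration: it uses html.parser in place of BeautifulSoup, takes the two fetchers as plain callables instead of real HTTP and Playwright clients, and assumes a 200-character threshold for deciding that a page needs rendering. How extractor.py actually decides when to fall back is not documented here.

```python
from html.parser import HTMLParser
from typing import Callable


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce an HTML document to its visible text, joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def extract(url: str, fetch_http: Callable[[str], str],
            fetch_browser: Callable[[str], str], min_chars: int = 200) -> tuple[str, str]:
    """Try plain HTTP first; fall back to a rendered fetch if little text came back."""
    text = html_to_text(fetch_http(url))
    if len(text) >= min_chars:
        return text, "http"
    return html_to_text(fetch_browser(url)), "browser"
```

Single-page apps often return an almost empty shell over plain HTTP, which is why a short-text check is a common trigger for browser rendering.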

Error Handling

All tools return structured error JSON on failure:

{
  "error": "No API key configured for provider 'tavily'",
  "error_type": "RuntimeError",
  "tool": "web_search",
  "query": "AI trends",
  "timestamp": "2025-06-30T18:00:00Z"
}
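
On the client side, the error shape above can be told apart from a normal result by the presence of the "error" key. The following sketch assumes the tool response has already been obtained as a JSON string; ToolError and parse_tool_response are hypothetical names, not part of this project or the MCP SDK.

```python
import json


class ToolError(Exception):
    """Raised when a tool response carries the structured error shape."""

    def __init__(self, payload: dict) -> None:
        super().__init__(f"{payload.get('tool', '?')}: {payload.get('error')}")
        self.error_type = payload.get("error_type")
        self.payload = payload


def parse_tool_response(text: str) -> dict:
    """Return the decoded success payload, or raise ToolError for an error payload."""
    data = json.loads(text)
    if isinstance(data, dict) and "error" in data:
        raise ToolError(data)
    return data
```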

Performance Tips

  • Use web_search when you only need snippets (faster, uses less quota).

  • Use search_and_extract for deep research requiring full article content.

  • Increase CONCURRENCY_LIMIT for faster parallel extraction (be mindful of rate limits).

  • Increase CACHE_TTL to reduce repeated API calls for the same queries.

  • Keep PLAYWRIGHT_HEADLESS=true (the default) in production.
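
The effect of CONCURRENCY_LIMIT can be pictured with a semaphore around each page fetch. The sketch below is an assumption about the general technique, not the code in tools.py. The fetch_all name is hypothetical, and `fetch` stands for any coroutine function that takes a URL.

```python
import asyncio


async def fetch_all(urls, fetch, limit=5):
    """Run fetch(url) for every URL, never more than `limit` at once; order is preserved."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            try:
                return await fetch(url)
            except Exception as exc:  # one bad page should not sink the batch
                return exc

    return await asyncio.gather(*(bounded(u) for u in urls))
```

Raising the limit shortens the wall-clock time for a batch until the target sites or the search provider start rate-limiting, which is the trade-off the tip above refers to.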


Deploying to Render

  1. Push the repo to GitHub (.env is git-ignored — API keys are safe)

  2. Go to render.com → New → Blueprint → connect repo

  3. Render detects render.yaml automatically

  4. Set TAVILY_API_KEY in Render dashboard → Environment Variables

  5. Your SSE endpoint: https://your-app.onrender.com/sse


Deploying to Azure Container Apps

Prerequisites:

One-command deploy:

# 1. Login to Azure
az login

# 2. Run the deployment script (reads TAVILY_API_KEY from .env automatically)
.\deploy-azure.ps1

The script will:

  • Create a Resource Group + Azure Container Registry

  • Build and push the Docker image via ACR Tasks (builds in Azure cloud — no local build needed)

  • Create a Container Apps Environment

  • Deploy the MCP server with your Tavily key stored as a secret (never in plain text)

  • Print your live SSE endpoint URL

Custom options:

.\deploy-azure.ps1 `
    -ResourceGroup "my-rg" `
    -Location "westeurope" `
    -AppName "my-mcp-server" `
    -Cpu "2.0" `
    -Memory "4.0Gi"

Add to your no-code platform after deploy:

| Field     | Value                                                 |
|-----------|-------------------------------------------------------|
| Transport | Server-Sent Events (SSE)                              |
| URL       | https://<your-app>.<region>.azurecontainerapps.io/sse |

Health check: https://<your-app>.<region>.azurecontainerapps.io/health

Update after code changes:

# Just re-run the deploy script — it rebuilds and redeploys
.\deploy-azure.ps1

License

MIT
