MCP Web Scraper Server

πŸš€ Production MCP Web Scraper Server

A modular, production-ready MCP server built with the official MCP Python SDK. Optimized for Render deployment with clean separation of concerns.

πŸ“ Project Structure

mcp-web-scraper/
β”œβ”€β”€ server.py              # Main server entry point
β”œβ”€β”€ tools/
β”‚   β”œβ”€β”€ __init__.py       # Tools package initialization
β”‚   β”œβ”€β”€ search.py         # Search tools (web_search, news_search, etc.)
β”‚   └── scraping.py       # Scraping tools (scrape_html, extract_article, etc.)
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ __init__.py       # Utils package initialization
β”‚   └── helpers.py        # Helper functions (clean_text, validate_url)
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ render.yaml           # Render deployment configuration
β”œβ”€β”€ .gitignore            # Git ignore rules
β”œβ”€β”€ README.md             # This file
└── config.example.json   # Claude Desktop config example
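The helpers in utils/helpers.py are small, dependency-free utilities. A minimal sketch of what clean_text and validate_url might look like (the actual implementations in the repo may differ):

```python
# utils/helpers.py -- illustrative sketch; the real implementations may differ
import re
from urllib.parse import urlparse


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing space."""
    return re.sub(r"\s+", " ", text).strip()


def validate_url(url: str) -> bool:
    """Return True if the URL has an http(s) scheme and a network location."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```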

✨ Features

πŸ” Search Tools (tools/search.py)

  • web_search - DuckDuckGo web search

  • news_search - News articles with metadata

  • search_and_scrape - Search + content extraction

  • smart_search - Adaptive search (quick/standard/comprehensive)

πŸ“„ Scraping Tools (tools/scraping.py)

  • scrape_html - HTML scraping with CSS selectors

  • extract_article - Clean article extraction

  • extract_links - Link extraction with filtering

  • extract_metadata - Page metadata & Open Graph

  • scrape_table - Table data extraction

πŸš€ Quick Deploy to Render

Step 1: Create Project Structure

mkdir mcp-web-scraper
cd mcp-web-scraper

# Create directory structure
mkdir -p tools utils

# Create the files listed in the Project Structure above:
# - server.py
# - tools/__init__.py
# - tools/search.py
# - tools/scraping.py
# - utils/__init__.py
# - utils/helpers.py
# - requirements.txt
# - render.yaml
# - .gitignore
# - README.md

Step 2: Push to GitHub

git init
git add .
git commit -m "Initial commit: Modular MCP Web Scraper"
git remote add origin https://github.com/YOUR_USERNAME/mcp-web-scraper.git
git push -u origin main

Step 3: Deploy on Render

  1. Go to render.com

  2. Click "New +" β†’ "Web Service"

  3. Connect your GitHub repository

  4. Render auto-detects render.yaml

  5. Click "Create Web Service"

  6. Wait 2-3 minutes ✨

Step 4: Get Your URL

  • Your service: https://your-app.onrender.com

  • MCP endpoint: https://your-app.onrender.com/mcp

πŸ”Œ Connect to Claude Desktop

Config Location

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Configuration

{
  "mcpServers": {
    "web-scraper": {
      "type": "streamable-http",
      "url": "https://your-app.onrender.com/mcp"
    }
  }
}

Restart Claude Desktop after updating config!

πŸ’» Local Development

# Clone and setup
git clone https://github.com/YOUR_USERNAME/mcp-web-scraper.git
cd mcp-web-scraper

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run server
python server.py

Server runs at http://localhost:8000/mcp

Test Locally

# List tools
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# Test web search
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0",
    "id":2,
    "method":"tools/call",
    "params":{
      "name":"web_search",
      "arguments":{"query":"AI news","max_results":3}
    }
  }'

πŸ› οΈ Adding New Tools

1. Search Tool Example

Edit tools/search.py:

@mcp.tool()
def my_custom_search(query: str) -> dict:
    """Your custom search tool"""
    # Implementation here
    return {"success": True, "data": []}

2. Scraping Tool Example

Edit tools/scraping.py:

@mcp.tool()
def my_custom_scraper(url: str) -> dict:
    """Your custom scraper"""
    # Implementation here
    return {"success": True, "content": ""}
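To make the stub concrete, here is a hypothetical tool that needs no extra dependencies: it extracts the domain from a URL and returns the same success/error dict shape the stubs use. It is shown as a plain function so the logic stands alone; in tools/scraping.py you would register it with the @mcp.tool() decorator as above:

```python
from urllib.parse import urlparse


# In tools/scraping.py this would carry the @mcp.tool() decorator;
# shown undecorated here so the logic is self-contained.
def extract_domain(url: str) -> dict:
    """Return the domain of a URL, e.g. 'example.com' for https://example.com/post."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return {"success": False, "error": f"Invalid URL: {url}"}
    return {"success": True, "domain": parsed.netloc}
```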

3. Deploy Changes

git add .
git commit -m "Add new tools"
git push origin main
# Render auto-deploys!

πŸ“Š Monitoring

View Logs

  1. Render Dashboard β†’ Your Service

  2. Click "Logs" tab

  3. View real-time logs

Health Check

curl https://your-app.onrender.com/health

🎯 Architecture Benefits

βœ… Modular Design

  • Separation of concerns - Each file has one responsibility

  • Easy to maintain - Find and update code quickly

  • Scalable - Add new tools without touching existing code

βœ… Clean Code

  • Type hints - Better IDE support and error catching

  • Logging - Track all operations

  • Error handling - Graceful failures with detailed errors

βœ… Production Ready

  • Official MCP SDK - FastMCP framework

  • Streamable HTTP - Single endpoint communication

  • Stateless - Horizontally scalable

  • Health checks - Automatic monitoring

πŸ’¬ Example Usage in Claude

  • "Search for latest quantum computing news"

  • "Extract the article from https://example.com/post"

  • "Find and scrape top 5 articles about AI safety"

  • "Get all links from https://news.ycombinator.com"

  • "Do comprehensive research on renewable energy"

πŸ› Troubleshooting

Import Errors

# Ensure you're in project root
cd mcp-web-scraper

# Check Python path
export PYTHONPATH="${PYTHONPATH}:$(pwd)"

# Run server
python server.py

Tools Not Registered

Check the logs for "Registering X tools..." messages.

Module Not Found

Ensure all __init__.py files exist in:

  • tools/__init__.py

  • utils/__init__.py

πŸ“„ License

MIT License - Free to use and modify!


Modular βœ… | Production-Ready βœ… | Easy to Extend βœ…
