
πŸš€ Production MCP Web Scraper Server

A modular, production-ready MCP server built with the official MCP Python SDK. Optimized for Render deployment with clean separation of concerns.

πŸ“ Project Structure

```
mcp-web-scraper/
β”œβ”€β”€ server.py               # Main server entry point
β”œβ”€β”€ tools/
β”‚   β”œβ”€β”€ __init__.py         # Tools package initialization
β”‚   β”œβ”€β”€ search.py           # Search tools (web_search, news_search, etc.)
β”‚   └── scraping.py         # Scraping tools (scrape_html, extract_article, etc.)
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ __init__.py         # Utils package initialization
β”‚   └── helpers.py          # Helper functions (clean_text, validate_url)
β”œβ”€β”€ requirements.txt        # Python dependencies
β”œβ”€β”€ render.yaml             # Render deployment configuration
β”œβ”€β”€ .gitignore              # Git ignore rules
β”œβ”€β”€ README.md               # This file
└── config.example.json     # Claude Desktop config example
```

✨ Features

πŸ” Search Tools (tools/search.py)

  • web_search - DuckDuckGo web search

  • news_search - News articles with metadata

  • search_and_scrape - Search + content extraction

  • smart_search - Adaptive search (quick/standard/comprehensive)

πŸ“„ Scraping Tools (tools/scraping.py)

  • scrape_html - HTML scraping with CSS selectors

  • extract_article - Clean article extraction

  • extract_links - Link extraction with filtering

  • extract_metadata - Page metadata & Open Graph

  • scrape_table - Table data extraction
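All of the tools above are invoked over the single Streamable HTTP endpoint with a JSON-RPC 2.0 `tools/call` request. A minimal stdlib-only sketch of a client (the `call_tool` helper, `build_payload`, and the endpoint URL are illustrative, not part of the server itself):

```python
import json
from urllib import request

MCP_URL = "http://localhost:8000/mcp"  # or your deployed Render URL

def build_payload(name: str, arguments: dict, request_id: int = 1) -> dict:
    """JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def call_tool(name: str, arguments: dict) -> dict:
    """POST the request to the MCP endpoint and decode the JSON response."""
    data = json.dumps(build_payload(name, arguments)).encode("utf-8")
    req = request.Request(
        MCP_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires the server to be running):
# call_tool("extract_article", {"url": "https://example.com/post"})
```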

πŸš€ Quick Deploy to Render

Step 1: Create Project Structure

```bash
mkdir mcp-web-scraper
cd mcp-web-scraper

# Create directory structure
mkdir -p tools utils

# Create all files listed in the project structure:
# - server.py
# - tools/__init__.py
# - tools/search.py
# - tools/scraping.py
# - utils/__init__.py
# - utils/helpers.py
# - requirements.txt
# - render.yaml
# - .gitignore
# - README.md
```

Step 2: Push to GitHub

```bash
git init
git add .
git commit -m "Initial commit: Modular MCP Web Scraper"
git remote add origin https://github.com/YOUR_USERNAME/mcp-web-scraper.git
git push -u origin main
```

Step 3: Deploy on Render

  1. Go to render.com

  2. Click "New +" β†’ "Web Service"

  3. Connect your GitHub repository

  4. Render auto-detects render.yaml

  5. Click "Create Web Service"

  6. Wait 2-3 minutes ✨

Step 4: Get Your URL

```
Your service:  https://your-app.onrender.com
MCP endpoint:  https://your-app.onrender.com/mcp
```

πŸ”Œ Connect to Claude Desktop

Config Location

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Configuration

{ "mcpServers": { "web-scraper": { "type": "streamable-http", "url": "https://your-app.onrender.com/mcp" } } }

Restart Claude Desktop after updating config!

πŸ’» Local Development

```bash
# Clone and setup
git clone https://github.com/YOUR_USERNAME/mcp-web-scraper.git
cd mcp-web-scraper

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run server
python server.py
```

Server runs at http://localhost:8000/mcp

Test Locally

```bash
# List tools
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

# Test web search
curl -X POST http://localhost:8000/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0",
    "id":2,
    "method":"tools/call",
    "params":{
      "name":"web_search",
      "arguments":{"query":"AI news","max_results":3}
    }
  }'
```

πŸ› οΈ Adding New Tools

1. Search Tool Example

Edit tools/search.py:

```python
@mcp.tool()
def my_custom_search(query: str) -> dict:
    """Your custom search tool"""
    # Implementation here
    return {"success": True, "data": []}
```

2. Scraping Tool Example

Edit tools/scraping.py:

```python
@mcp.tool()
def my_custom_scraper(url: str) -> dict:
    """Your custom scraper"""
    # Implementation here
    return {"success": True, "content": ""}
```

3. Deploy Changes

```bash
git add .
git commit -m "Add new tools"
git push origin main
# Render auto-deploys!
```

πŸ“Š Monitoring

View Logs

  1. Render Dashboard β†’ Your Service

  2. Click "Logs" tab

  3. View real-time logs

Health Check

```bash
curl https://your-app.onrender.com/health
```

🎯 Architecture Benefits

βœ… Modular Design

  • Separation of concerns - Each file has one responsibility

  • Easy to maintain - Find and update code quickly

  • Scalable - Add new tools without touching existing code

βœ… Clean Code

  • Type hints - Better IDE support and error catching

  • Logging - Track all operations

  • Error handling - Graceful failures with detailed errors
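As a sketch of what these conventions look like together (the function name and body here are hypothetical, not the server's actual code): type hints on the signature, logging around the operation, and a uniform `{"success": ...}` result instead of an unhandled exception:

```python
import logging

logger = logging.getLogger("mcp-web-scraper")

def safe_extract(url: str) -> dict:
    """Illustrative tool body: validate input, log, never raise to the caller."""
    logger.info("extract requested: %s", url)
    if not url.startswith(("http://", "https://")):
        return {"success": False, "error": f"Invalid URL: {url}"}
    try:
        # ... fetch and parse the page here ...
        return {"success": True, "content": ""}
    except Exception as exc:
        logger.exception("extraction failed")
        return {"success": False, "error": str(exc)}
```

Every tool returning the same success/error shape means Claude can report failures instead of the request simply erroring out.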

βœ… Production Ready

  • Official MCP SDK - FastMCP framework

  • Streamable HTTP - Single endpoint communication

  • Stateless - Horizontally scalable

  • Health checks - Automatic monitoring

πŸ’¬ Example Usage in Claude

  • "Search for latest quantum computing news"

  • "Extract the article from https://example.com/post"

  • "Find and scrape top 5 articles about AI safety"

  • "Get all links from https://news.ycombinator.com"

  • "Do comprehensive research on renewable energy"

πŸ› Troubleshooting

Import Errors

```bash
# Ensure you're in the project root
cd mcp-web-scraper

# Check Python path
export PYTHONPATH="${PYTHONPATH}:$(pwd)"

# Run server
python server.py
```

Tools Not Registered

Check the server logs for "Registering X tools..." messages to confirm each module loaded.

Module Not Found

Ensure all __init__.py files exist in:

  • tools/__init__.py

  • utils/__init__.py
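If either file is missing, empty package markers are enough to make the imports work (run from the project root):

```shell
# Create empty package initializers if they are missing
mkdir -p tools utils
touch tools/__init__.py utils/__init__.py
```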

πŸ“„ License

MIT License - Free to use and modify!


Modular βœ… | Production-Ready βœ… | Easy to Extend βœ…
