Enables web searching via DuckDuckGo, including standard, intelligent, and news-specific searches, as well as tools to search and automatically scrape content from results.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@MCP Web Scraper ServerSearch for the latest news on artificial intelligence"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
π MCP Web Scraper Server A production-ready MCP (Model Context Protocol) server for advanced web scraping and search, easily deployable on Railway.
β¨ Features π Advanced Web Search - Search anything on the web using DuckDuckGo π€ Smart Search - Intelligent search with quick/standard/comprehensive modes π° News Search - Dedicated news article search with dates and sources π― Search & Scrape - Automatically search and extract full content from results π Article Extraction - Clean article content extraction (removes ads/navigation) π Link Extraction - Extract all links with regex filtering π Table Extraction - Extract table data from webpages π Metadata Extraction - Get page metadata and Open Graph tags π Easy Railway Deployment πͺ Production-ready π οΈ Tools Available π Search Tools web_search - Search the web for anything (just give a query!) smart_search - Intelligent search with modes (quick/standard/comprehensive) search_and_scrape - Search + automatically scrape full content news_search - Search specifically for news articles π Scraping Tools scrape_html - Scrape HTML content with optional CSS selectors extract_links - Extract all links with optional filtering extract_metadata - Get page metadata and Open Graph tags scrape_table - Extract table data from webpages extract_article - Clean article extraction (removes ads/navigation) π Quick Deploy to Railway Step 1: Create GitHub Repository bash
Clone or download this repository
git clone https://github.com/yourusername/mcp-web-scraper.git cd mcp-web-scraper
Or create new repository
mkdir mcp-web-scraper cd mcp-web-scraper
Copy all files here
Initialize git
git init git add . git commit -m "Initial commit: MCP Web Scraper Server" git branch -M main git remote add origin https://github.com/YOUR_USERNAME/mcp-web-scraper.git git push -u origin main Step 2: Deploy to Railway Go to railway.app Click "New Project" Select "Deploy from GitHub repo" Choose your repository Railway automatically detects Dockerfile and deploys! π Step 3: Get Your URL Click on your deployment in Railway Go to "Settings" β "Domains" Click "Generate Domain" Copy your URL (e.g., https://mcp-web-scraper-production.up.railway.app) Step 4: Test Your Server bash
Health check
curl https://your-app.up.railway.app/health
List available tools
curl https://your-app.up.railway.app/tools
Test web search
curl -X POST https://your-app.up.railway.app/call-tool
-H "Content-Type: application/json"
-d '{"name": "web_search", "arguments": {"query": "latest AI news"}}'
π» Local Development
bash
Clone repository
git clone https://github.com/yourusername/mcp-web-scraper.git cd mcp-web-scraper
Create virtual environment
python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
Install dependencies
pip install -r requirements.txt
Run server
uvicorn src.server:app --reload --port 8000 Visit http://localhost:8000 to see the server running!
π Connect to Claude Desktop Add to your Claude Desktop config (claude_desktop_config.json):
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json Windows: %APPDATA%\Claude\claude_desktop_config.json
json { "mcpServers": { "web-scraper": { "command": "npx", "args": [ "-y", "mcp-remote", "https://your-app.up.railway.app/sse" ] } } } Then restart Claude Desktop!
π Example Usage
Search the Web
bash
curl -X POST http://localhost:8000/call-tool
-H "Content-Type: application/json"
-d '{
"name": "web_search",
"arguments": {
"query": "best pizza recipe",
"max_results": 5
}
}'
Smart Search (Comprehensive)
bash
curl -X POST http://localhost:8000/call-tool
-H "Content-Type: application/json"
-d '{
"name": "smart_search",
"arguments": {
"query": "climate change solutions",
"mode": "comprehensive"
}
}'
Search and Scrape
bash
curl -X POST http://localhost:8000/call-tool
-H "Content-Type: application/json"
-d '{
"name": "search_and_scrape",
"arguments": {
"query": "machine learning tutorials",
"num_results": 3
}
}'
News Search
bash
curl -X POST http://localhost:8000/call-tool
-H "Content-Type: application/json"
-d '{
"name": "news_search",
"arguments": {
"query": "technology",
"max_results": 10
}
}'
Extract Article
bash
curl -X POST http://localhost:8000/call-tool
-H "Content-Type: application/json"
-d '{
"name": "extract_article",
"arguments": {
"url": "https://example.com/article"
}
}'
π― Use Cases in Claude
Once connected, you can ask Claude:
"Search for the best Italian restaurants in Rome" "Find me recent articles about quantum computing" "What's the latest news on AI developments?" "Research blockchain technology and give me detailed info" "Scrape the table from this webpage: [URL]" "Extract all links from example.com" π Project Structure mcp-web-scraper/ βββ src/ β βββ init.py # Package initialization β βββ server.py # FastAPI server and MCP integration β βββ tools.py # Web scraping and search tools βββ requirements.txt # Python dependencies βββ Dockerfile # Docker configuration βββ railway.json # Railway deployment config βββ .gitignore # Git ignore file βββ README.md # This file π§ Configuration Environment Variables (Optional) You can set these in Railway dashboard under "Variables":
LOG_LEVEL - Logging level (default: INFO) PORT - Server port (default: 8000) HOST - Server host (default: 0.0.0.0) π Monitoring Railway provides built-in monitoring:
Metrics - CPU, Memory, Network usage Logs - Real-time application logs Deployments - Deployment history and rollbacks Access these in your Railway dashboard.
π° Cost Railway Free Tier:
$5 free credit per month 500 hours of usage Perfect for personal use and testing For production use, consider upgrading to Railway Pro.
π Security Notes β οΈ This server is deployed without authentication for easy use. For production:
Consider adding API key authentication Implement rate limiting Restrict allowed domains Use environment variables for sensitive data π Troubleshooting Server not starting? Check Railway logs in dashboard Verify all files are committed to Git Ensure Dockerfile is in root directory Tools not working? Check tool names match exactly Verify JSON format in requests Check server logs for errors Can't connect to Claude? Verify Railway URL is correct Ensure /sse endpoint is accessible Restart Claude Desktop after config change π€ Contributing Contributions are welcome! Feel free to:
Report bugs Suggest new features Submit pull requests π License MIT License - feel free to use and modify!
π Acknowledgments Built with:
FastAPI - Web framework MCP - Model Context Protocol DuckDuckGo Search - Web search Trafilatura - Content extraction BeautifulSoup - HTML parsing Railway - Deployment platform π Support GitHub Issues: Report a bug Railway Docs: docs.railway.app MCP Docs: modelcontextprotocol.io Made with β€οΈ for the MCP community