mcp-webscraper
MCP Web Scraper for Claude Desktop
A Model Context Protocol (MCP) server that enables Claude Desktop to perform advanced web scraping and crawling operations. Extract structured data, analyze website architectures, and discover content relationships - all through natural conversation with Claude.
🎯 Features
Static & Dynamic Scraping: Handle both regular HTML and JavaScript-rendered pages
Website Crawling: Discover and map entire website structures
Data Extraction: Extract specific elements using CSS selectors
Batch Operations: Process multiple URLs efficiently
Link Analysis: Understand how pages connect and reference each other
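To make the crawling feature more concrete, here is a minimal sketch of a breadth-first crawl with a page limit and a depth limit. It is not the project's actual crawler: `crawl`, `get_links`, and the toy `SITE` dictionary are hypothetical names used for illustration, and the in-memory dictionary stands in for real HTTP requests. It shows why the number of pages crawled can be much smaller than the number of pages discovered.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=5, max_depth=1):
    """Breadth-first crawl; returns (crawled, discovered).

    get_links is a hypothetical callable mapping a URL to the URLs it links to.
    """
    crawled = {}                # url -> depth at which it was fetched
    discovered = {start_url}    # every URL seen so far, fetched or not
    queue = deque([(start_url, 0)])

    while queue and len(crawled) < max_pages:
        url, depth = queue.popleft()
        links = get_links(url)  # a real crawler would fetch and parse the page here
        crawled[url] = depth
        for link in links:
            if link not in discovered:
                discovered.add(link)
                # Only schedule pages that stay within the depth limit.
                if depth + 1 <= max_depth:
                    queue.append((link, depth + 1))
    return crawled, discovered


# Toy in-memory site (made-up data) standing in for a real website.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/", "/c"],
    "/b": ["/c", "/d"],
    "/c": [],
    "/d": [],
}

crawled, discovered = crawl("/", lambda url: SITE.get(url, []), max_pages=3, max_depth=1)
print(crawled)             # pages actually fetched, with their depth
print(sorted(discovered))  # every URL seen, including ones never fetched
```

Here three pages are fetched but five URLs are discovered, because links found on the last pages are recorded without being visited.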
🎥 Watch the Tutorial
See the full demo and step-by-step setup guide on YouTube:

📋 Prerequisites
Python 3.10 or higher
WSL2 with Ubuntu (for Windows users)
Claude Desktop application
uv package manager
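If you want to check the Python version and the presence of `uv` before installing, a small script along these lines could help. This is only a sketch: `check_prerequisites` is a hypothetical helper, not part of the repository, and it only checks what can be detected from Python itself (it does not verify WSL2 or Claude Desktop).

```python
import shutil
import sys


def check_prerequisites(min_version=(3, 10), tools=("uv",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_version:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_version))
        problems.append(f"Python {wanted}+ required, found {found}")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


for problem in check_prerequisites():
    print(problem)
```

Run it inside your WSL shell, since that is where the server will be started.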
🚀 Installation
1. Clone the Repository
git clone https://github.com/samirsaci/mcp-webscraper.git
cd mcp-webscraper
2. Install uv Package Manager
If you don't have uv installed:
curl -LsSf https://astral.sh/uv/install.sh | sh
3. Initialize the Project
# Initialize the uv project in the current directory
uv init .
4. Install Dependencies
uv add "mcp[cli]"
source .venv/bin/activate
uv pip install -r requirements.txt
Don't forget to install the Playwright browser, which is needed to scrape dynamic content:
uv run playwright install chromium
5. Test the Installation
Run the test script to verify that everything works, using https://books.toscrape.com/, a website that loves to be scraped:
uv run python test_local.py
Expected Output:
Static Scraping Success: True
HTML length: 51294
---------
Dynamic Scraping Success: True
HTML length: 51004
---------
Testing Crawler...
Crawler Success: True
Pages crawled: 5
Pages discovered: 437
Failed URLs: 0
First 3 pages discovered:
1. All products | Books to Scrape - Sandbox
URL: https://books.toscrape.com/
Links found: 73
Depth: 0
2. All products | Books to Scrape - Sandbox
URL: https://books.toscrape.com/index.html
Links found: 73
Depth: 1
3. Books | Books to Scrape - Sandbox
URL: https://books.toscrape.com/catalogue/category/books_1/index.html
Links found: 73
Depth: 1
Statistics:
Total unique links: 104
Max depth reached: 1
Avg load time: 0.21s
⚙️ Claude Desktop Configuration
For Windows Users with WSL
1. Locate your Claude Desktop configuration file:
File -> Settings -> Edit Config
2. Add the WebScrapingServer configuration:
{
  "mcpServers": {
    "WebScrapingServer": {
      "command": "wsl",
      "args": [
        "-d",
        "Ubuntu",
        "bash",
        "-lc",
        "cd ~/path/to/mcp-webscraper && uv run --with mcp[cli] mcp run scrapping.py"
      ]
    }
  }
}
Important: Replace ~/path/to/mcp-webscraper with the actual path to your project folder in WSL. To find your WSL path:
pwd
3. Restart Claude Desktop
After updating the configuration:
Completely quit Claude Desktop (not just close the window)
Start Claude Desktop again
Look for the 🔌 icon in the text input area
Click it to verify "WebScrapingServer" appears
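If "WebScrapingServer" does not show up, one thing worth ruling out is a malformed config file. The sketch below parses a config as JSON and looks up the server entry. `find_server` is a hypothetical helper (not part of this repository), and `SAMPLE` simply repeats the configuration shown above with its placeholder path left as-is.

```python
import json


def find_server(config_text, server_name="WebScrapingServer"):
    """Return the server entry from a config string, or None if it is missing.

    Raises json.JSONDecodeError if the text is not valid JSON.
    """
    config = json.loads(config_text)
    return config.get("mcpServers", {}).get(server_name)


# Same structure as the configuration above; the path is still a placeholder.
SAMPLE = """
{
  "mcpServers": {
    "WebScrapingServer": {
      "command": "wsl",
      "args": ["-d", "Ubuntu", "bash", "-lc",
               "cd ~/path/to/mcp-webscraper && uv run --with mcp[cli] mcp run scrapping.py"]
    }
  }
}
"""

entry = find_server(SAMPLE)
print(entry["command"] if entry else "server entry missing")
```

To check your real file, read it with `pathlib.Path(...).read_text()` and pass the text to `find_server`, using the path of the file that opens from Edit Config.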
🔧 Usage Examples
Once configured, you can ask Claude to:
Basic Scraping
"Scrape the homepage of example.com and tell me what you find"
Advanced SEO Analysis
"Please help me crawl my personal blog https://yourblog.com with a limit of 150 pages.
I would like to understand how the articles refer to each other.
Can you help me perform this type of analysis?"
📁 Project Structure
mcp-webscraper/
├── models/
│   └── scraping_models.py   # Pydantic models for data validation
├── utils/
│   └── web_scraper.py       # Core WebScraper class
├── scrapping.py             # MCP server implementation
├── test_local.py            # Local testing script
├── requirements.txt         # Python dependencies
├── README.md                # This file
└── scraping_server.log      # Server logs (created at runtime)
🛠️ Available MCP Tools
The server exposes these tools to Claude:
scrape_url: Get raw HTML from any webpage
extract_data: Extract multiple elements using CSS selectors
extract_first: Get a single element from a page
batch_scrape: Process multiple URLs
crawl_website: Discover and map website structure
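To give an idea of what "extract multiple elements" versus "get a single element" means in practice, here is a small standard-library sketch. It is not how the server implements its tools: `select_text` and `select_first_text` are hypothetical helpers, they only match on a tag name plus an optional class (a tiny subset of CSS selectors), and the HTML snippet is made-up sample data.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of every element matching a tag name and optional class."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.results = []
        self._depth = 0     # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == self.tag:
                self._depth += 1  # nested element with the same tag name
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (self.css_class is None or self.css_class in classes):
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def select_text(html, tag, css_class=None):
    """Return the text of all matching elements, in document order."""
    parser = TextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def select_first_text(html, tag, css_class=None):
    """Return the text of the first matching element, or None if there is none."""
    matches = select_text(html, tag, css_class)
    return matches[0] if matches else None


# Made-up sample markup, only for demonstration.
HTML = """
<article class="product"><h3>Book One</h3><p class="price">10.00</p></article>
<article class="product"><h3>Book Two</h3><p class="price">12.50</p></article>
"""

titles = select_text(HTML, "h3")
first_price = select_first_text(HTML, "p", "price")
print(titles)
print(first_price)
```

A real selector engine handles far more cases (descendant selectors, attributes, void elements), so treat this only as an illustration of the two extraction modes.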
🐛 Troubleshooting
Server not appearing in Claude
If the server does not appear in Claude, first try restarting Claude Desktop by terminating its process.
If this does not work, try the following:
Check the log file:
cat scraping_server.log
Verify that the path in the config matches your WSL path:
pwd
The output should match what you have in your config file.
Test the server directly:
uv run python scrapping.py
Playwright issues
If JavaScript scraping fails, try reinstalling the browser:
uv run playwright install chromium
WSL-specific issues
Ensure WSL2 is properly installed by running this in Windows PowerShell opened as Administrator:
wsl --status
📄 License
MIT License - feel free to use this in your own projects!
About me 🤓
Senior Supply Chain and Data Science consultant with international experience in logistics and transportation operations. For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting or LinkedIn.