The Scrapy MCP Server is a robust, enterprise-grade web scraping platform that offers comprehensive data extraction capabilities for commercial use.
Core Scraping Capabilities:
Multiple scraping methods: HTTP requests, Scrapy framework, Selenium, or Playwright with intelligent method selection
Concurrent processing: Scrape multiple URLs simultaneously with exponential backoff retry mechanisms
JavaScript support: Fully render dynamic, JavaScript-heavy websites using complete browser rendering
Advanced data extraction: Configure flexible extraction rules using simple or advanced selectors, or automatically extract structured data like contact information, social media links, product details, and addresses
Link extraction: Specialized link extraction with domain filtering and internal/external link options
Form interaction: Automatically fill and submit various form types including text inputs, checkboxes, and file uploads
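The concurrent scraping with exponential backoff described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the server's actual implementation; the names `backoff_delays`, `fetch_with_retry`, `scrape_many`, and the caller-supplied `fetch` coroutine are all hypothetical.

```python
import asyncio
import random


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Return the un-jittered delay before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


async def fetch_with_retry(fetch, url: str, retries: int = 3, base: float = 0.5):
    """Call the async `fetch(url)`; on failure, wait with exponential backoff and retry."""
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Random jitter keeps concurrent retries from synchronising.
            await asyncio.sleep(random.uniform(0, delays[attempt]))


async def scrape_many(fetch, urls: list[str], limit: int = 5) -> list:
    """Scrape URLs concurrently with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url: str):
        async with semaphore:
            return await fetch_with_retry(fetch, url)

    # return_exceptions=True so one failed URL does not cancel the others.
    return await asyncio.gather(*(worker(u) for u in urls), return_exceptions=True)
```

Here `fetch` stands in for whichever scraping method is selected (plain HTTP, Scrapy, Selenium, or Playwright); failed URLs come back as exception objects in the result list instead of aborting the batch.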
Anti-Detection & Performance:
Stealth techniques: Bypass anti-bot measures using undetected-chromedriver, Playwright stealth, random User-Agent rotation, and proxy support
Performance optimization: In-memory caching, rate limiting, and intelligent request handling to prevent server overload
Monitoring tools: Track server metrics including request counts, success rates, cache statistics, and detailed performance monitoring
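To make the rate-limiting idea concrete, here is a small sketch of a per-host limiter that enforces a minimum interval between requests to the same site. It is an assumption about how such a limiter could work, not the server's own code; `DomainRateLimiter` and its parameters are hypothetical names.

```python
import time
from urllib.parse import urlparse


class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same host."""

    def __init__(self, min_interval: float = 1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed: dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until `url`'s host may be requested again; return seconds waited."""
        host = urlparse(url).netloc
        now = self._clock()
        delay = max(0.0, self._next_allowed.get(host, 0.0) - now)
        if delay:
            self._sleep(delay)
        # The next request to this host is allowed one interval after this one.
        self._next_allowed[host] = now + delay + self.min_interval
        return delay
```

Calling `limiter.wait(url)` before each request spaces out hits on a single host while leaving requests to other hosts unaffected.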
Enterprise Features:
Ethical compliance: Check robots.txt files for responsible data collection
Error handling: Robust error classification and handling mechanisms
Cache management: Clear scraping results cache and manage server resources
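A robots.txt check of the kind mentioned above can be approximated with Python's standard `urllib.robotparser`. The helper name `is_allowed` is hypothetical, and the sketch takes the robots.txt body as a string so it runs offline; it does not show how the server's `check_robots_txt` tool is implemented.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/\n"
print(is_allowed(rules, "https://example.com/private/report"))
print(is_allowed(rules, "https://example.com/pricing"))
```

For a live site, `RobotFileParser.set_url(...)` followed by `read()` fetches and parses the file in one step.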
The server provides web scraping through the Scrapy framework for large-scale data extraction, with support for concurrent requests, custom pipelines, and advanced crawling features.
It also enables browser automation and scraping of JavaScript-heavy websites through Selenium WebDriver, with support for form filling, element waiting, and dynamic content extraction.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Scrapy MCP Server scrape the pricing page from example.com and convert it to markdown".
That's it! The server will respond to your query, and you can continue using it as needed.
Data Extractor is a commercial-grade MCP Server built on FastMCP. It reads and extracts content from web pages and PDFs, including both text and images, and converts it to Markdown. It is built for long-term deployment in enterprise environments.
🛠️ MCP Server Core Tools (14)
Web Page
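Tool name | Description | Main parameters |
scrape_webpage | Single-page scraping | |
scrape_multiple_webpages | Batch page scraping | |
scrape_with_stealth | Anti-detection (stealth) scraping | |
fill_and_submit_form | Form automation | |
extract_links | Specialized link extraction | |
extract_structured_data | Structured data extraction | |
get_page_info | Page information retrieval | |
check_robots_txt | Crawler rules (robots.txt) check | |
convert_webpage_to_markdown | Webpage to Markdown conversion | |
batch_convert_webpages_to_markdown | Batch Markdown conversion | |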
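As a rough illustration of internal/external link filtering, the kind of work a tool like `extract_links` performs, the following standard-library sketch collects anchors from an HTML string and classifies them by host. The function `split_links` and its return shape are assumptions made for illustration; the tool's real parameters and output format are not documented here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _AnchorCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(html: str, page_url: str) -> dict[str, list[str]]:
    """Resolve links against `page_url` and split them into internal and external."""
    collector = _AnchorCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc
    result: dict[str, list[str]] = {"internal": [], "external": []}
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        kind = "internal" if parsed.netloc == page_host else "external"
        result[kind].append(absolute)
    return result
```

Comparing on the exact host treats subdomains as external; a real domain filter would likely need a more lenient match.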
PDF Document
Tool name | Description | Main parameters |
convert_pdf_to_markdown | PDF to Markdown conversion | |
batch_convert_pdfs_to_markdown | Batch PDF conversion | |
Service Management
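Tool name | Description | Main parameters |
get_server_metrics | Performance metrics monitoring | No parameters; returns request statistics, performance metrics, and cache status |
clear_cache | Cache management | No parameters; clears all cached data |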
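The relationship between cached results, cache statistics, and clearing the cache can be sketched with a small in-memory TTL cache. This is only an assumed design for illustration: `ResultCache`, its `metrics()` fields, and the default TTL are hypothetical and do not describe the actual output of `get_server_metrics` or `clear_cache`.

```python
import time


class ResultCache:
    """In-memory cache with per-entry expiry and hit/miss counters."""

    def __init__(self, ttl: float = 300.0, clock=time.monotonic):
        self.ttl = ttl
        self._clock = clock  # injectable for testing
        self._entries: dict[str, tuple[float, object]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        self._entries.pop(key, None)  # drop an expired entry, if any
        self.misses += 1
        return None

    def set(self, key: str, value) -> None:
        """Store `value` under `key` until `ttl` seconds from now."""
        self._entries[key] = (self._clock() + self.ttl, value)

    def metrics(self) -> dict:
        """Summarise cache size and hit rate."""
        total = self.hits + self.misses
        return {
            "size": len(self._entries),
            "hits": self.hits,
            "misses": self.misses,
            "hit_rate": self.hits / total if total else 0.0,
        }

    def clear(self) -> int:
        """Remove every entry and return how many were removed."""
        removed = len(self._entries)
        self._entries.clear()
        return removed
```

In this sketch, clearing removes stored results but keeps the counters, so the hit rate still reflects the whole session.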
🤝 Contribution
Issues and pull requests to improve this project are welcome.
📄 License
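MIT License - see the LICENSE file for details.
Note: Please use this tool responsibly, comply with each website's terms of use and robots.txt rules, and respect the site's intellectual property rights.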