Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@MCP Web Scrape scrape the main text from https://news.ycombinator.com and format it as markdown".
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
MCP Web Scrape
Clean, cached web content for agents: Markdown + citations, robots-aware, ETag/304 caching.
Version
Current Version: 1.0.7
Live Demos
See MCP Web Scrape in action! These demos show real-time extraction and processing:
Quick Start Demo
Tool Examples
Quick Start
ChatGPT Desktop Setup
Add to your ~/Library/Application Support/ChatGPT/config.json:
Claude Desktop Setup
Add to your ~/Library/Application Support/Claude/claude_desktop_config.json:
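As a minimal sketch of what such an entry can look like: Claude Desktop reads MCP servers from an `mcpServers` object in this file, where each entry has a `command` and an `args` list. The Python helper below merges one entry into an existing config. The entry name `web-scrape`, the `npx` launcher, and the package name `mcp-web-scrape` are assumptions for illustration, not values confirmed by this README.

```python
import json
from pathlib import Path

# Config location as given in the text above (macOS).
CLAUDE_CONFIG = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def add_mcp_server(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of `config` with one server entry added under "mcpServers"."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    updated["mcpServers"] = servers
    return updated


def install(path: Path = CLAUDE_CONFIG) -> None:
    """Merge the entry into the config file on disk, creating the file if needed."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    # "web-scrape", "npx" and "mcp-web-scrape" are hypothetical placeholders.
    merged = add_mcp_server(existing, "web-scrape", "npx", ["-y", "mcp-web-scrape"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merged, indent=2))
```

Merging rather than overwriting keeps any other servers already listed in the file. Restart the desktop app after editing so the new entry is picked up.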
Available Tools
Core Extraction Tools
| Tool | Description |
| --- | --- |
| | Convert HTML to clean Markdown with citations |
| | AI-powered content summarization |
| | Extract title, description, author, keywords |
| | Get all links with filtering options |
| | Extract images with alt text and dimensions |
| | Search within page content |
| | Verify URL accessibility |
| | Check robots.txt compliance |
| | Parse JSON-LD, microdata, RDFa |
| | Compare two pages for changes |
| | Process multiple URLs efficiently |
| | View cache performance metrics |
| | Manage cached content |
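For orientation, MCP clients invoke any of these tools with a JSON-RPC 2.0 `tools/call` request carrying the tool name and an `arguments` object. The sketch below builds such a request in Python. The tool name `extract_content` and its `url` argument are hypothetical stand-ins, since the actual tool names and schemas are defined by the server.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "extract_content" and the "url" argument are assumed names for illustration.
request = tool_call_request("extract_content", {"url": "https://example.com"})
print(json.dumps(request, indent=2))
```

In practice a client such as Claude Desktop builds these messages for you; the shape is shown only to make clear what a tool invocation contains.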
Advanced Extraction Tools
| Tool | Description |
| --- | --- |
| | Extract form elements, fields, and validation rules |
| | Parse HTML tables with headers and structured data |
| | Find social media links and profiles |
| | Discover emails, phone numbers, and addresses |
| | Analyze heading structure (H1-H6) for content hierarchy |
| | Discover and parse RSS/Atom feeds |
Content Transformation Tools
| Tool | Description |
| --- | --- |
| | Convert web pages to PDF format with customizable settings |
| | Extract plain text content without formatting or HTML |
| | Generate word frequency analysis and word cloud data |
| | Translate web page content to different languages |
| | Extract important keywords and phrases from content |
Advanced Analysis Tools
| Tool | Description |
| --- | --- |
| | Analyze text readability using various metrics (Flesch, Gunning-Fog, etc.) |
| | Detect the primary language of web page content |
| | Extract named entities (people, places, organizations) |
| | Analyze sentiment and emotional tone of content |
| | Classify content into categories and topics |
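To make the readability metrics concrete, here is a rough Python sketch of the Flesch Reading Ease score, computed as 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The syllable counter is a crude vowel-group heuristic chosen for brevity; it is an assumption of this sketch and not necessarily how the server's tool counts syllables.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a heuristic, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


simple = flesch_reading_ease("The cat sat on the mat.")
dense = flesch_reading_ease("Institutional heterogeneity complicates interpretation.")
print(simple > dense)  # short, one-syllable words score as easier to read
```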
SEO & Marketing Tools
| Tool | Description |
| --- | --- |
| | Analyze competitor websites for SEO and content insights |
| | Extract and validate schema.org structured data |
| | Check for broken links and redirects on pages |
| | Analyze page loading speed and performance metrics |
| | Generate optimized meta tags for SEO |
Security & Privacy Tools
| Tool | Description |
| --- | --- |
| | Scan pages for common security vulnerabilities |
| | Check SSL certificate validity and security details |
| | Analyze cookies and tracking mechanisms |
| | Detect tracking scripts and privacy concerns |
| | Analyze privacy policy compliance and coverage |
Advanced Monitoring Tools
| Tool | Description |
| --- | --- |
| | Monitor website uptime and availability |
| | Advanced change tracking with similarity analysis |
| | Analyze website traffic patterns and trends |
| | Benchmark performance against competitors |
| | Generate comprehensive analysis reports |
Analysis & Monitoring Tools
| Tool | Description |
| --- | --- |
| | Track content changes over time with similarity analysis |
| | Measure page performance, SEO, and accessibility metrics |
| | Crawl websites to generate comprehensive sitemaps |
| | Validate HTML structure, accessibility, and SEO compliance |
Why Not Just Use Built-in Browsing?
Deterministic Results: Same URL always returns identical content
Smart Citations: Every fact links back to its source
Robots Compliant: Respects robots.txt and rate limits
Lightning Fast: ETag/304 caching + persistent storage
Agent-Optimized: Clean Markdown instead of messy HTML
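The ETag/304 caching mentioned above relies on HTTP conditional requests: the client sends the stored `ETag` back in an `If-None-Match` header, and the origin answers `304 Not Modified` with no body when nothing has changed. The Python sketch below illustrates the idea with an injected fetch function so it runs without network access. The `CachedPage`, `Response`, and `fetch_with_etag` names are illustrative assumptions, not the server's actual internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CachedPage:
    etag: str
    body: str


@dataclass
class Response:
    status: int
    etag: Optional[str] = None
    body: str = ""


# A fetcher takes (url, request_headers) and returns a Response.
Fetcher = Callable[[str, Dict[str, str]], Response]


def fetch_with_etag(url: str, cache: Dict[str, CachedPage], fetch: Fetcher) -> str:
    """Return the page body, reusing the cached copy when the origin replies 304."""
    cached = cache.get(url)
    headers = {"If-None-Match": cached.etag} if cached else {}
    response = fetch(url, headers)
    if response.status == 304 and cached:
        return cached.body  # unchanged on the origin: serve the stored copy
    if response.status == 200:
        if response.etag:
            cache[url] = CachedPage(response.etag, response.body)
        return response.body
    raise RuntimeError(f"unexpected status {response.status} for {url}")
```

A 304 reply saves bandwidth and parsing work, and it also means repeated requests for an unchanged page yield the same content.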
Safety First
Respects robots.txt by default
Rate limiting prevents server overload
No paywall bypass - ethical scraping only
User-Agent identification for transparency
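A rough Python sketch of how the first two safeguards can work, using the standard library's `urllib.robotparser` for the robots.txt check and a simple minimum-interval limiter between requests. The user-agent string, the `MinIntervalLimiter` class, and the one-second interval are assumptions made for illustration; the server's own implementation and defaults may differ.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical identifying user agent; a real scraper should use its own.
USER_AGENT = "example-scraper/0.1"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class MinIntervalLimiter:
    """Block until at least `min_interval_s` seconds separate consecutive calls."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


parser = build_parser("User-agent: *\nDisallow: /private/\n")
limiter = MinIntervalLimiter(1.0)
for url in ("https://example.com/index.html", "https://example.com/private/data"):
    if parser.can_fetch(USER_AGENT, url):
        limiter.wait()  # the actual HTTP request would follow here
        print("allowed:", url)
    else:
        print("skipped (disallowed by robots.txt):", url)
```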
Installation
Configuration
Transports
STDIO (default)
HTTP/SSE
Resources
Access cached content as MCP resources:
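In MCP, resources are discovered with a `resources/list` request and fetched with `resources/read`, which takes the resource `uri` as its parameter. The Python sketch below builds both messages. The `cache://example.com/` URI is a hypothetical placeholder, since the URI scheme this server exposes is not stated here.

```python
import json


def resources_list_request(request_id: int) -> dict:
    """Build an MCP `resources/list` request (JSON-RPC 2.0)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "resources/list"}


def resources_read_request(request_id: int, uri: str) -> dict:
    """Build an MCP `resources/read` request for one resource URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


if __name__ == "__main__":
    # The URI below is an assumed example; list resources first to see real ones.
    print(json.dumps(resources_list_request(1), indent=2))
    print(json.dumps(resources_read_request(2, "cache://example.com/"), indent=2))
```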
Contributing
We love contributions! See CONTRIBUTING.md for guidelines.
Good First Issues:
Add new content extractors
Improve error handling
Write more tests
Enhance documentation
License
MIT © Mahipal
Star History
Built with ❤️ for the