Server Configuration
Describes the environment variables used to configure the server. All of them are optional and fall back to the defaults shown.
| Name | Required | Description | Default |
|---|---|---|---|
| MCP_WEB_SCRAPE_CACHE_DIR | No | Directory for caching scraped content. | ./cache |
| MCP_WEB_SCRAPE_RATE_LIMIT | No | Rate limiting threshold to prevent server overload. | 1000 |
| MCP_WEB_SCRAPE_USER_AGENT | No | Custom User-Agent string for identifying the scraper. | MyBot/1.0 |
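
As an illustration of how the variables and defaults in the table fit together, here is a minimal sketch that reads them into a typed config object. The `ScrapeConfig` class and `load_config` function are hypothetical names introduced for this example and are not part of the server. Treating the rate limit as an integer is also an assumption, since the table does not state its type or units.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ScrapeConfig:
    """Hypothetical container for the three documented settings."""
    cache_dir: Path
    rate_limit: int  # assumed to be an integer; units are not documented
    user_agent: str


def load_config(env: Optional[Mapping[str, str]] = None) -> ScrapeConfig:
    """Read the documented variables, falling back to the table's defaults."""
    env = os.environ if env is None else env
    return ScrapeConfig(
        cache_dir=Path(env.get("MCP_WEB_SCRAPE_CACHE_DIR", "./cache")),
        rate_limit=int(env.get("MCP_WEB_SCRAPE_RATE_LIMIT", "1000")),
        user_agent=env.get("MCP_WEB_SCRAPE_USER_AGENT", "MyBot/1.0"),
    )
```

Calling `load_config()` with no arguments reads from the process environment, while passing a plain dict makes the fallback behavior easy to check in isolation.
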
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {} |
| resources | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| extract_content | Extract and clean content from a web page, returning Markdown with citation |
| summarize_content | Generate a summary of already extracted content |
| clear_cache | Clear cached content entries |
| get_page_metadata | Extract meta tags, title, description, keywords from web pages |
| check_url_status | Check whether a URL is accessible and return its HTTP status code |
| extract_links | Extract all links from a web page with filtering options |
| extract_images | Extract all images from a web page with metadata |
| search_content | Search for specific text patterns within extracted content |
| get_cache_stats | Get detailed cache statistics and usage information |
| validate_robots | Check robots.txt compliance for specific URLs |
| extract_structured_data | Extract JSON-LD, microdata, and schema.org data |
| compare_content | Compare content between two URLs or cached versions |
| batch_extract | Extract content from multiple URLs in a single operation |
| extract_forms | Extract form elements and their structure from web pages |
| extract_tables | Extract and parse HTML tables with optional CSV export |
| extract_social_media | Extract social media links and metadata from web pages |
| extract_contact_info | Extract contact information such as email addresses, phone numbers, and postal addresses from web pages |
| extract_headings | Extract document structure and heading hierarchy from web pages |
| extract_feeds | Discover and parse RSS/Atom feeds from web pages |
| monitor_changes | Monitor web page content changes over time |
| analyze_performance | Analyze web page performance metrics |
| generate_sitemap | Generate sitemap by crawling website pages |
| validate_html | Validate HTML structure, accessibility, and SEO |
| convert_to_pdf | Convert web page content to PDF format |
| extract_text_only | Extract plain text content without any formatting or HTML |
| generate_word_cloud | Generate word frequency analysis and word cloud data from web content |
| translate_content | Translate web page content to different languages |
| extract_keywords | Extract important keywords and phrases from web content |
| analyze_readability | Analyze text readability using various metrics |
| detect_language | Detect the primary language of web page content |
| extract_entities | Extract named entities (people, places, organizations) from web content |
| sentiment_analysis | Analyze sentiment and emotional tone of web content |
| classify_content | Classify web content into categories and topics |
| analyze_competitors | Analyze competitor websites for SEO and content insights |
| extract_schema_markup | Extract and validate schema.org structured data markup |
| check_broken_links | Check for broken links and redirects on web pages |
| analyze_page_speed | Analyze page loading speed and performance metrics |
| generate_meta_tags | Generate optimized meta tags for SEO based on content analysis |
| scan_vulnerabilities | Scan web pages for common security vulnerabilities |
| check_ssl_certificate | Check SSL certificate validity and security details |
| analyze_cookies | Analyze cookies set by web pages for privacy and security |
| detect_tracking | Detect tracking scripts and privacy-related elements |
| check_privacy_policy | Analyze privacy policy content and compliance |
| monitor_uptime | Monitor website uptime and availability |
| track_changes_detailed | Track detailed changes in web page content with diff analysis |
| analyze_traffic_patterns | Analyze traffic patterns and user behavior indicators |
| benchmark_performance | Benchmark website performance against competitors and industry standards |
| generate_reports | Generate comprehensive reports combining multiple analysis tools |
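
MCP clients invoke tools such as the ones listed above by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `extract_content`. The `build_tool_call` helper is a hypothetical name, and the `url` argument is an assumption: the table does not list each tool's input schema, so consult the schema returned by `tools/list` for the real parameter names.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The "url" argument name is assumed for illustration only.
request = build_tool_call(1, "extract_content", {"url": "https://example.com"})
```

In practice an MCP client library would construct and send this message for you; the sketch only shows the shape of the payload on the wire.
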
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |