Schema | MCP Web Scrape

Server Configuration

Environment variables used to configure the server. All are optional and fall back to the defaults shown.

Name	Required	Description	Default
`MCP_WEB_SCRAPE_CACHE_DIR`	No	Directory for caching scraped content.	./cache
`MCP_WEB_SCRAPE_RATE_LIMIT`	No	Rate limiting threshold to prevent server overload.	1000
`MCP_WEB_SCRAPE_USER_AGENT`	No	Custom User-Agent string for identifying the scraper.	MyBot/1.0
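
As a rough illustration of how the variables above might be consumed, the following Python sketch reads each one from the environment and falls back to the default listed in the table. `ScrapeConfig` and `load_config` are hypothetical names introduced here. The server's actual parsing logic is not documented on this page, and treating the rate limit as an integer is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeConfig:
    """Hypothetical container for the three documented settings."""
    cache_dir: str
    rate_limit: int
    user_agent: str


def load_config(env=None):
    """Read the documented variables, using the table's defaults when unset."""
    env = os.environ if env is None else env
    return ScrapeConfig(
        cache_dir=env.get("MCP_WEB_SCRAPE_CACHE_DIR", "./cache"),
        # Assumption: the threshold is a whole number; its unit is not documented.
        rate_limit=int(env.get("MCP_WEB_SCRAPE_RATE_LIMIT", "1000")),
        user_agent=env.get("MCP_WEB_SCRAPE_USER_AGENT", "MyBot/1.0"),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.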

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{}
`resources`	{}
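
In MCP, a server declares its capabilities during initialization, and the presence of a key (even with an empty object as its value) signals that the matching feature is supported. A minimal sketch of how a client might map the capabilities listed above to the list requests it can send. `list_methods` is a hypothetical helper, not part of any SDK.

```python
def list_methods(capabilities: dict) -> list[str]:
    """Return the MCP list requests that match the declared capabilities."""
    known = {
        "tools": "tools/list",
        "resources": "resources/list",
        "prompts": "prompts/list",
    }
    # A capability counts as declared when its key is present, even if the value is {}.
    return [method for key, method in known.items() if key in capabilities]


# Capabilities as shown in the table above.
server_capabilities = {"tools": {}, "resources": {}}
print(list_methods(server_capabilities))  # ['tools/list', 'resources/list']
```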

Tools

Functions exposed to the LLM to take actions

Name	Description
extract_content	Extract and clean content from a web page, returning Markdown with citation
summarize_content	Generate a summary of already extracted content
clear_cache	Clear cached content entries
get_page_metadata	Extract meta tags, title, description, and keywords from web pages
check_url_status	Check whether a URL is accessible and return its HTTP status code
extract_links	Extract all links from a web page with filtering options
extract_images	Extract all images from a web page with metadata
search_content	Search for specific text patterns within extracted content
get_cache_stats	Get detailed cache statistics and usage information
validate_robots	Check robots.txt compliance for specific URLs
extract_structured_data	Extract JSON-LD, microdata, and schema.org data
compare_content	Compare content between two URLs or cached versions
batch_extract	Extract content from multiple URLs in a single operation
extract_forms	Extract form elements and their structure from web pages
extract_tables	Extract and parse HTML tables with optional CSV export
extract_social_media	Extract social media links and metadata from web pages
extract_contact_info	Extract contact information such as email addresses, phone numbers, and postal addresses from web pages
extract_headings	Extract document structure and heading hierarchy from web pages
extract_feeds	Discover and parse RSS/Atom feeds from web pages
monitor_changes	Monitor web page content changes over time
analyze_performance	Analyze web page performance metrics
generate_sitemap	Generate sitemap by crawling website pages
validate_html	Validate HTML structure, accessibility, and SEO
convert_to_pdf	Convert web page content to PDF format
extract_text_only	Extract plain text content without any formatting or HTML
generate_word_cloud	Generate word frequency analysis and word cloud data from web content
translate_content	Translate web page content to different languages
extract_keywords	Extract important keywords and phrases from web content
analyze_readability	Analyze text readability using various metrics
detect_language	Detect the primary language of web page content
extract_entities	Extract named entities (people, places, organizations) from web content
sentiment_analysis	Analyze sentiment and emotional tone of web content
classify_content	Classify web content into categories and topics
analyze_competitors	Analyze competitor websites for SEO and content insights
extract_schema_markup	Extract and validate schema.org structured data markup
check_broken_links	Check for broken links and redirects on web pages
analyze_page_speed	Analyze page loading speed and performance metrics
generate_meta_tags	Generate optimized meta tags for SEO based on content analysis
scan_vulnerabilities	Scan web pages for common security vulnerabilities
check_ssl_certificate	Check SSL certificate validity and security details
analyze_cookies	Analyze cookies set by web pages for privacy and security
detect_tracking	Detect tracking scripts and privacy-related elements
check_privacy_policy	Analyze privacy policy content and compliance
monitor_uptime	Monitor website uptime and availability
track_changes_detailed	Track detailed changes in web page content with diff analysis
analyze_traffic_patterns	Analyze traffic patterns and user behavior indicators
benchmark_performance	Benchmark website performance against competitors and industry standards
generate_reports	Generate comprehensive reports combining multiple analysis tools
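
MCP clients invoke tools like those listed above by sending a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a message for `extract_content`. The `url` argument name is an assumption: this page does not list each tool's input schema, which a client would normally obtain from a `tools/list` response. `build_tool_call` is a hypothetical helper.

```python
import json
from itertools import count

# Simple incrementing request IDs for this sketch.
_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes an MCP tool by name."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Assumption: extract_content accepts a "url" argument.
request = build_tool_call("extract_content", {"url": "https://example.com"})
print(request)
```

In practice an MCP client library would handle the framing and transport. The sketch only shows the shape of the request.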

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources
