# MCP Web Scrape
## Server Configuration

The following environment variables configure the server. All are optional and fall back to the defaults shown.
| Name | Required | Description | Default |
|---|---|---|---|
| MCP_WEB_SCRAPE_CACHE_DIR | No | Directory for caching scraped content. | ./cache |
| MCP_WEB_SCRAPE_RATE_LIMIT | No | Rate limiting threshold to prevent server overload. | 1000 |
| MCP_WEB_SCRAPE_USER_AGENT | No | Custom User-Agent string for identifying the scraper. | MyBot/1.0 |
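A minimal sketch of supplying these variables when launching the server from Python. The launch command `npx mcp-web-scrape` is an assumption; substitute however you actually start the server. `setdefault` leaves any value already present in your environment untouched, mirroring the defaults in the table above.

```python
import os
import subprocess

# Copy the current environment and fill in the optional variables
# only where they are not already set.
env = dict(os.environ)
env.setdefault("MCP_WEB_SCRAPE_CACHE_DIR", "./cache")
env.setdefault("MCP_WEB_SCRAPE_RATE_LIMIT", "1000")
env.setdefault("MCP_WEB_SCRAPE_USER_AGENT", "MyBot/1.0")

# subprocess.run(["npx", "mcp-web-scrape"], env=env)  # launch command assumed
```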
## Capabilities

Features and capabilities supported by this server.
| Capability | Details |
|---|---|
| tools | {} |
| resources | {} |
## Tools

Functions the server exposes to the LLM so it can take actions.
| Name | Description |
|---|---|
| extract_content | Extract and clean content from a web page, returning Markdown with citations |
| summarize_content | Generate a summary of already extracted content |
| clear_cache | Clear cached content entries |
| get_page_metadata | Extract meta tags, title, description, and keywords from web pages |
| check_url_status | Check whether a URL is accessible and get its HTTP status code |
| extract_links | Extract all links from a web page, with filtering options |
| extract_images | Extract all images from a web page, with metadata |
| search_content | Search for specific text patterns within extracted content |
| get_cache_stats | Get detailed cache statistics and usage information |
| validate_robots | Check robots.txt compliance for specific URLs |
| extract_structured_data | Extract JSON-LD, microdata, and schema.org data |
| compare_content | Compare content between two URLs or cached versions |
| batch_extract | Extract content from multiple URLs in a single operation |
| extract_forms | Extract form elements and their structure from web pages |
| extract_tables | Extract and parse HTML tables, with optional CSV export |
| extract_social_media | Extract social media links and metadata from web pages |
| extract_contact_info | Extract contact information such as emails, phone numbers, and addresses from web pages |
| extract_headings | Extract document structure and heading hierarchy from web pages |
| extract_feeds | Discover and parse RSS/Atom feeds from web pages |
| monitor_changes | Monitor web page content changes over time |
| analyze_performance | Analyze web page performance metrics |
| generate_sitemap | Generate a sitemap by crawling a website's pages |
| validate_html | Validate HTML structure, accessibility, and SEO |
| convert_to_pdf | Convert web page content to PDF format |
| extract_text_only | Extract plain text content without any formatting or HTML |
| generate_word_cloud | Generate word frequency analysis and word cloud data from web content |
| translate_content | Translate web page content into different languages |
| extract_keywords | Extract important keywords and phrases from web content |
| analyze_readability | Analyze text readability using various metrics |
| detect_language | Detect the primary language of web page content |
| extract_entities | Extract named entities (people, places, organizations) from web content |
| sentiment_analysis | Analyze the sentiment and emotional tone of web content |
| classify_content | Classify web content into categories and topics |
| analyze_competitors | Analyze competitor websites for SEO and content insights |
| extract_schema_markup | Extract and validate schema.org structured data markup |
| check_broken_links | Check for broken links and redirects on web pages |
| analyze_page_speed | Analyze page loading speed and performance metrics |
| generate_meta_tags | Generate optimized meta tags for SEO based on content analysis |
| scan_vulnerabilities | Scan web pages for common security vulnerabilities |
| check_ssl_certificate | Check SSL certificate validity and security details |
| analyze_cookies | Analyze cookies set by web pages for privacy and security |
| detect_tracking | Detect tracking scripts and privacy-related elements |
| check_privacy_policy | Analyze privacy policy content and compliance |
| monitor_uptime | Monitor website uptime and availability |
| track_changes_detailed | Track detailed changes in web page content with diff analysis |
| analyze_traffic_patterns | Analyze traffic patterns and user behavior indicators |
| benchmark_performance | Benchmark website performance against competitors and industry standards |
| generate_reports | Generate comprehensive reports combining multiple analysis tools |
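A sketch of the JSON-RPC 2.0 message an MCP client sends over its transport to invoke one of the tools above, per the Model Context Protocol `tools/call` method. The target URL and the assumption that `extract_content` takes a `url` argument are illustrative, not taken from the server's schema.

```python
import json

# Build the tools/call request the client would serialize onto the wire.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "extract_content",
        "arguments": {"url": "https://example.com"},  # hypothetical argument
    },
}
payload = json.dumps(request)
```

In practice an MCP client library constructs and sends this message for you; the dict above only shows the shape of the call.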
## Prompts

Interactive templates invoked by user choice.
| Name | Description |
|---|---|
| No prompts | |
## Resources

Contextual data attached to and managed by the client.
| Name | Description |
|---|---|
| No resources | |
## MCP directory API

All information about MCP servers is available via our MCP API:
```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/mukul975/mcp-web-scrape'
```
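The same request, sketched with Python's standard library. The request object is built but not sent, so the snippet runs without network access; uncomment the last line to perform the actual GET.

```python
import urllib.request

# Equivalent of the curl command above.
req = urllib.request.Request(
    "https://glama.ai/api/mcp/v1/servers/mukul975/mcp-web-scrape",
    method="GET",
)

# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```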
If you have feedback or need assistance with the MCP directory API, please join our Discord server.