by mukul975

Server Configuration

Describes the environment variables the server reads. All of them are optional and fall back to the defaults listed.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| MCP_WEB_SCRAPE_CACHE_DIR | No | Directory for caching scraped content. | ./cache |
| MCP_WEB_SCRAPE_RATE_LIMIT | No | Rate limiting threshold to prevent server overload. | 1000 |
| MCP_WEB_SCRAPE_USER_AGENT | No | Custom User-Agent string for identifying the scraper. | MyBot/1.0 |
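
To illustrate how these optional variables and their defaults fit together, here is a minimal Python sketch. It is not taken from the server's source: the `ScrapeConfig` class and `load_config` function are hypothetical names, and parsing the rate limit as an integer is an assumption, since the listing does not state its unit or type.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapeConfig:
    """Hypothetical container for the three documented settings."""
    cache_dir: str
    rate_limit: int
    user_agent: str


def load_config(env=None) -> ScrapeConfig:
    """Read the optional variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ScrapeConfig(
        cache_dir=env.get("MCP_WEB_SCRAPE_CACHE_DIR", "./cache"),
        # Assumption: the threshold is a whole number; its unit is not documented.
        rate_limit=int(env.get("MCP_WEB_SCRAPE_RATE_LIMIT", "1000")),
        user_agent=env.get("MCP_WEB_SCRAPE_USER_AGENT", "MyBot/1.0"),
    )
```

Calling `load_config()` with none of the variables set would yield the defaults from the table; passing a dictionary instead of the process environment makes the behaviour easy to check in isolation.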

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | {} |
| resources | {} |

Tools

Functions exposed to the LLM to take actions

| Name | Description |
| --- | --- |
| extract_content | Extract and clean content from a web page, returning Markdown with citation |
| summarize_content | Generate a summary of already extracted content |
| clear_cache | Clear cached content entries |
| get_page_metadata | Extract meta tags, title, description, keywords from web pages |
| check_url_status | Check if URL is accessible and get HTTP status codes |
| extract_links | Extract all links from a web page with filtering options |
| extract_images | Extract all images from a web page with metadata |
| search_content | Search for specific text patterns within extracted content |
| get_cache_stats | Get detailed cache statistics and usage information |
| validate_robots | Check robots.txt compliance for specific URLs |
| extract_structured_data | Extract JSON-LD, microdata, and schema.org data |
| compare_content | Compare content between two URLs or cached versions |
| batch_extract | Extract content from multiple URLs in a single operation |
| extract_forms | Extract form elements and their structure from web pages |
| extract_tables | Extract and parse HTML tables with optional CSV export |
| extract_social_media | Extract social media links and metadata from web pages |
| extract_contact_info | Extract contact information like emails, phones, addresses from web pages |
| extract_headings | Extract document structure and heading hierarchy from web pages |
| extract_feeds | Discover and parse RSS/Atom feeds from web pages |
| monitor_changes | Monitor web page content changes over time |
| analyze_performance | Analyze web page performance metrics |
| generate_sitemap | Generate sitemap by crawling website pages |
| validate_html | Validate HTML structure, accessibility, and SEO |
| convert_to_pdf | Convert web page content to PDF format |
| extract_text_only | Extract plain text content without any formatting or HTML |
| generate_word_cloud | Generate word frequency analysis and word cloud data from web content |
| translate_content | Translate web page content to different languages |
| extract_keywords | Extract important keywords and phrases from web content |
| analyze_readability | Analyze text readability using various metrics |
| detect_language | Detect the primary language of web page content |
| extract_entities | Extract named entities (people, places, organizations) from web content |
| sentiment_analysis | Analyze sentiment and emotional tone of web content |
| classify_content | Classify web content into categories and topics |
| analyze_competitors | Analyze competitor websites for SEO and content insights |
| extract_schema_markup | Extract and validate schema.org structured data markup |
| check_broken_links | Check for broken links and redirects on web pages |
| analyze_page_speed | Analyze page loading speed and performance metrics |
| generate_meta_tags | Generate optimized meta tags for SEO based on content analysis |
| scan_vulnerabilities | Scan web pages for common security vulnerabilities |
| check_ssl_certificate | Check SSL certificate validity and security details |
| analyze_cookies | Analyze cookies set by web pages for privacy and security |
| detect_tracking | Detect tracking scripts and privacy-related elements |
| check_privacy_policy | Analyze privacy policy content and compliance |
| monitor_uptime | Monitor website uptime and availability |
| track_changes_detailed | Track detailed changes in web page content with diff analysis |
| analyze_traffic_patterns | Analyze traffic patterns and user behavior indicators |
| benchmark_performance | Benchmark website performance against competitors and industry standards |
| generate_reports | Generate comprehensive reports combining multiple analysis tools |
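
As a rough illustration of how a client might invoke one of these tools, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the method the Model Context Protocol uses for tool invocation. The helper name `build_tool_call` is hypothetical, and the `url` argument is an assumption: this listing does not show each tool's input schema, so the real parameter names should be taken from the server's `tools/list` response.

```python
import json


def build_tool_call(tool_name, arguments, request_id=1):
    """Build an MCP tools/call request as a plain dictionary."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumption: extract_content accepts a "url" argument; check the tool's schema.
request = build_tool_call("extract_content", {"url": "https://example.com"})
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally construct and send this message over the configured transport; the sketch only shows the shape of the payload.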

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mukul975/mcp-web-scrape'
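
The same request can be sketched in Python using only the standard library. The function names `server_url` and `fetch_server` are hypothetical, and decoding the response as JSON is an assumption, since the response format is not described here.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner, name):
    """Build the directory API URL for one server entry."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner, name, timeout=10.0):
    """GET the server entry and decode it (assumed to be JSON)."""
    req = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For this server, the call would be `fetch_server("mukul975", "mcp-web-scrape")`; network errors surface as `urllib.error.URLError` and are left to the caller to handle.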

If you have feedback or need assistance with the MCP directory API, please join our Discord server.