by mukul975

Server Configuration

Environment variables used to configure the server. All of them are optional.

Name | Required | Description | Default
MCP_WEB_SCRAPE_CACHE_DIR | No | Directory for caching scraped content. | ./cache
MCP_WEB_SCRAPE_RATE_LIMIT | No | Rate limiting threshold to prevent server overload. | 1000
MCP_WEB_SCRAPE_USER_AGENT | No | Custom User-Agent string for identifying the scraper. | MyBot/1.0
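
As an illustration of how these optional variables and their defaults fit together, here is a minimal Python sketch. It is not the server's own code: the `load_config` helper, the returned key names, and the choice to parse the rate limit as an integer are assumptions made for this example. Only the variable names and default values come from the listing above.

```python
import os
from typing import Mapping, Optional

# Variable names and defaults copied from the configuration listing above.
DEFAULTS = {
    "MCP_WEB_SCRAPE_CACHE_DIR": "./cache",
    "MCP_WEB_SCRAPE_RATE_LIMIT": "1000",
    "MCP_WEB_SCRAPE_USER_AGENT": "MyBot/1.0",
}


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve the three settings, falling back to the documented defaults.

    Hypothetical helper: the real server may read its environment differently.
    """
    env = os.environ if env is None else env
    values = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "cache_dir": values["MCP_WEB_SCRAPE_CACHE_DIR"],
        # Assumption: the threshold is an integer; its unit is not documented here.
        "rate_limit": int(values["MCP_WEB_SCRAPE_RATE_LIMIT"]),
        "user_agent": values["MCP_WEB_SCRAPE_USER_AGENT"],
    }


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dictionary instead of `os.environ` makes it easy to check how an override behaves before setting it for the real server.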

Capabilities

Features and capabilities supported by this server

Capability | Details
tools | {}
resources | {}

Tools

Functions exposed to the LLM to take actions

Name | Description
extract_content | Extract and clean content from a web page, returning Markdown with citation
summarize_content | Generate a summary of already extracted content
clear_cache | Clear cached content entries
get_page_metadata | Extract meta tags, title, description, keywords from web pages
check_url_status | Check if URL is accessible and get HTTP status codes
extract_links | Extract all links from a web page with filtering options
extract_images | Extract all images from a web page with metadata
search_content | Search for specific text patterns within extracted content
get_cache_stats | Get detailed cache statistics and usage information
validate_robots | Check robots.txt compliance for specific URLs
extract_structured_data | Extract JSON-LD, microdata, and schema.org data
compare_content | Compare content between two URLs or cached versions
batch_extract | Extract content from multiple URLs in a single operation
extract_forms | Extract form elements and their structure from web pages
extract_tables | Extract and parse HTML tables with optional CSV export
extract_social_media | Extract social media links and metadata from web pages
extract_contact_info | Extract contact information like emails, phones, addresses from web pages
extract_headings | Extract document structure and heading hierarchy from web pages
extract_feeds | Discover and parse RSS/Atom feeds from web pages
monitor_changes | Monitor web page content changes over time
analyze_performance | Analyze web page performance metrics
generate_sitemap | Generate sitemap by crawling website pages
validate_html | Validate HTML structure, accessibility, and SEO
convert_to_pdf | Convert web page content to PDF format
extract_text_only | Extract plain text content without any formatting or HTML
generate_word_cloud | Generate word frequency analysis and word cloud data from web content
translate_content | Translate web page content to different languages
extract_keywords | Extract important keywords and phrases from web content
analyze_readability | Analyze text readability using various metrics
detect_language | Detect the primary language of web page content
extract_entities | Extract named entities (people, places, organizations) from web content
sentiment_analysis | Analyze sentiment and emotional tone of web content
classify_content | Classify web content into categories and topics
analyze_competitors | Analyze competitor websites for SEO and content insights
extract_schema_markup | Extract and validate schema.org structured data markup
check_broken_links | Check for broken links and redirects on web pages
analyze_page_speed | Analyze page loading speed and performance metrics
generate_meta_tags | Generate optimized meta tags for SEO based on content analysis
scan_vulnerabilities | Scan web pages for common security vulnerabilities
check_ssl_certificate | Check SSL certificate validity and security details
analyze_cookies | Analyze cookies set by web pages for privacy and security
detect_tracking | Detect tracking scripts and privacy-related elements
check_privacy_policy | Analyze privacy policy content and compliance
monitor_uptime | Monitor website uptime and availability
track_changes_detailed | Track detailed changes in web page content with diff analysis
analyze_traffic_patterns | Analyze traffic patterns and user behavior indicators
benchmark_performance | Benchmark website performance against competitors and industry standards
generate_reports | Generate comprehensive reports combining multiple analysis tools
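
An MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message for `extract_content`. The `build_tool_call` helper is hypothetical, and the `url` argument name is an assumption: this listing does not show the tools' input schemas, so check the schema returned by `tools/list` before relying on it.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Assumption: extract_content accepts a "url" argument.
    print(build_tool_call(1, "extract_content", {"url": "https://example.com"}))
```

In practice an MCP client library handles this framing and the transport; the raw message is shown only to make the shape of a tool call concrete.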

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mukul975/mcp-web-scrape'
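
The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `server_url` and `fetch_server` are invented for this example, and the response is assumed to be a JSON document whose fields are not described on this page.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server, escaping each path segment."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server record. Assumption: the API responds with a JSON body."""
    request = urllib.request.Request(
        server_url(owner, name),
        method="GET",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    record = fetch_server("mukul975", "mcp-web-scrape")
    print(json.dumps(record, indent=2))
```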

If you have feedback or need assistance with the MCP directory API, please join our Discord server.