mcp-webscraper

by samirsaci

Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default

No arguments

Capabilities

Features and capabilities supported by this server

Capability | Details
tools
{
  "listChanged": false
}
prompts
{
  "listChanged": false
}
resources
{
  "subscribe": false,
  "listChanged": false
}
experimental
{}
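
As a rough illustration of how a client might read the capability flags above, the following Python sketch stores them in a dictionary and checks individual flags. The `CAPABILITIES` constant and the `supports` and `declared_features` helpers are hypothetical names introduced here; only the flag values come from the table.

```python
# Capability flags copied from the table above. The variable and helper
# names are illustrative assumptions, not part of the server.
CAPABILITIES = {
    "tools": {"listChanged": False},
    "prompts": {"listChanged": False},
    "resources": {"subscribe": False, "listChanged": False},
    "experimental": {},
}


def supports(capabilities: dict, feature: str, flag: str) -> bool:
    """Return True only if `feature` is declared and `flag` is set to true."""
    return bool(capabilities.get(feature, {}).get(flag, False))


def declared_features(capabilities: dict) -> list[str]:
    """List the capability names the server declares, in sorted order."""
    return sorted(capabilities)
```

Since every flag is false here, a client written along these lines would presumably re-list tools, prompts, and resources itself rather than wait for change notifications.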

Tools

Functions exposed to the LLM to take actions

Name | Description
scrape_url
Scrape a webpage and return its HTML content.

Args:
    url: The webpage URL to scrape
    javascript: Set to True for JavaScript-rendered sites (slower but handles dynamic content)
    wait_seconds: How long to wait for JavaScript to load (only used when javascript=True)

Returns:
    Dictionary with html content, status code, and load time
extract_data
Scrape a webpage and extract specific data using CSS selectors.

Args:
    url: The webpage to scrape
    css_selectors: List of CSS selectors (e.g., ["h1", "a.link", "#content"])
    attributes: List of attributes to extract for each selector (e.g., ["text", "href", "text"])
               If not provided, defaults to "text" for all selectors
    javascript: Set to True for JavaScript-rendered sites

Returns:
    Dictionary with extracted data for each selector

Example:
    extract_data(
        url="https://example.com",
        css_selectors=["h1", "a"],
        attributes=["text", "href"]
    )
extract_first
Extract the first matching element from a webpage.
Useful for getting single values like page title, main heading, etc.

Args:
    url: The webpage to scrape
    css_selector: CSS selector for the element (e.g., "h1", "title", "meta[name='description']")
    attribute: What to extract - "text" for content, or attribute name like "href", "content", "src"
    javascript: Set to True for JavaScript-rendered sites

Returns:
    Dictionary with the extracted value

Example:
    extract_first(url="https://example.com", css_selector="title", attribute="text")
batch_scrape
Scrape multiple URLs efficiently.

Args:
    urls: List of URLs to scrape
    javascript: Set to True if the sites need JavaScript rendering

Returns:
    List of scraping results for each URL
crawl_website
Crawl a website to discover its structure and pages.

Args:
    start_url: Starting URL
    max_pages: Maximum pages to crawl (default 50)
    max_depth: Maximum link depth (default 3)
    same_domain_only: Stay on same domain (default True)

Returns:
    Site map with discovered pages and statistics
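
The tools above are normally invoked by an MCP client rather than called directly. As a minimal sketch, assuming the standard MCP JSON-RPC `tools/call` message shape, the request for the `extract_data` example could be assembled like this. The `build_tool_call` helper is a hypothetical name, and the transport that would actually carry the message (stdio or HTTP) is not shown.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Arguments mirror the extract_data example in the tool description.
request = build_tool_call(
    1,
    "extract_data",
    {
        "url": "https://example.com",
        "css_selectors": ["h1", "a"],
        "attributes": ["text", "href"],
    },
)
```

The same helper would cover the other tools by changing `name` and `arguments`, for example `crawl_website` with `start_url` and `max_pages`.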

Prompts

Interactive templates invoked by user choice

Name | Description

No prompts

Resources

Contextual data attached and managed by the client

Name | Description
get_help | Get help documentation for the web scraping tools

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/samirsaci/mcp-webscraper'
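
For scripted access, the curl command above could be mirrored in Python with the standard library. This is only a sketch: `build_request` and `fetch_server_info` are hypothetical helper names, and decoding the body as JSON is an assumption, since the response format is not described here.

```python
import json
import urllib.request

# URL taken verbatim from the curl command above.
API_URL = "https://glama.ai/api/mcp/v1/servers/samirsaci/mcp-webscraper"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the same GET request as the curl command, without sending it."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, ASSUMING the API returns JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone only constructs it, which is convenient for inspection.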

If you have feedback or need assistance with the MCP directory API, please join our Discord server.