Glama

Scrapy MCP Server

by ThreeFish-AI

Server Configuration

Describes the environment variables used to configure the server. All of them are optional.

Name                              Required  Description                                           Default
SCRAPY_MCP_PROXY_URL              No        Proxy URL to send requests through                    (none)
SCRAPY_MCP_USE_PROXY              No        Whether to use the proxy                              false
SCRAPY_DOWNLOAD_DELAY             No        Delay between consecutive downloads                   1.0
SCRAPY_MCP_MAX_RETRIES            No        Maximum number of retries                             3
SCRAPY_MCP_SERVER_NAME            No        Server name                                           scrapy-mcp-server
SCRAPY_MCP_SERVER_VERSION         No        Server version                                        0.1.0
SCRAPY_CONCURRENT_REQUESTS        No        Number of concurrent requests                         16
SCRAPY_MCP_BROWSER_TIMEOUT        No        Browser timeout                                       30
SCRAPY_MCP_REQUEST_TIMEOUT        No        Request timeout                                       30
SCRAPY_MCP_BROWSER_HEADLESS       No        Whether to run the browser in headless mode           true
SCRAPY_MCP_ENABLE_JAVASCRIPT      No        Whether to enable JavaScript support                  false
SCRAPY_RANDOMIZE_DOWNLOAD_DELAY   No        Whether to randomize the download delay               true
SCRAPY_MCP_USE_RANDOM_USER_AGENT  No        Whether to use random user agents to avoid detection  true
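
As a rough illustration, the Python sketch below reads these variables into a single settings object and falls back to the documented defaults. The names load_settings and Settings, and the rule that "1", "true", "yes" and "on" count as true, are assumptions for illustration, not the server's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(raw: str) -> bool:
    # Assumed rule: common truthy spellings are True, anything else is False.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical settings object; defaults are applied in load_settings.
    server_name: str
    server_version: str
    concurrent_requests: int
    download_delay: float
    randomize_download_delay: bool
    max_retries: int
    request_timeout: float
    browser_timeout: float
    browser_headless: bool
    enable_javascript: bool
    use_random_user_agent: bool
    use_proxy: bool
    proxy_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    g = env.get  # g(name, default) returns the raw string value
    return Settings(
        server_name=g("SCRAPY_MCP_SERVER_NAME", "scrapy-mcp-server"),
        server_version=g("SCRAPY_MCP_SERVER_VERSION", "0.1.0"),
        concurrent_requests=int(g("SCRAPY_CONCURRENT_REQUESTS", "16")),
        download_delay=float(g("SCRAPY_DOWNLOAD_DELAY", "1.0")),
        randomize_download_delay=_as_bool(g("SCRAPY_RANDOMIZE_DOWNLOAD_DELAY", "true")),
        max_retries=int(g("SCRAPY_MCP_MAX_RETRIES", "3")),
        request_timeout=float(g("SCRAPY_MCP_REQUEST_TIMEOUT", "30")),
        browser_timeout=float(g("SCRAPY_MCP_BROWSER_TIMEOUT", "30")),
        browser_headless=_as_bool(g("SCRAPY_MCP_BROWSER_HEADLESS", "true")),
        enable_javascript=_as_bool(g("SCRAPY_MCP_ENABLE_JAVASCRIPT", "false")),
        use_random_user_agent=_as_bool(g("SCRAPY_MCP_USE_RANDOM_USER_AGENT", "true")),
        use_proxy=_as_bool(g("SCRAPY_MCP_USE_PROXY", "false")),
        # An unset or empty proxy URL both mean "no proxy".
        proxy_url=g("SCRAPY_MCP_PROXY_URL") or None,
    )
```

For example, load_settings({"SCRAPY_MCP_USE_PROXY": "true", "SCRAPY_MCP_PROXY_URL": "http://127.0.0.1:8080"}) enables the proxy and keeps every other default. If, as their names suggest, SCRAPY_DOWNLOAD_DELAY and SCRAPY_RANDOMIZE_DOWNLOAD_DELAY are passed through to Scrapy's DOWNLOAD_DELAY and RANDOMIZE_DOWNLOAD_DELAY settings, the delay is in seconds and, when randomized, Scrapy waits a random interval between 0.5 and 1.5 times that value.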

Schema

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources

Tools

Functions exposed to the LLM to take actions

scrape_webpage

Scrape a single webpage and extract its content.

This tool can scrape web pages using different methods:

  • auto: Automatically choose the best method

  • simple: Fast HTTP requests (no JavaScript)

  • scrapy: Robust scraping with the Scrapy framework

  • selenium: Full browser rendering (supports JavaScript)

You can specify extraction rules to get specific data from the page.
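
How the "auto" method decides is not documented here. One plausible approach, sketched below purely as an assumption, is to try the cheap simple fetch first and escalate to selenium only when the returned HTML looks like an empty JavaScript shell. The helpers looks_js_rendered and pick_method, the 200-character threshold, and gating the escalation on SCRAPY_MCP_ENABLE_JAVASCRIPT are all hypothetical; this sketch never chooses scrapy on its own.

```python
import re

VALID_METHODS = ("auto", "simple", "scrapy", "selenium")


def looks_js_rendered(html: str, min_visible_chars: int = 200) -> bool:
    """Heuristic: very little visible text plus at least one <script> tag."""
    # Drop script/style bodies, then every remaining tag, and measure the text left.
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    visible = " ".join(re.sub(r"(?s)<[^>]+>", " ", without_code).split())
    return len(visible) < min_visible_chars and "<script" in html.lower()


def pick_method(requested: str, first_html: str, javascript_enabled: bool) -> str:
    """Hypothetical 'auto' policy: start cheap, escalate to a browser if needed."""
    if requested not in VALID_METHODS:
        raise ValueError(f"unknown method {requested!r}; expected one of {VALID_METHODS}")
    if requested != "auto":
        return requested  # an explicit choice always wins
    if javascript_enabled and looks_js_rendered(first_html):
        return "selenium"  # the simple fetch returned an apparently empty JS shell
    return "simple"
```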

scrape_multiple_webpages

Scrape multiple webpages concurrently.

This tool scrapes several URLs in a single call and processes them concurrently rather than one after another, which is typically much faster than scraping them individually.
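
Concurrent scraping with a cap on the number of in-flight requests is commonly built on a semaphore. The asyncio sketch below assumes a caller-supplied async fetch function (hypothetical) and defaults the cap to 16 to match SCRAPY_CONCURRENT_REQUESTS; it illustrates the pattern rather than this server's implementation.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Union

Result = Union[str, Exception]


async def scrape_many(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 16,
) -> Dict[str, Result]:
    """Fetch every URL concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(url: str):
        async with semaphore:  # waits while max_concurrency fetches are active
            try:
                return url, await fetch(url)
            except Exception as exc:  # one failure must not cancel the batch
                return url, exc

    pairs = await asyncio.gather(*(run_one(u) for u in urls))
    return dict(pairs)  # duplicate URLs collapse into a single entry
```

Returning the exception for a failed URL instead of raising it keeps one bad URL from discarding the results of all the others.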

extract_links

Extract all links from a webpage.

This tool is specialized for link extraction and can filter links by domain, extract only internal links, or exclude specific domains.
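
The filtering options can be illustrated with the Python standard library alone. In this sketch, extract_links and its keyword arguments are hypothetical names, "internal" means the same hostname as the page, and a domain filter such as example.com also matches its subdomains; the real tool's parameters and matching rules may differ.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Optional, Set
from urllib.parse import urljoin, urlparse


class _HrefCollector(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def _in_domain(host: str, domain: str) -> bool:
    # "blog.example.com" counts as part of "example.com".
    return host == domain or host.endswith("." + domain)


def extract_links(
    html: str,
    base_url: str,
    *,
    internal_only: bool = False,
    allowed_domains: Optional[Iterable[str]] = None,
    excluded_domains: Iterable[str] = (),
) -> List[str]:
    collector = _HrefCollector()
    collector.feed(html)
    base_host = urlparse(base_url).hostname or ""
    allowed = list(allowed_domains) if allowed_domains is not None else None
    excluded = list(excluded_domains)
    seen: Set[str] = set()
    links: List[str] = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel: and similar
        host = parts.hostname or ""
        if internal_only and host != base_host:
            continue
        if allowed is not None and not any(_in_domain(host, d) for d in allowed):
            continue
        if any(_in_domain(host, d) for d in excluded):
            continue
        if absolute not in seen:  # keep the first occurrence only
            seen.add(absolute)
            links.append(absolute)
    return links
```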

get_page_info

Get basic information about a webpage (title, description, status).

This is a lightweight tool for quickly checking page accessibility and getting basic metadata without full content extraction.
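
A lightweight metadata check typically reads only the <title> element and the description <meta> tag. The sketch below does that with Python's html.parser; page_info is a hypothetical name, and the status code is assumed to come from the HTTP response and is simply passed through.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class _HeadInfoParser(HTMLParser):
    """Captures the first <title> text and the first meta description."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.description: Optional[str] = None
        self._title_parts: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "description":
                self.description = (attributes.get("content") or "").strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse newlines and repeated spaces inside the title.
            self.title = " ".join("".join(self._title_parts).split())


def page_info(html: str, status: int) -> Dict[str, object]:
    parser = _HeadInfoParser()
    parser.feed(html)
    parser.close()
    return {"status": status, "title": parser.title, "description": parser.description}
```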

check_robots_txt

Check the robots.txt file for a domain to understand crawling permissions.

This tool helps ensure ethical scraping by checking the robots.txt file of a website to see what crawling rules are in place.
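
Python's standard urllib.robotparser module can evaluate robots.txt rules. The sketch below parses robots.txt text that has already been downloaded, so it needs no network access; robots_permissions is a hypothetical helper, not part of this server's API.

```python
from typing import Dict, Iterable, Optional, Tuple
from urllib.robotparser import RobotFileParser


def robots_permissions(
    robots_txt: str, user_agent: str, urls: Iterable[str]
) -> Tuple[Dict[str, bool], Optional[float]]:
    """Evaluate already-downloaded robots.txt text for one user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse local text; no network access
    allowed = {url: parser.can_fetch(user_agent, url) for url in urls}
    return allowed, parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
```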

scrape_with_stealth

Scrape a webpage using advanced stealth techniques to avoid detection.

This tool uses sophisticated anti-detection methods including:

  • Undetected browser automation

  • Randomized behavior patterns

  • Human-like interactions

  • Advanced evasion techniques

Use this for websites with strong anti-bot protection.

fill_and_submit_form

Fill and optionally submit a form on a webpage.

This tool can handle various form elements including:

  • Text inputs

  • Checkboxes and radio buttons

  • Dropdown selects

  • File uploads

  • Form submission

Useful for interacting with search forms, contact forms, login forms, etc.

get_server_metrics

Get server performance metrics and statistics.

Returns information about:

  • Request counts and success rates

  • Performance metrics

  • Method usage statistics

  • Error categories

  • Cache statistics
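
The categories above map naturally onto a handful of counters. The class below is a hypothetical in-memory sketch (ServerMetrics, record and snapshot are invented names); the server's actual metric names and output format are not documented here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServerMetrics:
    """Hypothetical in-memory counters for requests, methods, errors and cache."""

    requests: int = 0
    successes: int = 0
    total_seconds: float = 0.0
    cache_hits: int = 0
    method_usage: Counter = field(default_factory=Counter)
    error_categories: Counter = field(default_factory=Counter)

    def record(self, method: str, seconds: float,
               error_category: Optional[str] = None, cache_hit: bool = False) -> None:
        self.requests += 1
        self.total_seconds += seconds
        self.method_usage[method] += 1
        if cache_hit:
            self.cache_hits += 1
        if error_category is None:
            self.successes += 1
        else:
            self.error_categories[error_category] += 1

    def snapshot(self) -> Dict[str, object]:
        n = self.requests
        # Guard every ratio against division by zero before any request is recorded.
        return {
            "requests": n,
            "success_rate": self.successes / n if n else 0.0,
            "avg_seconds": self.total_seconds / n if n else 0.0,
            "method_usage": dict(self.method_usage),
            "error_categories": dict(self.error_categories),
            "cache_hit_rate": self.cache_hits / n if n else 0.0,
        }
```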

clear_cache

Clear the scraping results cache.

This removes all cached scraping results, forcing fresh requests for all subsequent scraping operations.
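
A results cache of this kind can be sketched as a dictionary keyed by URL, where clearing it forces the next lookup to fetch again. ResultCache, its get_or_fetch method, and the time-to-live expiry are assumptions for illustration; the server's cache policy is not documented here.

```python
import time
from typing import Callable, Dict, Tuple


class ResultCache:
    """Hypothetical URL-keyed cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get_or_fetch(self, url: str, fetch: Callable[[str], object]) -> object:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit: no request is made
        result = fetch(url)  # miss or expired entry: fetch and store
        self._entries[url] = (now, result)
        return result

    def clear(self) -> int:
        """Drop every entry and return how many were removed."""
        removed = len(self._entries)
        self._entries.clear()
        return removed
```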

extract_structured_data

Extract structured data from a webpage using advanced techniques.

Automatically detects and extracts:

  • Contact information (emails, phone numbers)

  • Social media links

  • Addresses

  • Prices and product information

  • Article content

The data_type parameter accepts: all, contact, social, content, products, or addresses.
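
For the contact and social categories, a regular-expression pass over the page text gives a rough idea of the technique. The patterns and the extract_structured helper below are simplified assumptions, not the server's detection logic. They will miss some formats and can produce false positives (e.g. long digit runs that are not phone numbers), and the content, products and addresses categories are out of scope for this sketch.

```python
import re
from typing import Dict, Iterable, List

# Simplified, hypothetical patterns; real-world formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(?:twitter|x|facebook|linkedin|instagram|github|youtube)\.com/[^\s\"'<>]+"
)

SUPPORTED = ("all", "contact", "social")


def _unique(items: Iterable[str]) -> List[str]:
    return list(dict.fromkeys(items))  # de-duplicate, keeping first-seen order


def extract_structured(text: str, data_type: str = "all") -> Dict[str, List[str]]:
    if data_type not in SUPPORTED:
        raise ValueError(f"this sketch only handles {SUPPORTED}, not {data_type!r}")
    result: Dict[str, List[str]] = {}
    if data_type in ("all", "contact"):
        result["emails"] = _unique(EMAIL_RE.findall(text))
        result["phones"] = _unique(PHONE_RE.findall(text))
    if data_type in ("all", "social"):
        result["social"] = _unique(SOCIAL_RE.findall(text))
    return result
```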

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ThreeFish-AI/scrapy-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.