Server Configuration
Describes the environment variables used to configure the server. None are required; each falls back to the default shown below.
Name | Required | Description | Default |
---|---|---|---|
SCRAPY_MCP_PROXY_URL | No | Proxy URL for outgoing requests | |
SCRAPY_MCP_USE_PROXY | No | Whether to route requests through the configured proxy | false |
SCRAPY_DOWNLOAD_DELAY | No | Delay between requests, in seconds | 1.0 |
SCRAPY_MCP_MAX_RETRIES | No | Maximum number of retries per request | 3 |
SCRAPY_MCP_SERVER_NAME | No | Name reported by the MCP server | scrapy-mcp-server |
SCRAPY_MCP_SERVER_VERSION | No | Version reported by the MCP server | 0.1.0 |
SCRAPY_CONCURRENT_REQUESTS | No | Maximum number of concurrent requests | 16 |
SCRAPY_MCP_BROWSER_TIMEOUT | No | Browser operation timeout, in seconds | 30 |
SCRAPY_MCP_REQUEST_TIMEOUT | No | HTTP request timeout, in seconds | 30 |
SCRAPY_MCP_BROWSER_HEADLESS | No | Whether to run the browser in headless mode | true |
SCRAPY_MCP_ENABLE_JAVASCRIPT | No | Whether to enable JavaScript support | false |
SCRAPY_RANDOMIZE_DOWNLOAD_DELAY | No | Whether to randomize the delay between requests | true |
SCRAPY_MCP_USE_RANDOM_USER_AGENT | No | Whether to use random user agents for anti-detection | true |
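As a rough sketch of how these settings fit together, the snippet below reads each variable with the defaults from the table above. The variable names, types, and defaults come from the table; the loading code itself is illustrative and may not match how the server actually parses its configuration.

```python
import os

def _bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable with a default."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

# Illustrative only: names and defaults mirror the configuration table.
config = {
    "server_name": os.getenv("SCRAPY_MCP_SERVER_NAME", "scrapy-mcp-server"),
    "server_version": os.getenv("SCRAPY_MCP_SERVER_VERSION", "0.1.0"),
    "use_proxy": _bool("SCRAPY_MCP_USE_PROXY", False),
    "proxy_url": os.getenv("SCRAPY_MCP_PROXY_URL", ""),
    "download_delay": float(os.getenv("SCRAPY_DOWNLOAD_DELAY", "1.0")),
    "randomize_download_delay": _bool("SCRAPY_RANDOMIZE_DOWNLOAD_DELAY", True),
    "concurrent_requests": int(os.getenv("SCRAPY_CONCURRENT_REQUESTS", "16")),
    "max_retries": int(os.getenv("SCRAPY_MCP_MAX_RETRIES", "3")),
    "request_timeout": int(os.getenv("SCRAPY_MCP_REQUEST_TIMEOUT", "30")),
    "browser_timeout": int(os.getenv("SCRAPY_MCP_BROWSER_TIMEOUT", "30")),
    "browser_headless": _bool("SCRAPY_MCP_BROWSER_HEADLESS", True),
    "enable_javascript": _bool("SCRAPY_MCP_ENABLE_JAVASCRIPT", False),
    "use_random_user_agent": _bool("SCRAPY_MCP_USE_RANDOM_USER_AGENT", True),
}
```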
Schema
Prompts
Interactive templates invoked by user choice
Name | Description |
---|---|
No prompts |
Resources
Contextual data attached and managed by the client
Name | Description |
---|---|
No resources |
Tools
Functions exposed to the LLM to take actions
Name | Description |
---|---|
scrape_webpage | Scrape a single webpage and extract its content. Supports multiple scraping methods, and extraction rules can be specified to pull specific data from the page. |
scrape_multiple_webpages | Scrape multiple webpages concurrently. All URLs are processed at the same time, which is much faster than scraping them one by one. |
extract_links | Extract all links from a webpage. This tool is specialized for link extraction and can filter links by domain, extract only internal links, or exclude specific domains. |
get_page_info | Get basic information about a webpage (title, description, status). This is a lightweight tool for quickly checking page accessibility and getting basic metadata without full content extraction. |
check_robots_txt | Check the robots.txt file for a domain to understand crawling permissions. This tool helps ensure ethical scraping by checking the robots.txt file of a website to see what crawling rules are in place. |
scrape_with_stealth | Scrape a webpage using advanced stealth techniques to avoid detection, combining several sophisticated anti-detection methods. Use this for websites with strong anti-bot protection. |
fill_and_submit_form | Fill and optionally submit a form on a webpage. Handles a variety of form elements; useful for interacting with search forms, contact forms, login forms, etc. |
get_server_metrics | Get server performance metrics and statistics. |
clear_cache | Clear the scraping results cache. This removes all cached scraping results, forcing fresh requests for all subsequent scraping operations. |
extract_structured_data | Extract structured data from a webpage using advanced techniques. Automatically detects and extracts common data types; data_type can be: all, contact, social, content, products, or addresses. |
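For orientation, the sketch below calls two of these tools through the official MCP Python client over stdio. The launch command, module name, and the `url` argument are placeholder assumptions (check the server's install instructions and the tool schemas for the real values); the tool names and the `data_type` values come from the table above.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Hypothetical launch command; replace with the server's documented entry point.
    server = StdioServerParameters(
        command="python",
        args=["-m", "scrapy_mcp"],
        env={"SCRAPY_MCP_ENABLE_JAVASCRIPT": "true"},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Lightweight accessibility check before a full scrape.
            info = await session.call_tool(
                "get_page_info", {"url": "https://example.com"}
            )
            print(info.content)

            # Structured extraction; data_type values are listed in the table above.
            data = await session.call_tool(
                "extract_structured_data",
                {"url": "https://example.com", "data_type": "contact"},
            )
            print(data.content)

asyncio.run(main())
```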