Server Configuration
Describes the environment variables used to configure the server. None are required; each falls back to the default shown below.
Name | Required | Description | Default |
---|---|---|---|
SCRAPY_MCP_PROXY_URL | No | Proxy URL for outgoing requests | |
SCRAPY_MCP_USE_PROXY | No | Whether to route requests through the configured proxy | false |
SCRAPY_DOWNLOAD_DELAY | No | Delay between requests, in seconds | 1.0 |
SCRAPY_MCP_MAX_RETRIES | No | Maximum number of retries per request | 3 |
SCRAPY_MCP_SERVER_NAME | No | Name reported by the MCP server | scrapy-mcp-server |
SCRAPY_MCP_SERVER_VERSION | No | Version reported by the MCP server | 0.1.0 |
SCRAPY_CONCURRENT_REQUESTS | No | Maximum number of concurrent requests | 16 |
SCRAPY_MCP_BROWSER_TIMEOUT | No | Browser operation timeout, in seconds | 30 |
SCRAPY_MCP_REQUEST_TIMEOUT | No | HTTP request timeout, in seconds | 30 |
SCRAPY_MCP_BROWSER_HEADLESS | No | Whether to run the browser in headless mode | true |
SCRAPY_MCP_ENABLE_JAVASCRIPT | No | Whether to enable JavaScript support | false |
SCRAPY_RANDOMIZE_DOWNLOAD_DELAY | No | Whether to randomize the delay between requests | true |
SCRAPY_MCP_USE_RANDOM_USER_AGENT | No | Whether to use random user agents for anti-detection | true |
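As a rough sketch of how these settings fit together, the snippet below reads each variable with the defaults from the table above. The variable names, types, and defaults come from the table; the loading code itself is illustrative and may not match how the server actually parses its configuration.

```python
import os

def _bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable with a default."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

# Illustrative only: names and defaults mirror the configuration table.
config = {
    "server_name": os.getenv("SCRAPY_MCP_SERVER_NAME", "scrapy-mcp-server"),
    "server_version": os.getenv("SCRAPY_MCP_SERVER_VERSION", "0.1.0"),
    "use_proxy": _bool("SCRAPY_MCP_USE_PROXY", False),
    "proxy_url": os.getenv("SCRAPY_MCP_PROXY_URL", ""),
    "download_delay": float(os.getenv("SCRAPY_DOWNLOAD_DELAY", "1.0")),
    "randomize_download_delay": _bool("SCRAPY_RANDOMIZE_DOWNLOAD_DELAY", True),
    "concurrent_requests": int(os.getenv("SCRAPY_CONCURRENT_REQUESTS", "16")),
    "max_retries": int(os.getenv("SCRAPY_MCP_MAX_RETRIES", "3")),
    "request_timeout": int(os.getenv("SCRAPY_MCP_REQUEST_TIMEOUT", "30")),
    "browser_timeout": int(os.getenv("SCRAPY_MCP_BROWSER_TIMEOUT", "30")),
    "browser_headless": _bool("SCRAPY_MCP_BROWSER_HEADLESS", True),
    "enable_javascript": _bool("SCRAPY_MCP_ENABLE_JAVASCRIPT", False),
    "use_random_user_agent": _bool("SCRAPY_MCP_USE_RANDOM_USER_AGENT", True),
}
```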
Schema
Prompts
Interactive templates invoked by user choice
Name | Description |
---|---|
No prompts |
Resources
Contextual data attached and managed by the client
Name | Description |
---|---|
No resources |
Tools
Functions exposed to the LLM to take actions
Name | Description |
---|---|
scrape_webpage | Scrape a single webpage and extract its content. Supports multiple scraping methods, and extraction rules can be specified to pull specific data from the page. |
scrape_multiple_webpages | Scrape multiple webpages concurrently. All URLs are processed at the same time, which is much faster than scraping them one by one. |
extract_links | Extract all links from a webpage. This tool is specialized for link extraction and can filter links by domain, extract only internal links, or exclude specific domains. |
get_page_info | Get basic information about a webpage (title, description, status). This is a lightweight tool for quickly checking page accessibility and getting basic metadata without full content extraction. |
check_robots_txt | Check the robots.txt file for a domain to understand crawling permissions. This tool helps ensure ethical scraping by checking the robots.txt file of a website to see what crawling rules are in place. |
scrape_with_stealth | Scrape a webpage using advanced stealth techniques to avoid detection, combining several sophisticated anti-detection methods. Use this for websites with strong anti-bot protection. |
fill_and_submit_form | Fill and optionally submit a form on a webpage. Handles a variety of form elements; useful for interacting with search forms, contact forms, login forms, etc. |
get_server_metrics | Get server performance metrics and statistics. |
clear_cache | Clear the scraping results cache. This removes all cached scraping results, forcing fresh requests for all subsequent scraping operations. |
extract_structured_data | Extract structured data from a webpage using advanced techniques. Automatically detects and extracts common data types; data_type can be: all, contact, social, content, products, or addresses. |
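For orientation, the sketch below calls two of these tools through the official MCP Python client over stdio. The launch command, module name, and the `url` argument are placeholder assumptions (check the server's install instructions and the tool schemas for the real values); the tool names and the `data_type` values come from the table above.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Hypothetical launch command; replace with the server's documented entry point.
    server = StdioServerParameters(
        command="python",
        args=["-m", "scrapy_mcp"],
        env={"SCRAPY_MCP_ENABLE_JAVASCRIPT": "true"},
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Lightweight accessibility check before a full scrape.
            info = await session.call_tool(
                "get_page_info", {"url": "https://example.com"}
            )
            print(info.content)

            # Structured extraction; data_type values are listed in the table above.
            data = await session.call_tool(
                "extract_structured_data",
                {"url": "https://example.com", "data_type": "contact"},
            )
            print(data.content)

asyncio.run(main())
```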