# WebScraping.AI MCP Server
A Model Context Protocol (MCP) server implementation that integrates with WebScraping.AI for web data extraction capabilities.
## Features
- Question answering about web page content
- Structured data extraction from web pages
- HTML content retrieval with JavaScript rendering
- Plain text extraction from web pages
- CSS selector-based content extraction
- Multiple proxy types (datacenter, residential) with country selection
- JavaScript rendering using headless Chrome/Chromium
- Concurrent request management with rate limiting
- Custom JavaScript execution on target pages
- Device emulation (desktop, mobile, tablet)
- Account usage monitoring
## Installation
### Running with npx
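Assuming the package name used elsewhere in this README, the server can be started directly with npx:

```bash
WEBSCRAPING_AI_API_KEY=your-api-key npx -y webscraping-ai-mcp
```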
### Manual Installation
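A sketch of a manual install, assuming the `webscraping-ai-mcp` package referenced in this README is published to the npm registry and exposes a binary of the same name:

```bash
npm install -g webscraping-ai-mcp
WEBSCRAPING_AI_API_KEY=your-api-key webscraping-ai-mcp
```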
## Configuring in Cursor
Note: Requires Cursor version 0.45.6+
The WebScraping.AI MCP server can be configured in two ways in Cursor:
- **Project-specific configuration** (recommended for team projects): create a `.cursor/mcp.json` file in your project directory.
- **Global configuration** (for personal use across all projects): create a `~/.cursor/mcp.json` file in your home directory with the same configuration format.
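A minimal sketch of the configuration file (the `webscraping-ai` server label is an arbitrary name; the command and environment variable come from this README):

```json
{
  "mcpServers": {
    "webscraping-ai": {
      "command": "npx",
      "args": ["-y", "webscraping-ai-mcp"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your-api-key"
      }
    }
  }
}
```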
If you are using Windows and are running into issues, try using `cmd /c "set WEBSCRAPING_AI_API_KEY=your-api-key && npx -y webscraping-ai-mcp"` as the command.
This configuration will make the WebScraping.AI tools available to Cursor's AI agent automatically when relevant for web scraping tasks.
## Running on Claude Desktop
Add this to your `claude_desktop_config.json`:
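A minimal sketch (the `webscraping-ai` server label is an arbitrary name):

```json
{
  "mcpServers": {
    "webscraping-ai": {
      "command": "npx",
      "args": ["-y", "webscraping-ai-mcp"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your-api-key"
      }
    }
  }
}
```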
## Configuration

### Environment Variables

#### Required
- `WEBSCRAPING_AI_API_KEY`: Your WebScraping.AI API key
  - Required for all operations
  - Get your API key from WebScraping.AI
#### Optional Configuration
- `WEBSCRAPING_AI_CONCURRENCY_LIMIT`: Maximum number of concurrent requests (default: `5`)
- `WEBSCRAPING_AI_DEFAULT_PROXY_TYPE`: Type of proxy to use (default: `residential`)
- `WEBSCRAPING_AI_DEFAULT_JS_RENDERING`: Enable/disable JavaScript rendering (default: `true`)
- `WEBSCRAPING_AI_DEFAULT_TIMEOUT`: Maximum web page retrieval time in ms (default: `15000`, max: `30000`)
- `WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT`: Maximum JavaScript rendering time in ms (default: `2000`)
### Configuration Examples
For standard usage:
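A sketch of a standard setup using the environment variables documented above (values shown for the optional variables are the documented defaults):

```bash
# Required: set the API key
export WEBSCRAPING_AI_API_KEY="your-api-key"

# Optional overrides (shown at their documented defaults)
export WEBSCRAPING_AI_CONCURRENCY_LIMIT=5
export WEBSCRAPING_AI_DEFAULT_PROXY_TYPE=residential
export WEBSCRAPING_AI_DEFAULT_JS_RENDERING=true
export WEBSCRAPING_AI_DEFAULT_TIMEOUT=15000
export WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT=2000

npx -y webscraping-ai-mcp
```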
## Available Tools
### 1. Question Tool (`webscraping_ai_question`)
Ask questions about web page content.
Example response:
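An illustrative sketch of a response (MCP tool results wrap output in a `content` array of text parts; the answer text here is hypothetical):

```json
{
  "content": [
    {
      "type": "text",
      "text": "The page describes the Example Domain, which is reserved for use in documentation."
    }
  ]
}
```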
### 2. Fields Tool (`webscraping_ai_fields`)
Extract structured data from web pages based on instructions.
Example response:
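An illustrative sketch of a response (the field names and values are hypothetical; extracted fields are returned as JSON in the text part):

```json
{
  "content": [
    {
      "type": "text",
      "text": "{\"title\": \"Example Product\", \"price\": \"$29.99\", \"availability\": \"In stock\"}"
    }
  ]
}
```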
### 3. HTML Tool (`webscraping_ai_html`)
Get the full HTML of a web page with JavaScript rendering.
Example response:
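An illustrative sketch of a response (the HTML content is hypothetical and abbreviated):

```json
{
  "content": [
    {
      "type": "text",
      "text": "<!DOCTYPE html><html><head><title>Example Domain</title></head><body>...</body></html>"
    }
  ]
}
```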
### 4. Text Tool (`webscraping_ai_text`)
Extract the visible text content from a web page.
Example response:
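An illustrative sketch of a response (the page text is hypothetical):

```json
{
  "content": [
    {
      "type": "text",
      "text": "Example Domain\nThis domain is for use in illustrative examples in documents."
    }
  ]
}
```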
### 5. Selected Tool (`webscraping_ai_selected`)
Extract content from a specific element using a CSS selector.
Example response:
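An illustrative sketch of a response for a selector such as `h1` (the element HTML is hypothetical):

```json
{
  "content": [
    {
      "type": "text",
      "text": "<h1>Example Domain</h1>"
    }
  ]
}
```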
### 6. Selected Multiple Tool (`webscraping_ai_selected_multiple`)
Extract content from multiple elements using CSS selectors.
Example response:
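An illustrative sketch of a response; whether matches arrive as separate content parts or as one JSON array may differ in practice (shown here as separate parts, with hypothetical values):

```json
{
  "content": [
    { "type": "text", "text": "<h2>Product One</h2>" },
    { "type": "text", "text": "<h2>Product Two</h2>" }
  ]
}
```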
### 7. Account Tool (`webscraping_ai_account`)
Get information about your WebScraping.AI account.
Example response:
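An illustrative sketch of a response (the field names and values are hypothetical, not the API's confirmed schema):

```json
{
  "content": [
    {
      "type": "text",
      "text": "{\"remaining_api_calls\": 4500, \"remaining_concurrency\": 90, \"resets_at\": 1717000000}"
    }
  ]
}
```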
### Common Options for All Tools
The following options can be used with all scraping tools:
- `timeout`: Maximum web page retrieval time in ms (15000 by default, maximum is 30000)
- `js`: Execute on-page JavaScript using a headless browser (true by default)
- `js_timeout`: Maximum JavaScript rendering time in ms (2000 by default)
- `wait_for`: CSS selector to wait for before returning the page content
- `proxy`: Type of proxy, `datacenter` or `residential` (`residential` by default)
- `country`: Country of the proxy to use (`us` by default). Supported countries: `us`, `gb`, `de`, `it`, `fr`, `ca`, `es`, `ru`, `jp`, `kr`, `in`
- `custom_proxy`: Your own proxy URL in `"http://user@host"` format
- `device`: Type of device emulation. Supported values: `desktop`, `mobile`, `tablet`
- `error_on_404`: Return error on 404 HTTP status on the target page (false by default)
- `error_on_redirect`: Return error on redirect on the target page (false by default)
- `js_script`: Custom JavaScript code to execute on the target page
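As a sketch, a call to the selected tool might combine these options as follows (the `selector` parameter name is assumed from the tool description; all values are illustrative):

```json
{
  "url": "https://example.com/products",
  "selector": ".product-card",
  "js": true,
  "timeout": 30000,
  "wait_for": ".product-card",
  "proxy": "residential",
  "country": "us",
  "device": "desktop"
}
```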
## Error Handling
The server provides robust error handling:
- Automatic retries for transient errors
- Rate limit handling with backoff
- Detailed error messages
- Network resilience
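The retry-with-backoff behavior can be sketched as follows. This is an illustrative model, not the server's actual implementation: it records the exponential delays instead of sleeping, and simulates a transient failure with a stub.

```javascript
// Illustrative sketch: retry a request-like function with exponential
// backoff on transient errors (delays are recorded, not awaited).
function retryWithBackoff(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  const delays = [];
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return { result: fn(), attempts: attempt + 1, delays };
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Exponential backoff schedule: 500 ms, 1000 ms, 2000 ms, ...
      delays.push(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Simulated transient failure: fails twice, then succeeds.
let calls = 0;
const flaky = () => {
  calls++;
  if (calls < 3) throw new Error("HTTP 429: rate limited");
  return "ok";
};

const outcome = retryWithBackoff(flaky);
console.log(outcome.attempts, outcome.delays); // 3 [ 500, 1000 ]
```

A real implementation would await each delay before the next attempt and classify only transient responses (timeouts, 429, 5xx) as retryable.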
Example error response:
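An illustrative sketch of an error response (the message text is hypothetical; the `isError` flag follows the MCP tool-result convention):

```json
{
  "content": [
    {
      "type": "text",
      "text": "Error: Request timed out after 30000 ms"
    }
  ],
  "isError": true
}
```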
## Integration with LLMs

This server implements the Model Context Protocol, making it compatible with any MCP-enabled LLM platform. You can configure your LLM to use these tools for web scraping tasks.

### Example: Configuring Claude with MCP
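A minimal sketch of an MCP server entry for Claude (the `webscraping-ai` label is an arbitrary name):

```json
{
  "mcpServers": {
    "webscraping-ai": {
      "command": "npx",
      "args": ["-y", "webscraping-ai-mcp"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your-api-key"
      }
    }
  }
}
```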
## Development

### Contributing
- Fork the repository
- Create your feature branch
- Run tests: `npm test`
- Submit a pull request
## License
MIT License - see LICENSE file for details