# WebScraping.AI MCP Server
A Model Context Protocol (MCP) server implementation that integrates with WebScraping.AI for web data extraction capabilities.
## Features
- Question answering about web page content
- Structured data extraction from web pages
- HTML content retrieval with JavaScript rendering
- Plain text extraction from web pages
- CSS selector-based content extraction
- Multiple proxy types (datacenter, residential) with country selection
- JavaScript rendering using headless Chrome/Chromium
- Concurrent request management with rate limiting
- Custom JavaScript execution on target pages
- Device emulation (desktop, mobile, tablet)
- Account usage monitoring
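The concurrent request management mentioned above can be sketched roughly as a promise-based limiter. This is a hypothetical illustration, not the server's actual implementation; in the real server the limit comes from `WEBSCRAPING_AI_CONCURRENCY_LIMIT` (default 5):

```javascript
// Hypothetical sketch of a concurrency limiter: at most `maxConcurrent`
// tasks run at once, the rest wait in a FIFO queue.
function createLimiter(maxConcurrent) {
  let active = 0;
  const queue = [];

  const next = () => {
    if (active >= maxConcurrent || queue.length === 0) return;
    active++;
    const { task, resolve, reject } = queue.shift();
    task().then(resolve, reject).finally(() => {
      active--;
      next(); // start the next queued task, if any
    });
  };

  // Wrap a task so it waits for a free slot before running.
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}
```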
## Installation
### Running with npx
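The server can be launched directly with npx. The package name `webscraping-ai-mcp` is taken from the Windows note below; the API key value is a placeholder:

```sh
WEBSCRAPING_AI_API_KEY=your-api-key npx -y webscraping-ai-mcp
```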
### Manual Installation
### Configuring in Cursor
Note: Requires Cursor version 0.45.6+
The WebScraping.AI MCP server can be configured in Cursor in two ways:
- Project-specific configuration (recommended for team projects): create a `.cursor/mcp.json` file in your project directory.
- Global configuration (for personal use across all projects): create a `~/.cursor/mcp.json` file in your home directory with the same configuration format as above.
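A minimal sketch of the configuration file, following Cursor's standard `mcpServers` format; the server name and API key are placeholders:

```json
{
  "mcpServers": {
    "webscraping-ai": {
      "command": "npx",
      "args": ["-y", "webscraping-ai-mcp"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your-api-key"
      }
    }
  }
}
```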
If you are using Windows and run into issues, try using `cmd /c "set WEBSCRAPING_AI_API_KEY=your-api-key && npx -y webscraping-ai-mcp"` as the command.
This configuration will make the WebScraping.AI tools available to Cursor's AI agent automatically when relevant for web scraping tasks.
### Running on Claude Desktop
Add this to your `claude_desktop_config.json`:
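The entry mirrors the Cursor configuration above (a sketch; the server name and API key are placeholders):

```json
{
  "mcpServers": {
    "webscraping-ai": {
      "command": "npx",
      "args": ["-y", "webscraping-ai-mcp"],
      "env": {
        "WEBSCRAPING_AI_API_KEY": "your-api-key"
      }
    }
  }
}
```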
## Configuration
### Environment Variables
#### Required
- `WEBSCRAPING_AI_API_KEY`: Your WebScraping.AI API key
  - Required for all operations
  - Get your API key from WebScraping.AI
#### Optional Configuration
- `WEBSCRAPING_AI_CONCURRENCY_LIMIT`: Maximum number of concurrent requests (default: `5`)
- `WEBSCRAPING_AI_DEFAULT_PROXY_TYPE`: Type of proxy to use (default: `residential`)
- `WEBSCRAPING_AI_DEFAULT_JS_RENDERING`: Enable/disable JavaScript rendering (default: `true`)
- `WEBSCRAPING_AI_DEFAULT_TIMEOUT`: Maximum web page retrieval time in ms (default: `15000`, max: `30000`)
- `WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT`: Maximum JavaScript rendering time in ms (default: `2000`)
### Configuration Examples
For standard usage:
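As a sketch, standard usage might set only the API key and rely on the defaults; the variables below are exactly those documented above:

```sh
export WEBSCRAPING_AI_API_KEY="your-api-key"
# Optional overrides (defaults shown):
export WEBSCRAPING_AI_CONCURRENCY_LIMIT=5
export WEBSCRAPING_AI_DEFAULT_PROXY_TYPE=residential
export WEBSCRAPING_AI_DEFAULT_JS_RENDERING=true
export WEBSCRAPING_AI_DEFAULT_TIMEOUT=15000
export WEBSCRAPING_AI_DEFAULT_JS_TIMEOUT=2000
```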
## Available Tools
### 1. Question Tool (`webscraping_ai_question`)
Ask questions about web page content.
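For illustration, a call to this tool might pass arguments like the following. The `url` and `question` parameter names mirror the WebScraping.AI API, but treat them as assumptions and check the tool's schema:

```json
{
  "name": "webscraping_ai_question",
  "arguments": {
    "url": "https://example.com",
    "question": "What is the main product on this page?"
  }
}
```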
Example response:
### 2. Fields Tool (`webscraping_ai_fields`)
Extract structured data from web pages based on instructions.
Example response:
### 3. HTML Tool (`webscraping_ai_html`)
Get the full HTML of a web page with JavaScript rendering.
Example response:
### 4. Text Tool (`webscraping_ai_text`)
Extract the visible text content from a web page.
Example response:
### 5. Selected Tool (`webscraping_ai_selected`)
Extract content from a specific element using a CSS selector.
Example response:
### 6. Selected Multiple Tool (`webscraping_ai_selected_multiple`)
Extract content from multiple elements using CSS selectors.
Example response:
### 7. Account Tool (`webscraping_ai_account`)
Get information about your WebScraping.AI account.
Example response:
### Common Options for All Tools
The following options can be used with all scraping tools:
- `timeout`: Maximum web page retrieval time in ms (`15000` by default, maximum `30000`)
- `js`: Execute on-page JavaScript using a headless browser (`true` by default)
- `js_timeout`: Maximum JavaScript rendering time in ms (`2000` by default)
- `wait_for`: CSS selector to wait for before returning the page content
- `proxy`: Type of proxy, `datacenter` or `residential` (`residential` by default)
- `country`: Country of the proxy to use (`us` by default). Supported countries: `us`, `gb`, `de`, `it`, `fr`, `ca`, `es`, `ru`, `jp`, `kr`, `in`
- `custom_proxy`: Your own proxy URL in `http://user:password@host:port` format
- `device`: Type of device emulation. Supported values: `desktop`, `mobile`, `tablet`
- `error_on_404`: Return an error on a 404 HTTP status on the target page (`false` by default)
- `error_on_redirect`: Return an error on a redirect on the target page (`false` by default)
- `js_script`: Custom JavaScript code to execute on the target page
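The interaction between per-call options and the documented defaults can be sketched as a simple merge. This helper is illustrative (not the server's actual code); the parameter names and default values come from the option list above:

```javascript
// Documented defaults for the common scraping options.
const DEFAULT_OPTIONS = {
  timeout: 15000,       // max page retrieval time, ms (hard cap 30000)
  js: true,             // render JavaScript in a headless browser
  js_timeout: 2000,     // max JavaScript rendering time, ms
  proxy: "residential", // "datacenter" or "residential"
  country: "us",
  error_on_404: false,
  error_on_redirect: false,
};

// Merge caller overrides over the defaults, clamping `timeout`
// to the documented maximum of 30000 ms.
function buildRequestOptions(overrides = {}) {
  const merged = { ...DEFAULT_OPTIONS, ...overrides };
  merged.timeout = Math.min(merged.timeout, 30000);
  return merged;
}
```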
## Error Handling
The server provides robust error handling:
- Automatic retries for transient errors
- Rate limit handling with backoff
- Detailed error messages
- Network resilience
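The retry-with-backoff behavior described above can be sketched as follows. This is an illustrative pattern for transient errors (e.g. rate limits or 5xx responses), not the server's actual retry policy:

```javascript
// Retry a task up to `retries` times with exponential backoff,
// but only for errors flagged as transient.
async function withRetries(fn, { retries = 3, baseDelayMs = 100 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries || !err.transient) throw err;
      // Exponential backoff: baseDelayMs, 2x, 4x, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```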
Example error response:
## Integration with LLMs
This server implements the Model Context Protocol, making it compatible with any MCP-enabled LLM platform. You can configure your LLM to use these tools for web scraping tasks.
### Example: Configuring Claude with MCP
## Development
### Contributing
- Fork the repository
- Create your feature branch
- Run tests: `npm test`
- Submit a pull request
## License
MIT License - see the LICENSE file for details.