oxylabs_scraper
Extract web content using the Oxylabs Web Scraper API, with configurable parsing and rendering for retrieving data from complex websites.
Instructions
Scrape a URL using the Oxylabs Web Scraper API.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| parse | No | Whether the result should be parsed. If it is not parsed, the HTML is stripped and converted to Markdown. | |
| render | No | Whether a headless browser should be used to render the page. `html` returns the rendered HTML page; `None` scrapes without rendering. See: https://developers.oxylabs.io/scraper-apis/web-scraper-api/features/javascript-rendering | |
| url | Yes | URL to scrape. | |
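
The schema above can be illustrated with a minimal sketch of how the three inputs might be validated and collected into a request dictionary. The helper name `build_payload` is hypothetical and not part of `oxylabs_mcp`, and the assumption that the payload keys mirror the input names one-to-one is ours, not something the listing confirms.

```python
from typing import Any, Optional

# Values the table documents for `render`: "html" or None (no rendering).
ALLOWED_RENDER = {None, "html"}


def build_payload(
    url: str,
    parse: Optional[bool] = None,
    render: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect the tool inputs into one dictionary.

    Optional inputs that were not supplied are left out entirely, so the
    receiving side can apply its own defaults.
    """
    if not url:
        raise ValueError("url is required")
    if render not in ALLOWED_RENDER:
        raise ValueError(f"render must be 'html' or None, got {render!r}")

    payload: dict[str, Any] = {"url": url}
    if parse is not None:
        payload["parse"] = parse
    if render is not None:
        payload["render"] = render
    return payload


if __name__ == "__main__":
    print(build_payload("https://example.com", parse=False, render="html"))
```

Leaving unset options out of the dictionary, rather than sending explicit nulls, keeps the request minimal; whether the real tool does the same is not shown in the listing.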
Implementation Reference
- src/oxylabs_mcp/config.py:13-13 (helper) Configuration defining the Oxylabs scraper API endpoint URL: `OXYLABS_SCRAPER_URL: str = "https://realtime.oxylabs.io/v1/queries"`
- src/oxylabs_mcp/utils.py:152-168 (helper) Implementation of the `scrape` method in the `_OxylabsClientWrapper` class, which sends a POST request to the Oxylabs scraper API with the payload and handles the response.

      async def scrape(self, payload: dict[str, typing.Any]) -> dict[str, typing.Any]:
          await self._ctx.info(f"Create job with params: {json.dumps(payload)}")

          response = await self._client.post(settings.OXYLABS_SCRAPER_URL, json=payload)
          response_json: dict[str, typing.Any] = response.json()

          if response.status_code == status.HTTP_201_CREATED:
              await self._ctx.info(
                  f"Job info: "
                  f"job_id={response_json['job']['id']} "
                  f"job_status={response_json['job']['status']}"
              )

          response.raise_for_status()

          return response_json

- src/oxylabs_mcp/utils.py:201-229 (helper) Async context manager providing the Oxylabs HTTP client wrapper used by scraper tools.

      @asynccontextmanager
      async def oxylabs_client() -> AsyncIterator[_OxylabsClientWrapper]:
          """Async context manager for Oxylabs client that is used in MCP tools."""
          headers = _get_default_headers()

          username, password = get_oxylabs_auth()

          if not username or not password:
              raise ValueError("Oxylabs username and password must be set.")

          auth = BasicAuth(username=username, password=password)

          async with AsyncClient(
              timeout=Timeout(settings.OXYLABS_REQUEST_TIMEOUT_S),
              verify=True,
              headers=headers,
              auth=auth,
          ) as client:
              try:
                  yield _OxylabsClientWrapper(client)
              except HTTPStatusError as e:
                  raise MCPServerError(
                      f"HTTP error during POST request: {e.response.status_code} - {e.response.text}"
                  ) from None
              except RequestError as e:
                  raise MCPServerError(f"Request error during POST request: {e}") from None
              except Exception as e:
                  raise MCPServerError(f"Error: {str(e) or repr(e)}") from None
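
The error-translation pattern in the context manager above can be tried without network access. The sketch below uses only the standard library, and every name in it (`ToolError`, `UpstreamHTTPError`, `StubClient`, `stub_client`, `run_tool`) is a hypothetical stand-in rather than part of `oxylabs_mcp` or any HTTP library. It shows how an exception raised inside the `async with` body is re-thrown at the `yield` and converted into a single tool-level error type.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Optional


class ToolError(Exception):
    """Hypothetical stand-in for the MCPServerError used in the listing."""


class UpstreamHTTPError(Exception):
    """Hypothetical stand-in for an HTTP status error raised by a client."""

    def __init__(self, status_code: int, text: str) -> None:
        super().__init__(f"{status_code}: {text}")
        self.status_code = status_code
        self.text = text


class StubClient:
    """Hypothetical client that fails on demand instead of doing network I/O."""

    def __init__(self, fail_with: Optional[int] = None) -> None:
        self._fail_with = fail_with

    async def post(self, payload: dict[str, Any]) -> dict[str, Any]:
        if self._fail_with is not None:
            raise UpstreamHTTPError(self._fail_with, "upstream rejected the request")
        return {"job": {"id": "demo", "status": "done"}, "echo": payload}


@asynccontextmanager
async def stub_client(fail_with: Optional[int] = None) -> AsyncIterator[StubClient]:
    try:
        yield StubClient(fail_with)
    except UpstreamHTTPError as e:
        # `from None` suppresses the chained traceback, so callers see only
        # the translated, tool-level error.
        raise ToolError(
            f"HTTP error during POST request: {e.status_code} - {e.text}"
        ) from None
    except Exception as e:
        # Some exceptions have an empty message; fall back to repr() then.
        raise ToolError(f"Error: {str(e) or repr(e)}") from None


async def run_tool(fail_with: Optional[int] = None) -> str:
    try:
        async with stub_client(fail_with) as client:
            result = await client.post({"url": "https://example.com"})
            return f"ok: {result['job']['status']}"
    except ToolError as e:
        return f"failed: {e}"


if __name__ == "__main__":
    print(asyncio.run(run_tool()))
    print(asyncio.run(run_tool(503)))
```

Catching at the `yield` means each tool body only has to handle one exception type, at the cost of losing the original traceback because of `from None`.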