# google_search_scraper
Scrape Google search results with customizable pagination, geolocation, locale, and output formats. Parses content into structured data for efficient extraction.
## Instructions
Scrape Google Search results.
Supports content parsing, multiple user-agent types, pagination, domain, geolocation, and locale parameters, as well as several output formats.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | URL-encoded keyword to search for. | |
| parse | No | Whether the result should be parsed. If the result is not parsed, the output_format parameter is applied. | true |
| render | No | Whether a headless browser should be used to render the page. Set to 'html' when a browser is required to render the page. | |
| user_agent_type | No | Device type and browser that will be used to determine User-Agent header value. | |
| start_page | No | Starting page number. | 0 |
| pages | No | Number of pages to retrieve. | 0 |
| limit | No | Number of results to retrieve per page. | 0 |
| domain | No | Domain localization for Google. Use country top-level domains, e.g. 'co.uk' (United Kingdom), 'us' (United States), 'fr' (France). | |
| geo_location | No | The geographical location that the result should be adapted for. Use ISO-3166 country codes or location names, e.g. 'California, United States', 'Mexico', 'US' (United States), 'DE' (Germany), 'FR' (France). | |
| locale | No | Sets the 'Accept-Language' header value, which changes the Google search page interface language, e.g. 'en-US' (English, United States), 'de-AT' (German, Austria), 'fr-FR' (French, France). | |
| ad_mode | No | If true, uses the Google Ads source, which is optimized for paid ads. | false |
| output_format | No | The format of the output. Applies only when the parse parameter is false. 'links': most efficient for navigation or finding specific URLs; use this first when you need to locate a specific page within a website. 'md': best for extracting and reading visible content once you've found the right page; returns structured Markdown that is easy to read and process. 'html': use sparingly, only when you need the raw HTML structure, JavaScript code, or styling information. | |
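To illustrate how these inputs combine, the handler (see Implementation Reference) forwards only truthy optional parameters to the API. A standalone sketch of that payload assembly (the `build_payload` helper is hypothetical, written here only to mirror the handler's logic):

```python
from typing import Any

def build_payload(query: str, ad_mode: bool = False, **options: Any) -> dict[str, Any]:
    """Hypothetical stand-in mirroring the handler's payload construction."""
    payload: dict[str, Any] = {"query": query}
    # ad_mode switches the source, matching the handler's if/else.
    payload["source"] = "google_ads" if ad_mode else "google_search"
    for key in ("parse", "render", "user_agent_type", "start_page",
                "pages", "limit", "domain", "geo_location", "locale"):
        if options.get(key):  # zeros, None, and empty strings are omitted
            payload[key] = options[key]
    return payload

payload = build_payload("best pizza", parse=True, domain="co.uk", pages=2, start_page=0)
# start_page=0 is falsy, so it is omitted, just like the handler's `if start_page:` check.
```

Note that this means a `start_page` of 0 (the default) is never sent; the API's own default takes over.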
## Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | The scraped content as a single string: a JSON document when parsed, otherwise links, Markdown, or raw HTML depending on output_format. | |
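When parse is true, the result string is a JSON document and can be loaded directly. A hedged sketch of consuming it (the key names below, such as 'organic', are illustrative only; the actual structure is defined by the Oxylabs Google parser):

```python
import json

# With parse=True the tool returns json.dumps of the parsed content dict.
# "organic" / "url" / "title" are illustrative keys, not a documented schema.
result = '{"organic": [{"pos": 1, "url": "https://example.com", "title": "Example Domain"}]}'
data = json.loads(result)
top_url = data["organic"][0]["url"]
```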
## Implementation Reference
- `src/oxylabs_mcp/tools/scraper.py:55-107` (handler): Main handler function for the `google_search_scraper` tool. Sends a scraping request to the Oxylabs API with the query, source (`google_search` or `google_ads`), and optional parameters (parse, render, user_agent_type, start_page, pages, limit, domain, geo_location, locale). Returns the extracted content via `get_content()`.

```python
@mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
async def google_search_scraper(
    query: url_params.GOOGLE_QUERY_PARAM,
    parse: url_params.PARSE_PARAM = True,  # noqa: FBT002
    render: url_params.RENDER_PARAM = None,
    user_agent_type: url_params.USER_AGENT_TYPE_PARAM = None,
    start_page: url_params.START_PAGE_PARAM = 0,
    pages: url_params.PAGES_PARAM = 0,
    limit: url_params.LIMIT_PARAM = 0,
    domain: url_params.DOMAIN_PARAM = None,
    geo_location: url_params.GEO_LOCATION_PARAM = None,
    locale: url_params.LOCALE_PARAM = None,
    ad_mode: url_params.AD_MODE_PARAM = False,  # noqa: FBT002
    output_format: url_params.OUTPUT_FORMAT_PARAM = None,
) -> str:
    """Scrape Google Search results.

    Supports content parsing, different user agent types, pagination,
    domain, geolocation, locale parameters and different output formats.
    """
    try:
        async with oxylabs_client() as client:
            payload: dict[str, Any] = {"query": query}

            if ad_mode:
                payload["source"] = "google_ads"
            else:
                payload["source"] = "google_search"

            if parse:
                payload["parse"] = parse
            if render:
                payload["render"] = render
            if user_agent_type:
                payload["user_agent_type"] = user_agent_type
            if start_page:
                payload["start_page"] = start_page
            if pages:
                payload["pages"] = pages
            if limit:
                payload["limit"] = limit
            if domain:
                payload["domain"] = domain
            if geo_location:
                payload["geo_location"] = geo_location
            if locale:
                payload["locale"] = locale

            response_json = await client.scrape(payload)

            return get_content(response_json, parse=parse, output_format=output_format)
    except MCPServerError as e:
        return await e.process()
```

- `src/oxylabs_mcp/url_params.py:40-147` (schema): Type definitions (Pydantic `Field`s) for all `google_search_scraper` parameters: GOOGLE_QUERY_PARAM, PARSE_PARAM, RENDER_PARAM, USER_AGENT_TYPE_PARAM, START_PAGE_PARAM, PAGES_PARAM, LIMIT_PARAM, DOMAIN_PARAM, GEO_LOCATION_PARAM, LOCALE_PARAM, AD_MODE_PARAM, OUTPUT_FORMAT_PARAM.

```python
GOOGLE_QUERY_PARAM = Annotated[str, Field(description="URL-encoded keyword to search for.")]
AMAZON_SEARCH_QUERY_PARAM = Annotated[str, Field(description="Keyword to search for.")]
USER_AGENT_TYPE_PARAM = Annotated[
    Literal[
        "desktop",
        "desktop_chrome",
        "desktop_firefox",
        "desktop_safari",
        "desktop_edge",
        "desktop_opera",
        "mobile",
        "mobile_ios",
        "mobile_android",
        "tablet",
    ]
    | None,
    Field(
        description="Device type and browser that will be used to "
        "determine User-Agent header value."
    ),
]
START_PAGE_PARAM = Annotated[
    int,
    Field(description="Starting page number."),
]
PAGES_PARAM = Annotated[
    int,
    Field(description="Number of pages to retrieve."),
]
LIMIT_PARAM = Annotated[
    int,
    Field(description="Number of results to retrieve in each page."),
]
DOMAIN_PARAM = Annotated[
    str | None,
    Field(
        description="""
            Domain localization for Google.
            Use country top level domains.
            For example:
            - 'co.uk' for United Kingdom
            - 'us' for United States
            - 'fr' for France
        """,
        examples=["uk", "us", "fr"],
    ),
]
GEO_LOCATION_PARAM = Annotated[
    str | None,
    Field(
        description="""
            The geographical location that the result should be adapted for.
            Use ISO-3166 country codes.
            Examples:
            - 'California, United States'
            - 'Mexico'
            - 'US' for United States
            - 'DE' for Germany
            - 'FR' for France
        """,
        examples=["US", "DE", "FR"],
    ),
]
LOCALE_PARAM = Annotated[
    str | None,
    Field(
        description="""
            Set 'Accept-Language' header value which changes your Google search
            page web interface language.
            Examples:
            - 'en-US' for English, United States
            - 'de-AT' for German, Austria
            - 'fr-FR' for French, France
        """,
        examples=["en-US", "de-AT", "fr-FR"],
    ),
]
AD_MODE_PARAM = Annotated[
    bool,
    Field(
        description="If true will use the Google Ads source optimized for the paid ads.",
    ),
]
CATEGORY_ID_CONTEXT_PARAM = Annotated[
    str | None,
    Field(
        description="Search for items in a particular browse node (product category).",
    ),
]
MERCHANT_ID_CONTEXT_PARAM = Annotated[
    str | None,
    Field(
        description="Search for items sold by a particular seller.",
    ),
]
CURRENCY_CONTEXT_PARAM = Annotated[
    str | None,
    Field(
        description="Currency that will be used to display the prices.",
        examples=["USD", "EUR", "AUD"],
    ),
]
AUTOSELECT_VARIANT_CONTEXT_PARAM = Annotated[
    bool,
    Field(
        description="To get accurate pricing/buybox data, set this parameter to true.",
    ),
]
```

- `src/oxylabs_mcp/tools/scraper.py:14-19` (registration): The tool name is listed in the SCRAPER_TOOLS list and registered via the `@mcp.tool` decorator. The `mcp` instance is mounted in `__init__.py` via `mcp.mount(scraper_mcp)`.

```python
SCRAPER_TOOLS = [
    "universal_scraper",
    "google_search_scraper",
    "amazon_search_scraper",
    "amazon_product_scraper",
]
```

- `src/oxylabs_mcp/utils.py:288-306` (helper): `get_content()`, called by `google_search_scraper` to extract content from the API response. Supports parsing (returns JSON), raw HTML, link extraction, and Markdown conversion.

```python
def get_content(
    response_json: dict[str, typing.Any],
    *,
    output_format: str | None,
    parse: bool = False,
) -> str:
    """Extract content from response and convert to a proper format."""
    content = response_json["results"][0]["content"]

    if parse and isinstance(content, dict):
        return json.dumps(content)

    if output_format == "html":
        return str(content)

    if output_format == "links":
        links = extract_links_with_text(str(content))
        return "\n".join(links)

    stripped_html = clean_html(str(content))
    return markdownify(stripped_html)
```

- `src/oxylabs_mcp/utils.py:201-228` (helper): `oxylabs_client()`, the async context manager providing the HTTP client used by `google_search_scraper` to make API requests. Handles authentication, headers, and timeouts, and wraps errors into `MCPServerError`.

```python
@asynccontextmanager
async def oxylabs_client() -> AsyncIterator[_OxylabsClientWrapper]:
    """Async context manager for Oxylabs client that is used in MCP tools."""
    headers = _get_default_headers()
    username, password = get_oxylabs_auth()
    if not username or not password:
        raise ValueError("Oxylabs username and password must be set.")
    auth = BasicAuth(username=username, password=password)
    async with AsyncClient(
        timeout=Timeout(settings.OXYLABS_REQUEST_TIMEOUT_S),
        verify=True,
        headers=headers,
        auth=auth,
    ) as client:
        try:
            yield _OxylabsClientWrapper(client)
        except HTTPStatusError as e:
            raise MCPServerError(
                f"HTTP error during POST request: {e.response.status_code} - {e.response.text}"
            ) from None
        except RequestError as e:
            raise MCPServerError(f"Request error during POST request: {e}") from None
        except Exception as e:
            raise MCPServerError(f"Error: {str(e) or repr(e)}") from None
```
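One subtlety in `get_content` worth noting: a parsed dict takes precedence, so output_format only affects unparsed (string) content. A minimal stand-in demonstrating the dispatch order (the real `clean_html`, `markdownify`, and `extract_links_with_text` helpers are replaced by stubs here):

```python
import json

def get_content_sketch(response_json, *, output_format=None, parse=False):
    # Mirrors get_content's branch order with stub converters.
    content = response_json["results"][0]["content"]
    if parse and isinstance(content, dict):
        return json.dumps(content)      # parsed dict wins over output_format
    if output_format == "html":
        return str(content)             # raw HTML passthrough
    if output_format == "links":
        return "links:" + str(content)  # stub for extract_links_with_text
    return "md:" + str(content)         # stub for clean_html + markdownify

parsed = {"results": [{"content": {"organic": []}}]}
raw = {"results": [{"content": "<html>hi</html>"}]}
```

So requesting `output_format="links"` together with `parse=True` still yields JSON whenever the API returns a parsed dict; the format switch only matters for raw string content.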