
Oxylabs MCP Server

Official
by oxylabs

google_search_scraper

Read-only

Scrape Google search results with customizable pagination, geolocation, locale, and output formats. Parses content into structured data for efficient extraction.

Instructions

Scrape Google Search results.

Supports content parsing, different user agent types, pagination, domain, geolocation, locale parameters and different output formats.

Input Schema

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| query | Yes | URL-encoded keyword to search for. | |
| parse | No | Whether the result should be parsed. If the result is not parsed, the output_format parameter is applied. | true |
| render | No | Whether a headless browser should be used to render the page, e.g. 'html' when a browser is required to render the page. | |
| user_agent_type | No | Device type and browser used to determine the User-Agent header value. | |
| start_page | No | Starting page number. | 0 |
| pages | No | Number of pages to retrieve. | 0 |
| limit | No | Number of results to retrieve per page. | 0 |
| domain | No | Domain localization for Google. Use country top-level domains, e.g. 'co.uk' for the United Kingdom, 'us' for the United States, 'fr' for France. | |
| geo_location | No | Geographical location the result should be adapted for. Use ISO-3166 country codes. Examples: 'California, United States', 'Mexico', 'US' for the United States, 'DE' for Germany, 'FR' for France. | |
| locale | No | Sets the 'Accept-Language' header value, which changes the Google search page interface language, e.g. 'en-US' for English (United States), 'de-AT' for German (Austria), 'fr-FR' for French (France). | |
| ad_mode | No | If true, uses the Google Ads source, which is optimized for paid ads. | false |
| output_format | No | Output format; applies only when parse is false. 'links' is most efficient for navigation or finding specific URLs (use it first to locate a specific page within a website); 'md' is best for extracting and reading visible content once you've found the right page; 'html' should be used sparingly, only when you need the raw HTML structure, JavaScript code, or styling information. | |
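
For orientation, here is a hedged example of the arguments an MCP client might pass. The values are illustrative; only the parameter names and the example locations ('us', 'US', 'en-US') come from the schema above.

    # Illustrative arguments only -- the values are made up; the names and
    # example locations come from the schema descriptions above.
    arguments = {
        "query": "wireless%20headphones",  # URL-encoded keyword
        "domain": "us",                    # country top-level domain
        "geo_location": "US",              # ISO-3166 country code
        "locale": "en-US",                 # Accept-Language header value
        "pages": 2,                        # number of result pages to retrieve
        "parse": True,                     # return structured data rather than raw output
    }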

Output Schema

| Name | Required | Description | Default |
| ---- | -------- | ----------- | ------- |
| result | Yes | | |

Implementation Reference

  • Main handler function for the google_search_scraper tool. Sends a scraping request to the Oxylabs API with the query, the source (google_search or google_ads), and any optional parameters that were set (parse, render, user_agent_type, start_page, pages, limit, domain, geo_location, locale). Returns the extracted content via get_content(). An illustrative payload sketch follows the code.
    @mcp.tool(annotations=ToolAnnotations(readOnlyHint=True))
    async def google_search_scraper(
        query: url_params.GOOGLE_QUERY_PARAM,
        parse: url_params.PARSE_PARAM = True,  # noqa: FBT002
        render: url_params.RENDER_PARAM = None,
        user_agent_type: url_params.USER_AGENT_TYPE_PARAM = None,
        start_page: url_params.START_PAGE_PARAM = 0,
        pages: url_params.PAGES_PARAM = 0,
        limit: url_params.LIMIT_PARAM = 0,
        domain: url_params.DOMAIN_PARAM = None,
        geo_location: url_params.GEO_LOCATION_PARAM = None,
        locale: url_params.LOCALE_PARAM = None,
        ad_mode: url_params.AD_MODE_PARAM = False,  # noqa: FBT002
        output_format: url_params.OUTPUT_FORMAT_PARAM = None,
    ) -> str:
        """Scrape Google Search results.
    
        Supports content parsing, different user agent types, pagination,
        domain, geolocation, locale parameters and different output formats.
        """
        try:
            async with oxylabs_client() as client:
                payload: dict[str, Any] = {"query": query}
    
                if ad_mode:
                    payload["source"] = "google_ads"
                else:
                    payload["source"] = "google_search"
    
                if parse:
                    payload["parse"] = parse
                if render:
                    payload["render"] = render
                if user_agent_type:
                    payload["user_agent_type"] = user_agent_type
                if start_page:
                    payload["start_page"] = start_page
                if pages:
                    payload["pages"] = pages
                if limit:
                    payload["limit"] = limit
                if domain:
                    payload["domain"] = domain
                if geo_location:
                    payload["geo_location"] = geo_location
                if locale:
                    payload["locale"] = locale
    
                response_json = await client.scrape(payload)
    
                return get_content(response_json, parse=parse, output_format=output_format)
        except MCPServerError as e:
            return await e.process()
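
    As a concrete illustration (sample query assumed, not from the source), a call like google_search_scraper(query="running shoes", ad_mode=True) with the remaining defaults would build the following payload before client.scrape() is awaited:

    # Illustrative payload: ad_mode=True switches the source to "google_ads",
    # parse defaults to True, and the falsy defaults (start_page=0, pages=0,
    # limit=0, None-valued strings) are omitted by the if-guards above.
    payload = {
        "query": "running shoes",
        "source": "google_ads",
        "parse": True,
    }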
  • Type definitions (Pydantic Fields) for all google_search_scraper parameters: GOOGLE_QUERY_PARAM, PARSE_PARAM, RENDER_PARAM, USER_AGENT_TYPE_PARAM, START_PAGE_PARAM, PAGES_PARAM, LIMIT_PARAM, DOMAIN_PARAM, GEO_LOCATION_PARAM, LOCALE_PARAM, AD_MODE_PARAM, OUTPUT_FORMAT_PARAM.
    GOOGLE_QUERY_PARAM = Annotated[str, Field(description="URL-encoded keyword to search for.")]
    AMAZON_SEARCH_QUERY_PARAM = Annotated[str, Field(description="Keyword to search for.")]
    USER_AGENT_TYPE_PARAM = Annotated[
        Literal[
            "desktop",
            "desktop_chrome",
            "desktop_firefox",
            "desktop_safari",
            "desktop_edge",
            "desktop_opera",
            "mobile",
            "mobile_ios",
            "mobile_android",
            "tablet",
        ]
        | None,
        Field(
            description="Device type and browser that will be used to "
            "determine User-Agent header value."
        ),
    ]
    START_PAGE_PARAM = Annotated[
        int,
        Field(description="Starting page number."),
    ]
    PAGES_PARAM = Annotated[
        int,
        Field(description="Number of pages to retrieve."),
    ]
    LIMIT_PARAM = Annotated[
        int,
        Field(description="Number of results to retrieve in each page."),
    ]
    DOMAIN_PARAM = Annotated[
        str | None,
        Field(
            description="""
            Domain localization for Google.
            Use country top level domains.
            For example:
                - 'co.uk' for United Kingdom
                - 'us' for United States
                - 'fr' for France
            """,
            examples=["uk", "us", "fr"],
        ),
    ]
    GEO_LOCATION_PARAM = Annotated[
        str | None,
        Field(
            description="""
            The geographical location that the result should be adapted for.
            Use ISO-3166 country codes.
            Examples:
                - 'California, United States'
                - 'Mexico'
                - 'US' for United States
                - 'DE' for Germany
                - 'FR' for France
            """,
            examples=["US", "DE", "FR"],
        ),
    ]
    LOCALE_PARAM = Annotated[
        str | None,
        Field(
            description="""
            Set 'Accept-Language' header value which changes your Google search page web interface language.
            Examples:
                - 'en-US' for English, United States
                - 'de-AT' for German, Austria
                - 'fr-FR' for French, France
            """,
            examples=["en-US", "de-AT", "fr-FR"],
        ),
    ]
    AD_MODE_PARAM = Annotated[
        bool,
        Field(
            description="If true will use the Google Ads source optimized for the paid ads.",
        ),
    ]
    CATEGORY_ID_CONTEXT_PARAM = Annotated[
        str | None,
        Field(
            description="Search for items in a particular browse node (product category).",
        ),
    ]
    MERCHANT_ID_CONTEXT_PARAM = Annotated[
        str | None,
        Field(
            description="Search for items sold by a particular seller.",
        ),
    ]
    CURRENCY_CONTEXT_PARAM = Annotated[
        str | None,
        Field(
            description="Currency that will be used to display the prices.",
            examples=["USD", "EUR", "AUD"],
        ),
    ]
    AUTOSELECT_VARIANT_CONTEXT_PARAM = Annotated[
        bool,
        Field(
            description="To get accurate pricing/buybox data, set this parameter to true.",
        ),
    ]
  • The tool name is listed in the SCRAPER_TOOLS list, and the handler is registered via the @mcp.tool decorator; the 'mcp' instance is mounted in __init__.py via mcp.mount(scraper_mcp). A registration sketch follows the list.
    SCRAPER_TOOLS = [
        "universal_scraper",
        "google_search_scraper",
        "amazon_search_scraper",
        "amazon_product_scraper",
    ]
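
    A minimal sketch of that registration flow under FastMCP-style mounting; apart from mcp.mount(scraper_mcp), which the bullet quotes, the module and variable names here are assumptions.

    # Sketch only: everything except mcp.mount(scraper_mcp) is assumed.
    from fastmcp import FastMCP

    # hosts the @mcp.tool-decorated handlers (named 'mcp' inside the scraper module)
    scraper_mcp = FastMCP(name="oxylabs_scrapers")

    # in __init__.py, per the bullet above:
    mcp = FastMCP(name="oxylabs_mcp")
    mcp.mount(scraper_mcp)  # exposes google_search_scraper and the other SCRAPER_TOOLS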
  • Helper function called by google_search_scraper to extract content from the API response. Supports parsed output (returned as JSON), raw HTML, link extraction, and Markdown conversion. A usage sketch follows the code.
    def get_content(
        response_json: dict[str, typing.Any],
        *,
        output_format: str | None,
        parse: bool = False,
    ) -> str:
        """Extract content from response and convert to a proper format."""
        content = response_json["results"][0]["content"]
        if parse and isinstance(content, dict):
            return json.dumps(content)
        if output_format == "html":
            return str(content)
        if output_format == "links":
            links = extract_links_with_text(str(content))
            return "\n".join(links)
    
        stripped_html = clean_html(str(content))
        return markdownify(stripped_html)
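
    To make the dispatch concrete, a hedged usage example; the response shape is assumed only from the response_json["results"][0]["content"] access above.

    # Illustrative only: a minimal response shape matching the access pattern above.
    response_json = {"results": [{"content": {"organic": [{"title": "Example"}]}}]}

    get_content(response_json, parse=True, output_format=None)
    # -> '{"organic": [{"title": "Example"}]}'  (parsed dict serialized to JSON)

    # With parse=False, output_format selects the branch: "html" returns the raw
    # string, "links" joins extract_links_with_text() output line by line, and
    # anything else falls through to clean_html() + markdownify().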
  • Async context manager providing the HTTP client that google_search_scraper uses to make API requests. Handles authentication, headers, and timeouts, and wraps errors into MCPServerError. A sketch of the wrapped client follows the code.
    @asynccontextmanager
    async def oxylabs_client() -> AsyncIterator[_OxylabsClientWrapper]:
        """Async context manager for Oxylabs client that is used in MCP tools."""
        headers = _get_default_headers()
    
        username, password = get_oxylabs_auth()
    
        if not username or not password:
            raise ValueError("Oxylabs username and password must be set.")
    
        auth = BasicAuth(username=username, password=password)
    
        async with AsyncClient(
            timeout=Timeout(settings.OXYLABS_REQUEST_TIMEOUT_S),
            verify=True,
            headers=headers,
            auth=auth,
        ) as client:
            try:
                yield _OxylabsClientWrapper(client)
            except HTTPStatusError as e:
                raise MCPServerError(
                    f"HTTP error during POST request: {e.response.status_code} - {e.response.text}"
                ) from None
            except RequestError as e:
                raise MCPServerError(f"Request error during POST request: {e}") from None
            except Exception as e:
                raise MCPServerError(f"Error: {str(e) or repr(e)}") from None
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, and the description is consistent with them rather than contradicting them. However, it adds no behavioral details (e.g., rate limits, permissions, side effects) beyond listing parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: purpose first, then feature list. No wasted words, efficient and front-loaded. Ideal for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 12 parameters and an output schema, the description provides a high-level overview covering the major capabilities. It could mention interactions between parameters or a typical usage flow, but the input and output schemas fill in the details. Adequate for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, meaning all parameters are documented in the input schema. The tool description lists parameter categories but adds no new meaning beyond the schema descriptions. Baseline met.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Scrape Google Search results' with a specific verb and resource. It lists supported features (parsing, pagination, domain, etc.) that differentiate it from siblings like amazon_product_scraper and universal_scraper. No confusion about what the tool does.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool or when to avoid it. It only lists features, implying it is intended for Google Search. Sibling tool names hint at different sources, but no direct comparison or exclusions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
