Bright Data Web MCP

web_data_amazon_product_search

Retrieve structured Amazon product search data using keywords and domain URLs for market research and analysis.

Instructions

Quickly read structured Amazon product search data. Requires a valid search keyword and an Amazon domain URL. This can be a cache lookup, so it can be more reliable than scraping.

Input Schema

| Name | Required | Default |
|---|---|---|
| `keyword` | Yes | |
| `url` | Yes | |
| `pages_to_search` | No | `1` |
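For illustration, the sketch below shows how a call's arguments could be turned into the trigger request that the handler (server.js:687-743) sends: a JSON array holding one input object, POSTed to `datasets/v3/trigger` with `dataset_id` and `include_errors` query parameters. `buildTriggerRequest` is a hypothetical helper, not code from `server.js`, the keyword and URL are made-up values, and the authentication headers produced by the server's `api_headers()` are deliberately left out.

```javascript
// Hypothetical helper (not in server.js): builds the trigger request the
// handler sends, filling in the documented default for pages_to_search.
const DATASET_ID = 'gd_lwdb4vjm1ehb499uxs';

function buildTriggerRequest(args) {
    // The config's default is the string '1'; explicit arguments win.
    const input = {pages_to_search: '1', ...args};
    const query = new URLSearchParams({dataset_id: DATASET_ID, include_errors: 'true'});
    return {
        url: `https://api.brightdata.com/datasets/v3/trigger?${query}`,
        method: 'POST',
        body: JSON.stringify([input]), // an array holding one input object
    };
}

// Example arguments (made-up values).
const req = buildTriggerRequest({keyword: 'wireless earbuds', url: 'https://www.amazon.com'});
console.log(req.url);
// https://api.brightdata.com/datasets/v3/trigger?dataset_id=gd_lwdb4vjm1ehb499uxs&include_errors=true
console.log(req.body);
// [{"pages_to_search":"1","keyword":"wireless earbuds","url":"https://www.amazon.com"}]
```

Passing `pages_to_search` explicitly (for example `'3'`) overrides the default; the value stays a string because the configured default is the string `'1'`.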

Implementation Reference

server.js:296-305 (registration)

Dataset configuration entry. It defines the id `amazon_product_search` (from which the tool name `web_data_amazon_product_search` is built), the Bright Data `dataset_id`, the description, and the inputs (`keyword`, `url`, and `pages_to_search`, which defaults to `'1'`). The server uses this entry both to register the tool and to build its parameter schema.

```
{
    id: 'amazon_product_search',
    dataset_id: 'gd_lwdb4vjm1ehb499uxs',
    description: [
        'Quickly read structured amazon product search data.',
        'Requires a valid search keyword and amazon domain URL.',
        'This can be a cache lookup, so it can be more reliable than scraping',
    ].join('\n'),
    inputs: ['keyword', 'url', 'pages_to_search'],
    defaults: {pages_to_search: '1'},
},
```
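A minimal sketch of how an entry like this could be expanded into a tool name and a split between required and optional inputs. It assumes, consistent with the input schema table above, that an input is optional exactly when it has a default; `describeTool` is a hypothetical helper, not code from `server.js`.

```javascript
// The dataset entry as shown above.
const entry = {
    id: 'amazon_product_search',
    dataset_id: 'gd_lwdb4vjm1ehb499uxs',
    description: [
        'Quickly read structured amazon product search data.',
        'Requires a valid search keyword and amazon domain URL.',
        'This can be a cache lookup, so it can be more reliable than scraping',
    ].join('\n'),
    inputs: ['keyword', 'url', 'pages_to_search'],
    defaults: {pages_to_search: '1'},
};

// Hypothetical helper: derive the tool name and split inputs into
// required (no default) and optional (has a default).
function describeTool({id, inputs, defaults = {}}) {
    return {
        name: `web_data_${id}`,
        required: inputs.filter(k => !(k in defaults)),
        optional: inputs.filter(k => k in defaults),
    };
}

const info = describeTool(entry);
console.log(info.name);     // web_data_amazon_product_search
console.log(info.required); // [ 'keyword', 'url' ]
console.log(info.optional); // [ 'pages_to_search' ]
```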

server.js:683-686 (schema)

Registration snippet. The tool name is built as `web_data_${id}` (here `id` is `'amazon_product_search'`), and the parameter schema is a Zod `z.object()` built from the entry's `inputs` array.
```
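addTool({
    name: `web_data_${id}`,           // web_data_amazon_product_search
    description,
    parameters: z.object(parameters), // one Zod validator per entry in `inputs`
    // execute: tool_fn(...), the handler excerpted below (server.js:687-743)
});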
```
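The server builds this schema with Zod. The dependency-free stand-in below only approximates what a `z.object()` of string fields with a default would do at call time: reject a missing or non-string required input, fill in the default for an omitted optional one, and drop unknown keys (Zod's default behavior for objects). Treating every input as a string and requiring `url` to be parseable by `new URL()` are assumptions for this sketch, not details confirmed by the excerpt; `makeValidator` is a hypothetical name.

```javascript
// Hypothetical stand-in for the Zod object schema: every input is a
// string, inputs with a default are optional, the rest are required.
function makeValidator(inputs, defaults = {}) {
    return function parse(args) {
        const out = {};
        for (const key of inputs) {
            let value = args[key];
            if (value === undefined && key in defaults)
                value = defaults[key];           // fill the documented default
            if (typeof value !== 'string')
                throw new TypeError(`${key}: expected string`);
            if (key === 'url')
                new URL(value);                  // assumed check: throws on a malformed URL
            out[key] = value;
        }
        return out;                              // unknown keys are dropped
    };
}

const parse = makeValidator(['keyword', 'url', 'pages_to_search'],
    {pages_to_search: '1'});

console.log(parse({keyword: 'laptop stand', url: 'https://www.amazon.co.uk'}));
```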

server.js:687-743 (handler)

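Core handler logic for the tool. It POSTs the input object, wrapped in an array, to the Bright Data `datasets/v3/trigger` endpoint with `dataset_id` `gd_lwdb4vjm1ehb499uxs` and `include_errors: true`. It then polls `/snapshot/{snapshot_id}` once per second, for at most 600 attempts (roughly 600 seconds), while the snapshot status is `running` or `building`, and also retries after a failed request. Once the snapshot is ready it returns `JSON.stringify(snapshot_response.data)`. The tool name and `dataset_id` are interpolated from the dataset entry.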

```
execute: tool_fn(`web_data_${id}`, async(data, ctx)=>{
    // 1. Trigger a collection job for this dataset.
    let trigger_response = await axios({
        url: 'https://api.brightdata.com/datasets/v3/trigger',
        params: {dataset_id, include_errors: true},
        method: 'POST',
        data: [data],
        headers: api_headers(),
    });
    if (!trigger_response.data?.snapshot_id)
        throw new Error('No snapshot ID returned from request');
    let snapshot_id = trigger_response.data.snapshot_id;
    console.error(`[web_data_${id}] triggered collection with `
        +`snapshot ID: ${snapshot_id}`);
    // 2. Poll the snapshot about once per second, at most max_attempts times.
    let max_attempts = 600;
    let attempts = 0;
    while (attempts < max_attempts)
    {
        try {
            if (ctx && ctx.reportProgress)
            {
                await ctx.reportProgress({
                    progress: attempts,
                    total: max_attempts,
                    message: `Polling for data (attempt `
                        +`${attempts + 1}/${max_attempts})`,
                });
            }
            let snapshot_response = await axios({
                url: `https://api.brightdata.com/datasets/v3`
                    +`/snapshot/${snapshot_id}`,
                params: {format: 'json'},
                method: 'GET',
                headers: api_headers(),
            });
            // Still running or building: wait 1 s and poll again.
            if (['running', 'building'].includes(snapshot_response.data?.status))
            {
                console.error(`[web_data_${id}] snapshot not ready, `
                    +`polling again (attempt `
                    +`${attempts + 1}/${max_attempts})`);
                attempts++;
                await new Promise(resolve=>setTimeout(resolve, 1000));
                continue;
            }
            // Any other response is returned to the caller as JSON text.
            console.error(`[web_data_${id}] snapshot data received `
                +`after ${attempts + 1} attempts`);
            let result_data = JSON.stringify(snapshot_response.data);
            return result_data;
        } catch(e){
            // Request failed: log it, count the attempt, retry after 1 s.
            console.error(`[web_data_${id}] polling error: `
                +`${e.message}`);
            attempts++;
            await new Promise(resolve=>setTimeout(resolve, 1000));
        }
    }
    throw new Error(`Timeout after ${max_attempts} seconds waiting `
        +`for data`);
}),
```
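To make the polling behavior testable in isolation, the sketch below restates the loop with the HTTP call and the sleep injected as functions, and omits progress reporting and logging. `pollSnapshot`, `fetchSnapshot`, and `sleep` are hypothetical names; the `running`/`building` status check, the retry after a failed request, and the attempt cap mirror the handler above. With a 1-second delay and 600 attempts this gives the roughly 600-second budget the timeout message refers to; actual wall time is somewhat longer because each request also takes time.

```javascript
// Hypothetical restatement of the handler's polling loop with injected I/O.
// fetchSnapshot() resolves to the parsed snapshot body; sleep(ms) waits.
async function pollSnapshot(fetchSnapshot, {maxAttempts = 600, delayMs = 1000,
    sleep = ms => new Promise(r => setTimeout(r, ms))} = {})
{
    let attempts = 0;
    while (attempts < maxAttempts)
    {
        try {
            const data = await fetchSnapshot();
            // Still being prepared: count the attempt, wait, poll again.
            if (['running', 'building'].includes(data?.status))
            {
                attempts++;
                await sleep(delayMs);
                continue;
            }
            return JSON.stringify(data); // any other body is the result
        } catch (e) {
            // A failed request also consumes an attempt before retrying.
            attempts++;
            await sleep(delayMs);
        }
    }
    throw new Error(`Timeout after ${maxAttempts} attempts waiting for data`);
}

// Usage with a fake fetcher: one 'running' and one 'building' response,
// then the data. The no-op sleep keeps the example instant.
const responses = [{status: 'running'}, {status: 'building'}, [{title: 'Example'}]];
let calls = 0;
const fakeFetch = async () => responses[calls++];
pollSnapshot(fakeFetch, {sleep: async () => {}})
    .then(result => console.log(result, 'after', calls, 'calls'));
// prints: [{"title":"Example"}] after 3 calls
```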

server.js:752-778 (helper)

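Helper wrapper `tool_fn`, used by `execute`. It calls `check_rate_limit()` before running the handler, increments per-tool and per-session counters in `debug_stats`, logs each call and its input under the tool name (here `web_data_amazon_product_search`), rethrows HTTP errors as `HTTP <status>: <body>` when the response body has a non-zero length (for example, a text body), logs other errors with their stack, and logs the total execution time.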

```
// Wraps a tool handler with rate limiting, call statistics, logging,
// HTTP error normalization, and timing.
function tool_fn(name, fn){
    return async(data, ctx)=>{
        check_rate_limit();
        debug_stats.tool_calls[name] = debug_stats.tool_calls[name]||0;
        debug_stats.tool_calls[name]++;
        debug_stats.session_calls++;
        let ts = Date.now();
        console.error(`[%s] executing %s`, name, JSON.stringify(data));
        try { return await fn(data, ctx); }
        catch(e){
            if (e.response)
            {
                // HTTP error: rethrow with status and body when a body is present.
                console.error(`[%s] error %s %s: %s`, name, e.response.status,
                    e.response.statusText, e.response.data);
                let message = e.response.data;
                if (message?.length)
                    throw new Error(`HTTP ${e.response.status}: ${message}`);
            }
            else
                console.error(`[%s] error %s`, name, e.stack);
            throw e;
        } finally {
            // Always log how long the call took, whether it succeeded or not.
            let dur = Date.now()-ts;
            console.error(`[%s] tool finished in %sms`, name, dur);
        }
    };
}
```
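The wrapper pattern can also be exercised on its own. The sketch below keeps the call counters, timing, and HTTP-error rewrapping of `tool_fn`, but replaces `check_rate_limit()` with a simple fixed-window counter, because the real rate limiter is not shown in the excerpt. `makeToolFn`, `makeRateLimiter`, `stats`, and the limit of 3 calls per minute are illustrative assumptions.

```javascript
// Hypothetical, self-contained version of the tool_fn pattern.
const stats = {tool_calls: {}, session_calls: 0};

// Assumed stand-in for check_rate_limit(): at most `limit` calls per window.
function makeRateLimiter(limit, windowMs, now = Date.now) {
    let windowStart = now(), count = 0;
    return function check() {
        if (now() - windowStart >= windowMs) { windowStart = now(); count = 0; }
        if (++count > limit)
            throw new Error(`Rate limit exceeded: ${limit} calls per ${windowMs}ms`);
    };
}

function makeToolFn(checkRateLimit) {
    return function toolFn(name, fn) {
        return async (data, ctx) => {
            checkRateLimit(); // rejects before any counters change
            stats.tool_calls[name] = (stats.tool_calls[name] || 0) + 1;
            stats.session_calls++;
            const ts = Date.now();
            try {
                return await fn(data, ctx);
            } catch (e) {
                // Rewrap HTTP errors that carry a non-empty body, as tool_fn does.
                if (e.response && e.response.data?.length)
                    throw new Error(`HTTP ${e.response.status}: ${e.response.data}`);
                throw e;
            } finally {
                console.error(`[${name}] tool finished in ${Date.now() - ts}ms`);
            }
        };
    };
}

const toolFn = makeToolFn(makeRateLimiter(3, 60_000));
const echo = toolFn('web_data_amazon_product_search', async data => data.keyword);
echo({keyword: 'usb hub'}).then(v => console.log(v, stats.tool_calls));
```

Injecting the limiter keeps the wrapper testable; in the server, `check_rate_limit()` and `debug_stats` are module-level instead.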

Tool Definition Quality

B (3.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that this is a read operation ('read structured... data'), mentions reliability aspects ('can be more reliable than scraping'), and hints at caching behavior ('can be a cache lookup'). However, it lacks details on rate limits, authentication needs, error conditions, or what 'structured data' entails in the output. This provides basic context but leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief and front-loaded with the core purpose. All three sentences add value: the first states what it does, the second lists requirements, and the third provides behavioral context. There's no wasted text, though it could be slightly more structured for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, 0% schema description coverage, no output schema, and three parameters, the description is incomplete. It covers the basic purpose and hints at caching behavior but misses critical details: parameter semantics (especially 'pages_to_search'), output format, error handling, and explicit differentiation from siblings. For a tool with this complexity, more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It mentions 'keyword' and 'url' parameters in the requirement statement, but doesn't explain 'pages_to_search' at all. The description adds minimal semantic value beyond naming two parameters, failing to clarify format expectations (e.g., URL must be an Amazon domain) or the purpose of 'pages_to_search'. This is inadequate given the coverage gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Quickly read structured amazon product search data.' It specifies the verb ('read'), resource ('amazon product search data'), and scope ('structured'). However, it doesn't explicitly differentiate from sibling tools like 'web_data_amazon_product' or 'search_engine', which would be needed for a score of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some usage context by stating 'Requires a valid search keyword and amazon domain URL' and mentioning it 'can be a cache lookup, so it can be more reliable than scraping.' This implies when to use it (for Amazon product searches with caching benefits) but doesn't explicitly compare to alternatives like 'scrape_as_html' or 'search_engine' from the sibling list, nor does it specify when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

```
curl -X GET 'https://glama.ai/api/mcp/v1/servers/dsouza-anush/brightdata-mcp-heroku'
```