Glama

web_data_crunchbase_company

Extract structured company data from Crunchbase URLs to analyze business information without web scraping.

Instructions

Quickly read structured Crunchbase company data. Because the result may come from a cache lookup, it can be more reliable than scraping the page directly.

Input Schema

Name | Required | Description | Default
url | Yes | Crunchbase company URL; must be a valid URL (z.string().url()) | none
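
The server validates the url parameter with zod's z.string().url() (see the schema construction under Implementation Reference). As a rough, dependency-free illustration, the sketch below performs a similar check with the WHATWG URL class that Node.js provides globally. buildParameterChecker and checkCrunchbaseArgs are hypothetical names, not part of the server, and the example Crunchbase URL is made up.

```javascript
// Hypothetical, dependency-free sketch of the dynamic parameter schema:
// an input named "url" must parse as an absolute URL, every other input must
// be a string, and a value from `defaults` is used when the argument is absent.
function buildParameterChecker(inputs, defaults = {}) {
    return function check(args) {
        const out = {};
        for (const input of inputs) {
            let value = args[input];
            if (value === undefined && defaults[input] !== undefined)
                value = defaults[input];
            if (value === undefined)
                throw new Error(`missing required parameter: ${input}`);
            if (typeof value !== 'string')
                throw new Error(`parameter ${input} must be a string`);
            if (input === 'url') {
                try {
                    new URL(value); // throws a TypeError for invalid URLs
                } catch {
                    throw new Error(`parameter url is not a valid URL: ${value}`);
                }
            }
            out[input] = value;
        }
        return out;
    };
}

// Usage for crunchbase_company, whose config lists inputs: ['url'].
const checkCrunchbaseArgs = buildParameterChecker(['url']);
console.log(checkCrunchbaseArgs({url: 'https://www.crunchbase.com/organization/example'}));
```

Because crunchbase_company defines no default for url, the parameter is required, which matches the "Required: Yes" entry in the table.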

Implementation Reference

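  • Core handler for the web_data_crunchbase_company tool. It triggers a Bright Data dataset collection for the configured dataset_id, then polls the snapshot until its status is no longer running or building, and returns the snapshot as a JSON string. It gives up after 600 attempts, which takes at least ten minutes because each unsuccessful attempt waits one second.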
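    execute: tool_fn(`web_data_${id}`, async(data, ctx)=>{
        // Start a collection for this dataset; `data` holds the tool
        // arguments (for crunchbase_company, just the url).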
        let trigger_response = await axios({
            url: 'https://api.brightdata.com/datasets/v3/trigger',
            params: {dataset_id, include_errors: true},
            method: 'POST',
            data: [data],
            headers: api_headers(),
        });
        if (!trigger_response.data?.snapshot_id)
            throw new Error('No snapshot ID returned from request');
        let snapshot_id = trigger_response.data.snapshot_id;
        console.error(`[web_data_${id}] triggered collection with `
            +`snapshot ID: ${snapshot_id}`);
        // Poll at most 600 times, waiting one second after each attempt
        // that does not return data.
        let max_attempts = 600;
        let attempts = 0;
        while (attempts < max_attempts)
        {
            try {
                if (ctx && ctx.reportProgress)
                {
                    await ctx.reportProgress({
                        progress: attempts,
                        total: max_attempts,
                        message: `Polling for data (attempt `
                            +`${attempts + 1}/${max_attempts})`,
                    });
                }
                let snapshot_response = await axios({
                    url: `https://api.brightdata.com/datasets/v3`
                        +`/snapshot/${snapshot_id}`,
                    params: {format: 'json'},
                    method: 'GET',
                    headers: api_headers(),
                });
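                // Keep polling while the snapshot is still running or
                // building; any other response is returned as JSON text.
                if (['running', 'building'].includes(snapshot_response.data?.status))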
                {
                    console.error(`[web_data_${id}] snapshot not ready, `
                        +`polling again (attempt `
                        +`${attempts + 1}/${max_attempts})`);
                    attempts++;
                    await new Promise(resolve=>setTimeout(resolve, 1000));
                    continue;
                }
                console.error(`[web_data_${id}] snapshot data received `
                    +`after ${attempts + 1} attempts`);
                let result_data = JSON.stringify(snapshot_response.data);
                return result_data;
            } catch(e){
                // Request errors are logged and count as a failed attempt.
                console.error(`[web_data_${id}] polling error: `
                    +`${e.message}`);
                attempts++;
                await new Promise(resolve=>setTimeout(resolve, 1000));
            }
        }
        throw new Error(`Timeout after ${max_attempts} seconds waiting `
            +`for data`);
    }),
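  • Dynamic construction of the tool's parameter schema from the inputs array in the dataset config. An input named url is validated with z.string().url(); any other input is a plain z.string(). If defaults has a value for an input, that value becomes the parameter's default. For crunchbase_company, inputs is ['url'], so the tool takes a single url parameter.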
    let parameters = {};
    for (let input of inputs)
    {
        let param_schema = input=='url' ? z.string().url() : z.string();
        parameters[input] = defaults[input] !== undefined ?
            param_schema.default(defaults[input]) : param_schema;
    }
  • server.js:409-416 (registration)
    Dataset configuration entry for crunchbase_company. The server loops over these entries and registers one tool per entry under the name `web_data_${id}`, which produces web_data_crunchbase_company.
    {
        id: 'crunchbase_company',
        dataset_id: 'gd_l1vijqt9jfj7olije',
        description: [
            'Quickly read structured crunchbase company data',
            'This can be a cache lookup, so it can be more reliable than scraping',
        ].join('\n'),
        inputs: ['url'],
    },
  • server.js:683-686 (registration)
    The addTool call that registers each tool from its dataset config, including web_data_crunchbase_company.
    addTool({
        name: `web_data_${id}`,
        description,
        parameters: z.object(parameters),
        // ...remaining properties are not shown in this excerpt
    });
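
The handler's trigger-then-poll loop can be sketched in isolation as follows. This is a hypothetical, simplified version, not the server's code: triggerAndPoll is an invented name, and the two Bright Data requests are replaced by injected async functions (trigger and fetchSnapshot) so the loop can run without network access or an API key. The readiness rule, the retry-on-error behavior, and the 600-attempt and one-second defaults mirror the excerpt above.

```javascript
// Hypothetical sketch of the trigger-then-poll pattern used by the handler.
// `trigger` and `fetchSnapshot` stand in for the two Bright Data HTTP calls.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function triggerAndPoll(trigger, fetchSnapshot,
    {maxAttempts = 600, intervalMs = 1000} = {})
{
    const snapshotId = await trigger();
    if (!snapshotId)
        throw new Error('No snapshot ID returned from request');
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
        try {
            const snapshot = await fetchSnapshot(snapshotId);
            // Same readiness test as the handler: keep polling while the
            // snapshot reports "running" or "building".
            if (!['running', 'building'].includes(snapshot?.status))
                return JSON.stringify(snapshot);
        } catch (e) {
            // Like the handler, treat request errors as retryable.
            console.error(`polling error: ${e.message}`);
        }
        await sleep(intervalMs);
    }
    throw new Error(`Timeout after ${maxAttempts} attempts waiting for data`);
}

// Usage with a fake backend that reports "running" twice, then returns records.
let calls = 0;
const fakeFetch = async id => (++calls <= 2
    ? {status: 'running'}
    : [{name: 'Example Co', snapshot: id}]);

triggerAndPoll(async () => 'snap_123', fakeFetch, {intervalMs: 10})
    .then(result => console.log(result));
```

Injecting the two calls keeps the polling policy separate from the HTTP details; in the real handler they are the POST to /datasets/v3/trigger and the GET to /datasets/v3/snapshot/{snapshot_id}.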


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dsouza-anush/brightdata-mcp-heroku'

If you have feedback or need assistance with the MCP directory API, please join our Discord server