Scrapezy

Official
by Scrapezy

extract-structured-data

Extract structured data from any website by specifying a URL and prompt, enabling precise retrieval of targeted information for analysis or integration.

Instructions

Extract structured data from a website.

Input Schema

Name    Required  Description                               Default
prompt  Yes       Prompt to extract data from the website   -
url     Yes       URL of the website to extract data from   -
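As an illustration of how these two parameters travel over the wire, the sketch below builds the JSON-RPC `tools/call` request an MCP client would send for this tool. The `ExtractArgs` and `ToolCallRequest` types, the `buildToolCall` helper, and the example URL and prompt are hypothetical names chosen for this example; they do not come from the Scrapezy source.

```typescript
// Arguments accepted by the tool, mirroring the input schema above.
interface ExtractArgs {
  url: string;
  prompt: string;
}

// Shape of an MCP "tools/call" request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: string;
  id: number;
  method: string;
  params: { name: string; arguments: ExtractArgs };
}

// Hypothetical helper: wraps the arguments in a tools/call envelope.
function buildToolCall(id: number, args: ExtractArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract-structured-data", arguments: args },
  };
}

// Example values only; any reachable URL and free-form prompt would do.
const request = buildToolCall(1, {
  url: "https://example.com/products",
  prompt: "List every product name and price",
});

console.log(JSON.stringify(request, null, 2));
```

In practice an MCP client library builds this envelope for you; the sketch only shows where `url` and `prompt` end up.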

Implementation Reference

  • The tool handler function that invokes the Scrapezy API via callScrapezyApi and formats the result or error as MCP text content.
    async ({ url, prompt }) => {
      const result = await callScrapezyApi(url, prompt);

      if ("error" in result) {
        return {
          content: [
            {
              type: "text",
              text: `Failed to extract data from ${url}: ${result.error}`,
            },
          ],
        };
      }

      return {
        content: [
          {
            type: "text",
            text: JSON.stringify(result, null, 2),
          },
        ],
      };
    }
  • Zod input schema defining 'url' (string URL) and 'prompt' (string) parameters for the tool.
    {
      url: z.string().url().describe("URL of the website to extract data from"),
      prompt: z.string().describe("Prompt to extract data from the website"),
    },
  • src/index.ts:102-131 (registration)
    Registration of the 'extract-structured-data' tool using server.tool, including name, description, input schema, and handler.
    server.tool(
      "extract-structured-data",
      "Extract structured data from a website.",
      {
        url: z.string().url().describe("URL of the website to extract data from"),
        prompt: z.string().describe("Prompt to extract data from the website"),
      },
      async ({ url, prompt }) => {
        const result = await callScrapezyApi(url, prompt);

        if ("error" in result) {
          return {
            content: [
              {
                type: "text",
                text: `Failed to extract data from ${url}: ${result.error}`,
              },
            ],
          };
        }

        return {
          content: [
            {
              type: "text",
              text: JSON.stringify(result, null, 2),
            },
          ],
        };
      }
    );
  • Helper function that submits a structured data extraction job to the Scrapezy API and polls for the result until completion or timeout.
    async function callScrapezyApi(url: string, prompt: string) {
      const apiKey = getScrapezyApiKey();

      // Step 1: Submit the extraction job
      const submitResponse = await fetch(`${SCRAPEZY_API}/extract`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "x-api-key": apiKey,
        },
        body: JSON.stringify({ url, prompt }),
      });

      const jobData = await submitResponse.json();

      if (!jobData.jobId) {
        return { error: "Failed to submit extraction job" };
      }

      // Step 2: Poll for results
      const maxAttempts = 30; // Maximum number of polling attempts
      const pollingInterval = 2000; // 2 seconds between polling attempts
      let attempts = 0;

      while (attempts < maxAttempts) {
        attempts++;

        // Wait for the polling interval
        await new Promise(resolve => setTimeout(resolve, pollingInterval));

        // Poll for job status
        const pollResponse = await fetch(`${SCRAPEZY_API}/extract/${jobData.jobId}`, {
          method: "GET",
          headers: {
            "Content-Type": "application/json",
            "x-api-key": apiKey,
          },
        });

        const pollData = await pollResponse.json();

        // If the job is completed or failed, return the results
        if (pollData.status !== "pending") {
          return pollData.result || { error: pollData.error || "Unknown error" };
        }

        // If we've reached the maximum attempts, return a timeout error
        if (attempts >= maxAttempts) {
          return { error: "Extraction job timed out" };
        }
      }

      return { error: "Extraction job timed out" };
    }
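The submit-then-poll pattern used by callScrapezyApi can be isolated into a small helper, which makes the timeout arithmetic easier to see: 30 attempts at 2000 ms each gives a worst-case wait of roughly 60 seconds, plus request time. The sketch below is not the Scrapezy implementation; `JobStatus`, `pollUntilDone`, and the injected `checkStatus` callback are hypothetical names, and the callback stands in for the GET request to the job endpoint so the loop can be exercised without network access.

```typescript
// Minimal shape of a poll response, assumed from the fields the helper reads.
type JobStatus = { status: string; result?: unknown; error?: string };

// Hypothetical helper: waits, checks the job, and repeats until the job
// leaves the "pending" state or the attempt budget is exhausted.
async function pollUntilDone(
  checkStatus: () => Promise<JobStatus>,
  maxAttempts = 30, // same defaults as the helper above
  intervalMs = 2000,
): Promise<unknown> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Wait before each check, as the original loop does.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));

    const job = await checkStatus();

    // Any status other than "pending" ends the loop, whether it
    // represents success or failure.
    if (job.status !== "pending") {
      return job.result || { error: job.error || "Unknown error" };
    }
  }

  // Worst case: maxAttempts * intervalMs of waiting (30 * 2000 ms = 60 s).
  return { error: "Extraction job timed out" };
}
```

Passing the status check in as a callback is a design choice for this sketch: it keeps the retry budget separate from the HTTP details and lets a caller shorten the interval when experimenting.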

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Scrapezy/mcp'
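The same request can be made from TypeScript. This is a minimal sketch assuming a runtime with a global `fetch` (for example Node 18 or later). The `serverUrl` and `fetchServerInfo` helpers are hypothetical, the `servers/{owner}/{name}` path pattern is inferred from the single URL above, and the response is typed as `unknown` because its shape is not described here.

```typescript
// Base URL taken from the curl example above.
const GLAMA_MCP_API = "https://glama.ai/api/mcp/v1";

// Hypothetical helper: builds the server endpoint URL.
// The owner/name path layout is an assumption based on one example URL.
function serverUrl(owner: string, name: string): string {
  return `${GLAMA_MCP_API}/servers/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: fetches the directory entry and returns the parsed JSON.
async function fetchServerInfo(owner: string, name: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `fetchServerInfo("Scrapezy", "mcp")` would request the same URL as the curl command.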

If you have feedback or need assistance with the MCP directory API, please join our Discord server.