extract-structured-data
Extract structured data from any website by specifying a URL and prompt, enabling precise retrieval of targeted information for analysis or integration.
Instructions
Extract structured data from a website.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| prompt | Yes | Prompt to extract data from the website | |
| url | Yes | URL of the website to extract data from | |
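
As a rough illustration of how these two parameters are supplied, the sketch below builds an MCP `tools/call` request for this tool. The helper name `buildExtractRequest`, the request `id`, and the example URL and prompt are hypothetical. Only the tool name and the `url` and `prompt` arguments come from the table above.

```typescript
// Arguments accepted by the tool, as listed in the input schema table.
interface ExtractArguments {
  url: string;    // URL of the website to extract data from
  prompt: string; // Prompt to extract data from the website
}

// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ExtractArguments };
}

// Hypothetical helper: packages the arguments into a "tools/call" request.
function buildExtractRequest(id: number, args: ExtractArguments): ToolCallRequest {
  // Rough client-side stand-in for URL validation (an assumption, not the
  // server's own check): the URL constructor throws on malformed input.
  new URL(args.url);
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "extract-structured-data", arguments: args },
  };
}

// Example values are placeholders.
const exampleRequest = buildExtractRequest(1, {
  url: "https://example.com/products",
  prompt: "List each product name and its price",
});
console.log(JSON.stringify(exampleRequest, null, 2));
```

Most MCP client libraries build this envelope for you, so in practice only the `arguments` object usually needs to be written by hand.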
Implementation Reference
- src/index.ts:108-130 (handler): The tool handler function that invokes the Scrapezy API via callScrapezyApi and formats the result or error as MCP text content.

  ```typescript
  async ({ url, prompt }) => {
    const result = await callScrapezyApi(url, prompt);

    if ("error" in result) {
      return {
        content: [
          {
            type: "text",
            text: `Failed to extract data from ${url}: ${result.error}`,
          },
        ],
      };
    }

    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(result, null, 2),
        },
      ],
    };
  }
  ```
- src/index.ts:105-107 (schema): Zod input schema defining 'url' (string URL) and 'prompt' (string) parameters for the tool.

  ```typescript
    url: z.string().url().describe("URL of the website to extract data from"),
    prompt: z.string().describe("Prompt to extract data from the website"),
  },
  ```
- src/index.ts:102-131 (registration): Registration of the 'extract-structured-data' tool using server.tool, including name, description, input schema, and handler.

  ```typescript
  server.tool(
    "extract-structured-data",
    "Extract structured data from a website.",
    {
      url: z.string().url().describe("URL of the website to extract data from"),
      prompt: z.string().describe("Prompt to extract data from the website"),
    },
    async ({ url, prompt }) => {
      const result = await callScrapezyApi(url, prompt);

      if ("error" in result) {
        return {
          content: [
            {
              type: "text",
              text: `Failed to extract data from ${url}: ${result.error}`,
            },
          ],
        };
      }

      return {
        content: [
          {
            type: "text",
            text: JSON.stringify(result, null, 2),
          },
        ],
      };
    }
  );
  ```
- src/index.ts:46-99 (helper): Helper function that submits a structured data extraction job to the Scrapezy API and polls for the result until completion or timeout.

  ```typescript
  async function callScrapezyApi(url: string, prompt: string) {
    const apiKey = getScrapezyApiKey();

    // Step 1: Submit the extraction job
    const submitResponse = await fetch(`${SCRAPEZY_API}/extract`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey,
      },
      body: JSON.stringify({ url, prompt }),
    });

    const jobData = await submitResponse.json();
    if (!jobData.jobId) {
      return { error: "Failed to submit extraction job" };
    }

    // Step 2: Poll for results
    const maxAttempts = 30; // Maximum number of polling attempts
    const pollingInterval = 2000; // 2 seconds between polling attempts
    let attempts = 0;

    while (attempts < maxAttempts) {
      attempts++;

      // Wait for the polling interval
      await new Promise(resolve => setTimeout(resolve, pollingInterval));

      // Poll for job status
      const pollResponse = await fetch(`${SCRAPEZY_API}/extract/${jobData.jobId}`, {
        method: "GET",
        headers: {
          "Content-Type": "application/json",
          "x-api-key": apiKey,
        },
      });

      const pollData = await pollResponse.json();

      // If the job is completed or failed, return the results
      if (pollData.status !== "pending") {
        return pollData.result || { error: pollData.error || "Unknown error" };
      }

      // If we've reached the maximum attempts, return a timeout error
      if (attempts >= maxAttempts) {
        return { error: "Extraction job timed out" };
      }
    }

    return { error: "Extraction job timed out" };
  }
  ```
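
The polling step of the helper can be isolated into a small generic function, which makes the timeout behavior easier to see. The sketch below is not the project's code: `pollUntilDone`, `JobStatus`, and the injected `checkStatus` callback are hypothetical names, introduced so the loop can run without network access. With the helper's settings (30 attempts, 2000 ms apart), a job is abandoned after roughly 60 seconds plus request latency.

```typescript
// Hypothetical shape of a poll response, modeled on the fields the helper reads.
interface JobStatus<T> {
  status: string;
  result?: T;
  error?: string;
}

type PollOutcome<T> = T | { error: string };

// Polls checkStatus until the job leaves the "pending" state or attempts run out.
async function pollUntilDone<T>(
  checkStatus: () => Promise<JobStatus<T>>,
  maxAttempts = 30,
  intervalMs = 2000,
): Promise<PollOutcome<T>> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Wait first, then poll, as the helper does.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));

    const job = await checkStatus();

    // Any status other than "pending" ends the loop, whether success or failure.
    if (job.status !== "pending") {
      return job.result || { error: job.error || "Unknown error" };
    }
  }
  return { error: "Extraction job timed out" };
}

// Usage with a fake status source: pending twice, then finished.
// The "completed" status string is an assumption; the helper only checks for "pending".
let statusCalls = 0;
const fakeStatus = async (): Promise<JobStatus<{ title: string }>> => {
  statusCalls++;
  return statusCalls < 3
    ? { status: "pending" }
    : { status: "completed", result: { title: "Example" } };
};

pollUntilDone(fakeStatus, 5, 10).then((outcome) => console.log(outcome));
```

Passing the status check in as a callback keeps the retry logic separate from the HTTP details, so the attempt count and interval can be exercised in a test with short delays.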