scrape_url_html
Scrape URLs that are hard to reach because of bot detection, captchas, or geolocation restrictions and return the page content as HTML, suited to advanced parsing.
Instructions
Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML. Use this for scraping website content that is difficult to access because of bot detection, captchas, or even geolocation restrictions. The result will be in HTML, which is preferable if advanced parsing is required.
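For orientation, here is a minimal sketch of how an MCP client might invoke this tool. The command, arguments, and API key below are placeholders and not part of this repository; the client API shown is the MCP TypeScript SDK.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the server over stdio; command, args, and API key are placeholders.
const transport = new StdioClientTransport({
  command: "node",
  args: ["path/to/scrapi-mcp/index.js"],
  env: { SCRAPI_API_KEY: "<your-api-key>" },
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call the tool with its single required argument.
const result = await client.callTool({
  name: "scrape_url_html",
  arguments: { url: "https://example.com" },
});
```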
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to scrape | |
Implementation Reference
- index.ts:89-167 (handler): Core handler that performs the scrape by calling the ScrAPI service with the provided URL and response format (HTML for this tool), handles API key configuration and errors, and returns the page content as text tagged with a MIME type.

```typescript
async function scrapeUrl(
  url: string,
  format: "HTML" | "Markdown"
): Promise<CallToolResult> {
  var body = {
    url: url,
    useBrowser: true,
    solveCaptchas: true,
    acceptDialogs: true,
    proxyType: "Residential",
    responseFormat: format,
  };

  try {
    const response = await fetch("https://api.scrapi.tech/v1/scrape", {
      method: "POST",
      headers: {
        "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`,
        "Content-Type": "application/json",
        "X-API-KEY": config.scrapiApiKey || SCRAPI_API_KEY,
      },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(30000),
    });

    const data = await response.text();

    if (response.ok) {
      return {
        content: [
          {
            type: "text" as const,
            text: data,
            _meta: {
              mimeType: `text/${format.toLowerCase()}`,
            },
          },
        ],
      };
    }

    return {
      content: [
        {
          type: "text" as const,
          text: data,
        },
      ],
      isError: true,
    };
  } catch (error) {
    console.error("Error calling API:", error);
  }

  // Reached only if the fetch above threw: retry once using the default SCRAPI_API_KEY.
  const response = await fetch("https://api.scrapi.tech/v1/scrape", {
    method: "POST",
    headers: {
      "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`,
      "Content-Type": "application/json",
      "X-API-KEY": SCRAPI_API_KEY,
    },
    body: JSON.stringify(body),
    signal: AbortSignal.timeout(30000),
  });

  const data = await response.text();

  return {
    content: [
      {
        type: "text",
        text: data,
        _meta: {
          mimeType: `text/${format.toLowerCase()}`,
        },
      },
    ],
  };
}
```
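Because the tool returns raw HTML (MIME type `text/html`), the caller is expected to do its own parsing. Here is a small sketch of what that might look like with the cheerio parser; cheerio is an assumption used for illustration and is not a dependency of this server.

```typescript
import * as cheerio from "cheerio";

// Stand-in for the text content returned by scrape_url_html.
const html =
  '<html><head><title>Example</title></head><body><a href="/about">About</a></body></html>';

// Load the HTML and run selector-based queries over it.
const $ = cheerio.load(html);
const title = $("title").text();
const links = $("a[href]")
  .map((_, el) => $(el).attr("href"))
  .get();

console.log(title, links);
```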
- index.ts:61-66 (schema): Input schema defining the `url` parameter as a string that must be a valid URL.

```typescript
inputSchema: {
  url: z
    .string()
    .url({ message: "Invalid URL" })
    .describe("The URL to scrape"),
},
```
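The zod schema rejects anything that is not a well-formed URL before the handler runs. A quick illustration of that behavior, using hypothetical inputs:

```typescript
import { z } from "zod";

const urlSchema = z
  .string()
  .url({ message: "Invalid URL" })
  .describe("The URL to scrape");

urlSchema.safeParse("https://example.com"); // success: true
urlSchema.safeParse("not a url");           // success: false, issue message "Invalid URL"
```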
- index.ts:53-69 (registration): Tool registration call that defines the tool name, metadata, and input schema, and delegates to the `scrapeUrl` handler with the HTML format.

```typescript
server.registerTool(
  "scrape_url_html",
  {
    title: "Scrape URL and respond with HTML",
    description:
      "Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML. " +
      "Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions. " +
      "The result will be in HTML which is preferable if advanced parsing is required.",
    inputSchema: {
      url: z
        .string()
        .url({ message: "Invalid URL" })
        .describe("The URL to scrape"),
    },
  },
  async ({ url }) => await scrapeUrl(url, "HTML")
);
```
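For completeness, a rough sketch of how a server registered this way is typically wired up and exposed over stdio with the MCP TypeScript SDK; the name and version constants here are placeholders, not values copied from index.ts.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

// Placeholder identity; index.ts defines its own SCRAPI_SERVER_NAME / SCRAPI_SERVER_VERSION.
const server = new McpServer({ name: "scrapi-mcp", version: "0.1.0" });

// ...registerTool calls such as the one shown above go here...

// Connect to stdin/stdout so an MCP client can launch and talk to the server.
const transport = new StdioServerTransport();
await server.connect(transport);
```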