# scrape_url_html
Scrape website HTML content by providing a URL, bypassing bot detection, captchas, and geolocation restrictions. Ideal for advanced parsing needs using the ScrAPI MCP Server.
## Instructions
Use a URL to scrape a website with the ScrAPI service and retrieve the result as HTML. Use this tool for scraping website content that is difficult to access because of bot detection, captchas, or geolocation restrictions. The result is returned as HTML, which is preferable when advanced parsing is required.
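As a sketch, a client invokes this tool through a standard MCP `tools/call` JSON-RPC request. The target URL below is illustrative, and the transport/client wiring is omitted:

```typescript
// Shape of the JSON-RPC request an MCP client sends to invoke this tool.
// The URL is illustrative; any valid URL is accepted by the input schema.
const request = {
  jsonrpc: "2.0" as const,
  id: 1,
  method: "tools/call",
  params: {
    name: "scrape_url_html",
    arguments: { url: "https://example.com" },
  },
};

// The server replies with a CallToolResult whose text content is the raw HTML.
console.log(JSON.stringify(request, null, 2));
```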
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to scrape | (none) |
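The schema rejects anything that is not a well-formed URL before a scrape is attempted. The sketch below approximates that check using only the built-in `URL` constructor; the server itself uses Zod's `.url()` validator, which is stricter in some edge cases:

```typescript
// Approximation of the Zod `.url()` check using the built-in URL constructor.
// The real server rejects invalid input with the message "Invalid URL".
function isValidUrl(candidate: string): boolean {
  try {
    new URL(candidate); // throws a TypeError on malformed input
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUrl("https://example.com")); // absolute URLs parse
console.log(isValidUrl("not-a-url"));           // bare strings do not
```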
## Implementation Reference
- **index.ts:53-69 (registration)**: Registers the `scrape_url_html` tool with the MCP server, including its title, description, a Zod-based input schema requiring a URL, and an inline async handler that calls the `scrapeUrl` helper with the `"HTML"` format.

  ```typescript
  server.registerTool(
    "scrape_url_html",
    {
      title: "Scrape URL and respond with HTML",
      description:
        "Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML. " +
        "Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions. " +
        "The result will be in HTML which is preferable if advanced parsing is required.",
      inputSchema: {
        url: z
          .string()
          .url({ message: "Invalid URL" })
          .describe("The URL to scrape"),
      },
    },
    async ({ url }) => await scrapeUrl(url, "HTML")
  );
  ```
- **index.ts:61-66 (schema)**: Zod input schema for the `scrape_url_html` tool, validating a single `url` parameter as a valid URL.

  ```typescript
  inputSchema: {
    url: z
      .string()
      .url({ message: "Invalid URL" })
      .describe("The URL to scrape"),
  },
  ```
- **index.ts:89-163 (helper)**: Shared helper that implements the core scraping logic for `scrape_url_html` (and `scrape_url_markdown`). It POSTs the URL, browser settings, and response format to the ScrAPI endpoint using the configured API key with a fallback to the default key, and returns the HTML/Markdown content or an error result.

  ```typescript
  async function scrapeUrl(
    url: string,
    format: "HTML" | "Markdown"
  ): Promise<CallToolResult> {
    var body = {
      url: url,
      useBrowser: true,
      solveCaptchas: true,
      acceptDialogs: true,
      proxyType: "Residential",
      responseFormat: format,
    };

    try {
      const response = await fetch("https://api.scrapi.tech/v1/scrape", {
        method: "POST",
        headers: {
          "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`,
          "Content-Type": "application/json",
          "X-API-KEY": config.scrapiApiKey || SCRAPI_API_KEY,
        },
        body: JSON.stringify(body),
        signal: AbortSignal.timeout(30000),
      });

      const data = await response.text();

      if (response.ok) {
        return {
          content: [
            {
              type: "text" as const,
              mimeType: `text/${format.toLowerCase()}`,
              text: data,
            },
          ],
        };
      }

      return {
        content: [
          {
            type: "text" as const,
            text: data,
          },
        ],
        isError: true,
      };
    } catch (error) {
      console.error("Error calling API:", error);
    }

    const response = await fetch("https://api.scrapi.tech/v1/scrape", {
      method: "POST",
      headers: {
        "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`,
        "Content-Type": "application/json",
        "X-API-KEY": SCRAPI_API_KEY,
      },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(30000),
    });

    const data = await response.text();

    return {
      content: [
        {
          type: "text",
          mimeType: `text/${format.toLowerCase()}`,
          text: data,
        },
      ],
    };
  }
  ```
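The request the helper sends can be reproduced in isolation. The sketch below builds the same JSON body and headers without making a network call; the API key and the `User-Agent` name/version are placeholders, not the server's real values:

```typescript
// Builds the request body and headers the scrapeUrl helper POSTs to
// https://api.scrapi.tech/v1/scrape. Field values mirror index.ts;
// the API key and User-Agent string here are placeholders.
type ScrapeFormat = "HTML" | "Markdown";

function buildScrapeRequest(url: string, format: ScrapeFormat, apiKey: string) {
  return {
    headers: {
      "User-Agent": "scrapi-mcp - 0.0.0", // placeholder name/version
      "Content-Type": "application/json",
      "X-API-KEY": apiKey,
    },
    body: {
      url,
      useBrowser: true,       // render with a real browser
      solveCaptchas: true,    // let ScrAPI handle captchas
      acceptDialogs: true,    // auto-dismiss dialogs
      proxyType: "Residential",
      responseFormat: format, // "HTML" for this tool
    },
  };
}

const req = buildScrapeRequest("https://example.com", "HTML", "YOUR-API-KEY");
console.log(JSON.stringify(req.body));
```

Sending `req.body` as the JSON payload with these headers, as the helper does, returns the page content in the requested format.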