
scrape_url_html

Extract website HTML content by scraping URLs that are otherwise blocked by bot detection, captchas, or geolocation restrictions, for cases where advanced parsing of the raw markup is needed.

Instructions

Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML. Use this for scraping website content that is difficult to access because of bot detection, captchas, or even geolocation restrictions. The result will be returned as HTML, which is preferable if advanced parsing is required.
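
For illustration, here is how an MCP client might invoke this tool over stdio using the official TypeScript SDK. The launch command and the SCRAPI_API_KEY environment variable name are assumptions for this example; adjust them to your installation.

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the scrapi-mcp server as a subprocess over stdio.
// Command and environment variable name are assumptions for this sketch.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["scrapi-mcp"],
  env: { SCRAPI_API_KEY: process.env.SCRAPI_API_KEY ?? "" },
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call the scrape_url_html tool with its single required argument.
const result = await client.callTool({
  name: "scrape_url_html",
  arguments: { url: "https://example.com" },
});

// The scraped HTML comes back as a single text content block.
console.log(result.content);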

Input Schema

Name | Required | Description       | Default
url  | Yes      | The URL to scrape | (none)
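
The url field is validated as a well-formed URL before the handler runs; the server does this with zod's .url() refinement (see the Implementation Reference below). A rough standalone illustration (the z.object() wrapper is added here only for the example; the server passes the raw shape to registerTool):

import { z } from "zod";

// Standalone version of the tool's input schema, for illustration only.
const schema = z.object({
  url: z.string().url({ message: "Invalid URL" }).describe("The URL to scrape"),
});

console.log(schema.safeParse({ url: "https://example.com" }).success); // true
console.log(schema.safeParse({ url: "not-a-url" }).success);           // false ("Invalid URL")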

Implementation Reference

  • index.ts:89-167 (handler)
    Core handler that calls the ScrAPI service with the provided URL and response format (HTML for this tool), applies the API key configuration, handles errors, and returns the scraped content as text tagged with a mime type (the constants and config it references are sketched after this list).
    async function scrapeUrl(
      url: string,
      format: "HTML" | "Markdown"
    ): Promise<CallToolResult> {
      // ScrAPI request: use a headless browser, solve captchas, accept dialogs,
      // and route through a residential proxy.
      const body = {
        url: url,
        useBrowser: true,
        solveCaptchas: true,
        acceptDialogs: true,
        proxyType: "Residential",
        responseFormat: format,
      };
      try {
        // First attempt: the session-configured API key takes precedence over the default.
        const response = await fetch("https://api.scrapi.tech/v1/scrape", {
          method: "POST",
          headers: {
            "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`,
            "Content-Type": "application/json",
            "X-API-KEY": config.scrapiApiKey || SCRAPI_API_KEY,
          },
          body: JSON.stringify(body),
          signal: AbortSignal.timeout(30000),
        });
        const data = await response.text();
        if (response.ok) {
          // Success: return the scraped content tagged with its mime type.
          return {
            content: [
              {
                type: "text" as const,
                text: data,
                _meta: {
                  mimeType: `text/${format.toLowerCase()}`,
                },
              },
            ],
          };
        }
        // Non-OK status: surface the API's error body to the caller.
        return {
          content: [
            {
              type: "text" as const,
              text: data,
            },
          ],
          isError: true,
        };
      } catch (error) {
        // Network failure or timeout: log it and fall through to the retry below.
        console.error("Error calling API:", error);
      }
      // Retry once using the default SCRAPI_API_KEY.
      const response = await fetch("https://api.scrapi.tech/v1/scrape", {
        method: "POST",
        headers: {
          "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`,
          "Content-Type": "application/json",
          "X-API-KEY": SCRAPI_API_KEY,
        },
        body: JSON.stringify(body),
        signal: AbortSignal.timeout(30000),
      });
      const data = await response.text();
      return {
        content: [
          {
            type: "text",
            text: data,
            _meta: {
              mimeType: `text/${format.toLowerCase()}`,
            },
          },
        ],
      };
    }
  • Input schema defining the 'url' parameter as a valid URL string.
    inputSchema: {
      url: z
        .string()
        .url({ message: "Invalid URL" })
        .describe("The URL to scrape"),
    },
  • index.ts:53-69 (registration)
    Tool registration call that defines the tool's name, metadata, and input schema, and delegates to the scrapeUrl handler with the HTML format.
    server.registerTool(
      "scrape_url_html",
      {
        title: "Scrape URL and respond with HTML",
        description:
          "Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML. " +
          "Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions. " +
          "The result will be in HTML which is preferable if advanced parsing is required.",
        inputSchema: {
          url: z
            .string()
            .url({ message: "Invalid URL" })
            .describe("The URL to scrape"),
        },
      },
      async ({ url }) => await scrapeUrl(url, "HTML")
    );
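
The excerpts above reference a few module-level values (SCRAPI_SERVER_NAME, SCRAPI_SERVER_VERSION, SCRAPI_API_KEY, config) and an already-constructed server instance that fall outside the quoted ranges. A minimal sketch of how those pieces could fit together, assuming the default API key is supplied through an environment variable and the server communicates over stdio (the actual index.ts may differ):

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

// Assumed constants; the real values live elsewhere in index.ts.
const SCRAPI_SERVER_NAME = "scrapi-mcp";
const SCRAPI_SERVER_VERSION = "1.0.0";

// Assumed: the default API key comes from the environment, with an optional
// per-session override surfaced through `config`.
const SCRAPI_API_KEY = process.env.SCRAPI_API_KEY ?? "";
const config: { scrapiApiKey?: string } = {
  scrapiApiKey: process.env.SCRAPI_API_KEY,
};

const server = new McpServer({
  name: SCRAPI_SERVER_NAME,
  version: SCRAPI_SERVER_VERSION,
});

// ...registerTool calls such as the one above go here...

// Expose the server over stdio so MCP clients can spawn it as a subprocess.
await server.connect(new StdioServerTransport());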

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DevEnterpriseSoftware/scrapi-mcp'
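
The same lookup can be done programmatically; a small TypeScript sketch (the response is JSON and is logged here without assuming a particular shape):

// Query the Glama MCP directory for this server's metadata.
const res = await fetch(
  "https://glama.ai/api/mcp/v1/servers/DevEnterpriseSoftware/scrapi-mcp"
);
if (!res.ok) {
  throw new Error(`Directory API request failed with status ${res.status}`);
}
console.log(await res.json());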

If you have feedback or need assistance with the MCP directory API, please join our Discord server.