scrape_url_html

Scrape a website's HTML content by providing a URL, bypassing bot detection, captchas, and geolocation restrictions. Ideal when advanced parsing is required. Provided by the ScrAPI MCP Server.

Instructions

Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML. Use this for scraping website content that is difficult to access because of bot detection, captchas, or geolocation restrictions. The result is returned as HTML, which is preferable when advanced parsing is required.
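The tool is invoked over MCP's JSON-RPC `tools/call` method. A minimal sketch of the request an MCP client might send is below; the overall shape follows the MCP specification, but the `id` and the example URL are illustrative, not taken from this server:

```typescript
// Hypothetical tools/call request for scrape_url_html.
// The interface below is a simplified assumption of the JSON-RPC envelope,
// not a type exported by the MCP SDK.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

const request: ToolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "scrape_url_html",
    arguments: { url: "https://example.com" },
  },
};

console.log(JSON.stringify(request));
```

The server's handler receives only `params.arguments` (here, the `url`) after schema validation.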

Input Schema

| Name | Required | Description        | Default |
| ---- | -------- | ------------------ | ------- |
| url  | Yes      | The URL to scrape  |         |

Implementation Reference

  • index.ts:53-69 (registration)
    Registers the `scrape_url_html` tool with the MCP server, including a title, description, a Zod-based input schema requiring a URL, and an inline async handler that calls the `scrapeUrl` helper with the `"HTML"` format.

    ```typescript
    server.registerTool(
      "scrape_url_html",
      {
        title: "Scrape URL and respond with HTML",
        description:
          "Use a URL to scrape a website using the ScrAPI service and retrieve the result as HTML. " +
          "Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions. " +
          "The result will be in HTML which is preferable if advanced parsing is required.",
        inputSchema: {
          url: z
            .string()
            .url({ message: "Invalid URL" })
            .describe("The URL to scrape"),
        },
      },
      async ({ url }) => await scrapeUrl(url, "HTML")
    );
    ```
  • Zod input schema for the `scrape_url_html` tool, validating a single `url` parameter as a valid URL.

    ```typescript
    inputSchema: {
      url: z
        .string()
        .url({ message: "Invalid URL" })
        .describe("The URL to scrape"),
    },
    ```
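To illustrate what the `.url()` check rejects before the handler ever runs, here is a rough standalone sketch using Node's built-in WHATWG `URL` constructor. This is not the server's code, and Zod's `.url()` validation is not byte-for-byte identical to `new URL()`, but the pass/fail behavior is similar for typical inputs:

```typescript
// Sketch: approximate the Zod .url() check with the global URL constructor
// (available in Node 18+ without imports). Invalid strings make the
// constructor throw, which we translate into a boolean.
function isValidUrl(candidate: string): boolean {
  try {
    new URL(candidate);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUrl("https://example.com")); // a well-formed absolute URL
console.log(isValidUrl("not a url"));           // rejected: no scheme, spaces
```

Inputs that fail validation cause the MCP SDK to return a schema error to the client instead of invoking the handler.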
  • Shared helper function implementing the core scraping logic for `scrape_url_html` (and `scrape_url_markdown`). It POSTs the URL, browser settings, and desired format to the ScrAPI endpoint, using the configured API key with a fallback to the default key, and returns the HTML/Markdown content or an error result.

    ```typescript
    async function scrapeUrl(
      url: string,
      format: "HTML" | "Markdown"
    ): Promise<CallToolResult> {
      const body = {
        url: url,
        useBrowser: true,
        solveCaptchas: true,
        acceptDialogs: true,
        proxyType: "Residential",
        responseFormat: format,
      };

      try {
        const response = await fetch("https://api.scrapi.tech/v1/scrape", {
          method: "POST",
          headers: {
            "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`,
            "Content-Type": "application/json",
            "X-API-KEY": config.scrapiApiKey || SCRAPI_API_KEY,
          },
          body: JSON.stringify(body),
          signal: AbortSignal.timeout(30000),
        });

        const data = await response.text();

        if (response.ok) {
          return {
            content: [
              {
                type: "text" as const,
                mimeType: `text/${format.toLowerCase()}`,
                text: data,
              },
            ],
          };
        }

        return {
          content: [
            {
              type: "text" as const,
              text: data,
            },
          ],
          isError: true,
        };
      } catch (error) {
        console.error("Error calling API:", error);
      }

      // Reached only if the first request threw: retry once with the
      // default API key.
      const response = await fetch("https://api.scrapi.tech/v1/scrape", {
        method: "POST",
        headers: {
          "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`,
          "Content-Type": "application/json",
          "X-API-KEY": SCRAPI_API_KEY,
        },
        body: JSON.stringify(body),
        signal: AbortSignal.timeout(30000),
      });

      const data = await response.text();

      return {
        content: [
          {
            type: "text",
            mimeType: `text/${format.toLowerCase()}`,
            text: data,
          },
        ],
      };
    }
    ```
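The helper shapes its return value differently for success and failure: a successful response carries a `mimeType` alongside the text, while a failed response sets `isError: true` and omits the MIME type. That branching can be isolated as a pure function, sketched below with simplified types that are assumptions on my part, not the MCP SDK's actual `CallToolResult` definition:

```typescript
// Sketch (simplified types, not the SDK's): how scrapeUrl shapes its
// result from the response body and the HTTP ok flag.
type TextContent = { type: "text"; text: string; mimeType?: string };
type ToolResult = { content: TextContent[]; isError?: boolean };

function toToolResult(
  body: string,
  ok: boolean,
  format: "HTML" | "Markdown"
): ToolResult {
  if (ok) {
    // Success: tag the payload with its MIME type (text/html or text/markdown).
    return {
      content: [
        { type: "text", mimeType: `text/${format.toLowerCase()}`, text: body },
      ],
    };
  }
  // Failure: pass the error body through and flag the result as an error.
  return { content: [{ type: "text", text: body }], isError: true };
}
```

Separating this shaping from the network call makes the success/error contract easy to verify without hitting the ScrAPI endpoint.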

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DevEnterpriseSoftware/scrapi-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.