scrape_url_markdown
Extract website content as Markdown by bypassing bot detection, captchas, and geolocation restrictions using the ScrAPI service.
Instructions
Use a URL to scrape a website using the ScrAPI service and retrieve the result as Markdown. Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions. The result will be in Markdown which is preferable if the text content of the webpage is important and not the structural information of the page.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to scrape |
Implementation Reference
- index.ts:71-87 (registration)Registration of the scrape_url_markdown tool, specifying its title, description, input schema (URL), and handler that calls the shared scrapeUrl helper.server.registerTool( "scrape_url_markdown", { title: "Scrape URL and respond with Markdown", description: "Use a URL to scrape a website using the ScrAPI service and retrieve the result as Markdown. " + "Use this for scraping website content that is difficult to access because of bot detection, captchas or even geolocation restrictions. " + "The result will be in Markdown which is preferable if the text content of the webpage is important and not the structural information of the page.", inputSchema: { url: z .string() .url({ message: "Invalid URL" }) .describe("The URL to scrape"), }, }, async ({ url }) => await scrapeUrl(url, "Markdown") );
- index.ts:89-167 (handler)Shared handler function that performs the actual URL scraping using ScrAPI, supporting both HTML and Markdown formats. Called by the tool's registered handler.async function scrapeUrl( url: string, format: "HTML" | "Markdown" ): Promise<CallToolResult> { var body = { url: url, useBrowser: true, solveCaptchas: true, acceptDialogs: true, proxyType: "Residential", responseFormat: format, }; try { const response = await fetch("https://api.scrapi.tech/v1/scrape", { method: "POST", headers: { "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`, "Content-Type": "application/json", "X-API-KEY": config.scrapiApiKey || SCRAPI_API_KEY, }, body: JSON.stringify(body), signal: AbortSignal.timeout(30000), }); const data = await response.text(); if (response.ok) { return { content: [ { type: "text" as const, text: data, _meta: { mimeType: `text/${format.toLowerCase()}`, }, }, ], }; } return { content: [ { type: "text" as const, text: data, }, ], isError: true, }; } catch (error) { console.error("Error calling API:", error); } const response = await fetch("https://api.scrapi.tech/v1/scrape", { method: "POST", headers: { "User-Agent": `${SCRAPI_SERVER_NAME} - ${SCRAPI_SERVER_VERSION}`, "Content-Type": "application/json", "X-API-KEY": SCRAPI_API_KEY, }, body: JSON.stringify(body), signal: AbortSignal.timeout(30000), }); const data = await response.text(); return { content: [ { type: "text", text: data, _meta: { mimeType: `text/${format.toLowerCase()}`, }, }, ], }; }
- index.ts:79-84 (schema)Input schema for the scrape_url_markdown tool, validating the 'url' parameter.inputSchema: { url: z .string() .url({ message: "Invalid URL" }) .describe("The URL to scrape"), },