scrape
Extract web content as Markdown from any URL. Automatically escalates from static HTML to JavaScript rendering to stealth browser for challenging pages.
Instructions
Extract content from any URL as Markdown. Auto-escalates from fast HTTP to JS rendering to stealth browser. Cost: fast 1 credit, standard 2 credits, thorough 5 credits.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to scrape | |
| mode | No | fast (1 credit, static HTML), standard (2 credits, JS rendering), thorough (5 credits, stealth + CAPTCHA) | standard |
| output | No | markdown, html, or raw_html | markdown |
Implementation Reference
- src/index.ts:46-54 (schema)Defines the 'scrape' tool's name, description, and input schema (url, mode, output) in the CAPABILITIES array.
{ name: "scrape", description: "Extract content from any URL as Markdown. Auto-escalates from fast HTTP to JS rendering to stealth browser. Cost: fast 1 credit, standard 2 credits, thorough 5 credits.", inputSchema: { url: z.string().describe("URL to scrape"), mode: z.string().optional().default("standard").describe("fast (1 credit, static HTML), standard (2 credits, JS rendering), thorough (5 credits, stealth + CAPTCHA)"), output: z.string().optional().default("markdown").describe("markdown, html, or raw_html"), }, }, - src/index.ts:247-259 (registration)Dynamically registers all capabilities (including 'scrape') as MCP tools via server.registerTool() inside createServer().
for (const cap of CAPABILITIES) { // Cast inputSchema to avoid TS2589 (excessively deep type instantiation from Zod chains) server.registerTool( cap.name, { description: cap.description, inputSchema: cap.inputSchema as any, }, async (args: any): Promise<CallToolResult> => { return callSuprsonic(cap.name, args as Record<string, unknown>); }, ); } - src/index.ts:255-257 (handler)The actual handler for the 'scrape' tool — delegates to callSuprsonic(cap.name, args) which sends a POST to the Suprsonic REST API.
async (args: any): Promise<CallToolResult> => { return callSuprsonic(cap.name, args as Record<string, unknown>); }, - src/index.ts:183-234 (helper)The callSuprsonic helper function that makes HTTP requests to the Suprsonic API for all capabilities including 'scrape'.
async function callSuprsonic(capability: string, params: Record<string, unknown>): Promise<CallToolResult> { if (!API_KEY) { return { content: [{ type: "text", text: "Error: SUPRSONIC_API_KEY environment variable is not set. Get your key at https://suprsonic.ai/app/apis" }], isError: true, }; } try { const resp = await fetch(`${BASE_URL}/v1/agent`, { method: "POST", headers: { "Authorization": `Bearer ${API_KEY}`, "Content-Type": "application/json", }, body: JSON.stringify({ capability, params }), }); const result = await resp.json() as any; // Handle non-envelope responses (401, 429, etc. return {"detail": ...}) if (result.detail && result.success === undefined) { const msg = typeof result.detail === "object" ? (result.detail.title || result.detail.detail || JSON.stringify(result.detail)) : String(result.detail); return { content: [{ type: "text", text: `Error (HTTP ${resp.status}): ${msg}` }], isError: true, }; } if (!result.success) { const errMsg = result.error?.detail || result.error?.title || "Request failed"; return { content: [{ type: "text", text: `Error: ${errMsg}` }], isError: true, }; } const text = JSON.stringify(result.data, null, 2); const meta = result.metadata ? `\n\n[Provider: ${(result.metadata as any).provider_used || "unknown"}, ${(result.metadata as any).response_time_ms || 0}ms, ${result.credits_used || 0} credits]` : ""; return { content: [{ type: "text", text: text + meta }], }; } catch (err) { return { content: [{ type: "text", text: `Network error: ${err instanceof Error ? err.message : String(err)}` }], isError: true, }; } }