scrape
Extract and parse web content in markdown, HTML, or screenshot formats from any URL, with options to clean content and render JavaScript.
Instructions
Extract and parse content from any web page.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to scrape | |
| format | No | Output format | |
| cleaned | No | Whether to clean content | |
| renderJs | No | Whether to render JavaScript |
Implementation Reference
- src/index.ts:356-371 (handler)The handler function for the 'scrape' tool. It sends a POST request to the external Dumpling AI API endpoint '/api/v1/scrape' with the URL and options, then returns the JSON response formatted as MCP content.async ({ url, format, cleaned, renderJs }) => { const apiKey = process.env.DUMPLING_API_KEY; if (!apiKey) throw new Error("DUMPLING_API_KEY not set"); const response = await fetch(`${NWS_API_BASE}/api/v1/scrape`, { method: "POST", headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}`, }, body: JSON.stringify({ url, format, cleaned, renderJs }), }); if (!response.ok) throw new Error(`Failed: ${response.status} ${await response.text()}`); const data = await response.json(); return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] }; }
- src/index.ts:343-355 (schema)Input schema for the 'scrape' tool defined using Zod, validating the URL and optional scraping options like format, cleaned content, and JS rendering.{ url: z.string().url().describe("URL to scrape"), format: z .enum(["markdown", "html", "screenshot"]) .optional() .describe("Output format"), cleaned: z.boolean().optional().describe("Whether to clean content"), renderJs: z .boolean() .optional() .default(true) .describe("Whether to render JavaScript"), },
- src/index.ts:340-372 (registration)Full registration of the 'scrape' tool on the MCP server, specifying name, description, input schema, and handler function.server.tool( "scrape", "Extract and parse content from any web page.", { url: z.string().url().describe("URL to scrape"), format: z .enum(["markdown", "html", "screenshot"]) .optional() .describe("Output format"), cleaned: z.boolean().optional().describe("Whether to clean content"), renderJs: z .boolean() .optional() .default(true) .describe("Whether to render JavaScript"), }, async ({ url, format, cleaned, renderJs }) => { const apiKey = process.env.DUMPLING_API_KEY; if (!apiKey) throw new Error("DUMPLING_API_KEY not set"); const response = await fetch(`${NWS_API_BASE}/api/v1/scrape`, { method: "POST", headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}`, }, body: JSON.stringify({ url, format, cleaned, renderJs }), }); if (!response.ok) throw new Error(`Failed: ${response.status} ${await response.text()}`); const data = await response.json(); return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] }; } );