scrape
Extract and parse content from any web page with options for markdown, HTML, or screenshot formats, while enabling JavaScript rendering and content cleaning.
Instructions
Extract and parse content from any web page.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| cleaned | No | Whether to clean content | |
| format | No | Output format | |
| renderJs | No | Whether to render JavaScript | |
| url | Yes | URL to scrape |
Implementation Reference
- src/index.ts:340-372 (registration)Full registration of the 'scrape' tool using server.tool(), which defines the name, description, input schema, and execution handler that proxies to an external scraping API.server.tool( "scrape", "Extract and parse content from any web page.", { url: z.string().url().describe("URL to scrape"), format: z .enum(["markdown", "html", "screenshot"]) .optional() .describe("Output format"), cleaned: z.boolean().optional().describe("Whether to clean content"), renderJs: z .boolean() .optional() .default(true) .describe("Whether to render JavaScript"), }, async ({ url, format, cleaned, renderJs }) => { const apiKey = process.env.DUMPLING_API_KEY; if (!apiKey) throw new Error("DUMPLING_API_KEY not set"); const response = await fetch(`${NWS_API_BASE}/api/v1/scrape`, { method: "POST", headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}`, }, body: JSON.stringify({ url, format, cleaned, renderJs }), }); if (!response.ok) throw new Error(`Failed: ${response.status} ${await response.text()}`); const data = await response.json(); return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] }; } );
- src/index.ts:356-371 (handler)The handler function that performs the core logic: checks for API key, sends POST request to Dumpling API's /scrape endpoint with parameters, handles errors, parses JSON response, and returns it as MCP content.async ({ url, format, cleaned, renderJs }) => { const apiKey = process.env.DUMPLING_API_KEY; if (!apiKey) throw new Error("DUMPLING_API_KEY not set"); const response = await fetch(`${NWS_API_BASE}/api/v1/scrape`, { method: "POST", headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}`, }, body: JSON.stringify({ url, format, cleaned, renderJs }), }); if (!response.ok) throw new Error(`Failed: ${response.status} ${await response.text()}`); const data = await response.json(); return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] }; }
- src/index.ts:343-355 (schema)Zod schema defining input parameters for the scrape tool: required URL, optional format (markdown/html/screenshot), cleaned flag, and renderJs flag (default true).{ url: z.string().url().describe("URL to scrape"), format: z .enum(["markdown", "html", "screenshot"]) .optional() .describe("Output format"), cleaned: z.boolean().optional().describe("Whether to clean content"), renderJs: z .boolean() .optional() .default(true) .describe("Whether to render JavaScript"), },