crawl
Recursively crawl websites and extract their content, specifying the starting URL, crawl depth, page limit, and output format.
Instructions
Recursively crawl websites and extract content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| depth | No | Maximum depth | 2 |
| format | No | Output format | markdown |
| limit | No | Maximum number of pages to crawl | 5 |
| url | Yes | Starting URL | |
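
For illustration, an arguments object for this tool might look like the following sketch. The URL and values are hypothetical; any optional field that is omitted falls back to the schema defaults listed above.

```typescript
// Hypothetical arguments for the "crawl" tool.
// Omitted optional fields use the schema defaults (limit: 5, depth: 2, format: "markdown").
const crawlArgs = {
  url: "https://example.com/docs", // required starting URL
  limit: 10,                       // crawl at most 10 pages
  depth: 3,                        // follow links up to 3 levels deep
  format: "markdown",              // return page content as Markdown
};
```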
Implementation Reference
- src/index.ts:392-413 (handler): The handler function that implements the `crawl` tool logic by making a POST request to the external Dumpling AI crawl API endpoint with the provided parameters.

```typescript
async ({ url, limit, depth, format }) => {
  const apiKey = process.env.DUMPLING_API_KEY;
  if (!apiKey) throw new Error("DUMPLING_API_KEY not set");
  const response = await fetch(`${NWS_API_BASE}/api/v1/crawl`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      url,
      limit,
      depth,
      format,
    }),
  });
  if (!response.ok)
    throw new Error(`Failed: ${response.status} ${await response.text()}`);
  const data = await response.json();
  return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
}
);
```
- src/index.ts:378-391 (schema): The Zod input schema defining the parameters for the `crawl` tool: starting URL, page limit, depth, and output format.

```typescript
{
  url: z.string().url().describe("Starting URL"),
  limit: z
    .number()
    .optional()
    .default(5)
    .describe("Maximum number of pages to crawl"),
  depth: z.number().optional().default(2).describe("Maximum depth"),
  format: z
    .enum(["markdown", "text", "raw"])
    .optional()
    .default("markdown")
    .describe("Output format"),
},
```
- src/index.ts:375-413 (registration): The `server.tool` call that registers the `crawl` tool with its name, description, input schema, and handler function.

```typescript
server.tool(
  "crawl",
  "Recursively crawl websites and extract content.",
  {
    url: z.string().url().describe("Starting URL"),
    limit: z
      .number()
      .optional()
      .default(5)
      .describe("Maximum number of pages to crawl"),
    depth: z.number().optional().default(2).describe("Maximum depth"),
    format: z
      .enum(["markdown", "text", "raw"])
      .optional()
      .default("markdown")
      .describe("Output format"),
  },
  async ({ url, limit, depth, format }) => {
    const apiKey = process.env.DUMPLING_API_KEY;
    if (!apiKey) throw new Error("DUMPLING_API_KEY not set");
    const response = await fetch(`${NWS_API_BASE}/api/v1/crawl`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        url,
        limit,
        depth,
        format,
      }),
    });
    if (!response.ok)
      throw new Error(`Failed: ${response.status} ${await response.text()}`);
    const data = await response.json();
    return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
  }
);
```
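As a minimal sketch of how a client could invoke this tool, the example below assumes the server is launched over stdio via the MCP TypeScript SDK; the launch command and arguments are hypothetical, and the server process needs `DUMPLING_API_KEY` set in its environment as shown in the handler above.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Hypothetical launch command; adjust to however this server is actually started.
const transport = new StdioClientTransport({
  command: "node",
  args: ["build/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call the "crawl" tool; omitted optional fields use the schema defaults.
const result = await client.callTool({
  name: "crawl",
  arguments: { url: "https://example.com", limit: 3 },
});
console.log(result.content);
```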