crawl
Recursively crawl websites and extract their content in markdown, text, or raw format, with configurable limits on crawl depth and page count.
Instructions
Recursively crawl websites and extract content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Starting URL | |
| limit | No | Maximum number of pages to crawl | 5 |
| depth | No | Maximum crawl depth | 2 |
| format | No | Output format: markdown, text, or raw | markdown |
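
As a quick illustration, the sketch below (assuming the `zod` package, with a hypothetical URL) mirrors the Zod shape shown in the Implementation Reference and demonstrates how omitted optional fields receive their defaults:

```typescript
import { z } from "zod";

// Sketch of the crawl input schema; mirrors the shape in the Implementation
// Reference below. The example URL is hypothetical.
const crawlArgs = z.object({
  url: z.string().url(),
  limit: z.number().optional().default(5),
  depth: z.number().optional().default(2),
  format: z.enum(["markdown", "text", "raw"]).optional().default("markdown"),
});

// Omitted optional fields are filled with their defaults when parsed.
const parsed = crawlArgs.parse({ url: "https://example.com" });
// parsed => { url: "https://example.com", limit: 5, depth: 2, format: "markdown" }
```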
Implementation Reference
- src/index.ts:392-412 (handler): Handler function that proxies the crawl request to the DumplingAI API endpoint /api/v1/crawl, handling authentication and returning the response as text content.

  ```typescript
  async ({ url, limit, depth, format }) => {
    const apiKey = process.env.DUMPLING_API_KEY;
    if (!apiKey) throw new Error("DUMPLING_API_KEY not set");
    const response = await fetch(`${NWS_API_BASE}/api/v1/crawl`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ url, limit, depth, format }),
    });
    if (!response.ok)
      throw new Error(`Failed: ${response.status} ${await response.text()}`);
    const data = await response.json();
    return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
  }
  ```
- src/index.ts:378-391 (schema): Input schema using Zod to validate the parameters: url (required URL), limit (optional maximum number of pages, default 5), depth (optional maximum depth, default 2), and format (optional enum, default markdown).

  ```typescript
  {
    url: z.string().url().describe("Starting URL"),
    limit: z
      .number()
      .optional()
      .default(5)
      .describe("Maximum number of pages to crawl"),
    depth: z.number().optional().default(2).describe("Maximum depth"),
    format: z
      .enum(["markdown", "text", "raw"])
      .optional()
      .default("markdown")
      .describe("Output format"),
  },
  ```
- src/index.ts:375-413 (registration): MCP tool registration via server.tool() with the name "crawl", a description, the input schema, and the handler function.

  ```typescript
  server.tool(
    "crawl",
    "Recursively crawl websites and extract content.",
    {
      url: z.string().url().describe("Starting URL"),
      limit: z
        .number()
        .optional()
        .default(5)
        .describe("Maximum number of pages to crawl"),
      depth: z.number().optional().default(2).describe("Maximum depth"),
      format: z
        .enum(["markdown", "text", "raw"])
        .optional()
        .default("markdown")
        .describe("Output format"),
    },
    async ({ url, limit, depth, format }) => {
      const apiKey = process.env.DUMPLING_API_KEY;
      if (!apiKey) throw new Error("DUMPLING_API_KEY not set");
      const response = await fetch(`${NWS_API_BASE}/api/v1/crawl`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({ url, limit, depth, format }),
      });
      if (!response.ok)
        throw new Error(`Failed: ${response.status} ${await response.text()}`);
      const data = await response.json();
      return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
    }
  );
  ```
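
For context, a client built on the MCP TypeScript SDK could invoke this tool roughly as sketched below; the server entry-point path, client name, and argument values are assumptions for illustration, not taken from this repository:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Assumed build output path and environment setup; adjust to your project.
const transport = new StdioClientTransport({
  command: "node",
  args: ["build/index.js"],
  env: { DUMPLING_API_KEY: process.env.DUMPLING_API_KEY ?? "" },
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call the crawl tool with illustrative arguments.
const result = await client.callTool({
  name: "crawl",
  arguments: { url: "https://example.com", limit: 5, depth: 2, format: "markdown" },
});

console.log(result.content);
```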