crawl
Recursively crawl websites and extract their content in markdown, text, or raw format, with configurable limits on crawl depth and page count.
Instructions
Recursively crawl websites and extract content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Starting URL | |
| limit | No | Maximum number of pages to crawl | 5 |
| depth | No | Maximum crawl depth | 2 |
| format | No | Output format: markdown, text, or raw | markdown |
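
As a quick illustration, the sketch below (assuming the `zod` package, with a hypothetical URL) mirrors the Zod shape shown in the Implementation Reference and demonstrates how omitted optional fields receive their defaults:

```typescript
import { z } from "zod";

// Sketch of the crawl input schema; mirrors the shape in the Implementation
// Reference below. The example URL is hypothetical.
const crawlArgs = z.object({
  url: z.string().url(),
  limit: z.number().optional().default(5),
  depth: z.number().optional().default(2),
  format: z.enum(["markdown", "text", "raw"]).optional().default("markdown"),
});

// Omitted optional fields are filled with their defaults when parsed.
const parsed = crawlArgs.parse({ url: "https://example.com" });
// parsed => { url: "https://example.com", limit: 5, depth: 2, format: "markdown" }
```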
Implementation Reference
- src/index.ts:392-412 (handler): Handler function that proxies the crawl request to the DumplingAI API endpoint /api/v1/crawl, handling authentication and returning the response as text content.

  ```typescript
  async ({ url, limit, depth, format }) => {
    const apiKey = process.env.DUMPLING_API_KEY;
    if (!apiKey) throw new Error("DUMPLING_API_KEY not set");
    const response = await fetch(`${NWS_API_BASE}/api/v1/crawl`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ url, limit, depth, format }),
    });
    if (!response.ok)
      throw new Error(`Failed: ${response.status} ${await response.text()}`);
    const data = await response.json();
    return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
  }
  ```
- src/index.ts:378-391 (schema): Input schema using Zod to validate the parameters: url (required URL), limit (optional maximum number of pages, default 5), depth (optional maximum depth, default 2), and format (optional enum, default markdown).

  ```typescript
  {
    url: z.string().url().describe("Starting URL"),
    limit: z
      .number()
      .optional()
      .default(5)
      .describe("Maximum number of pages to crawl"),
    depth: z.number().optional().default(2).describe("Maximum depth"),
    format: z
      .enum(["markdown", "text", "raw"])
      .optional()
      .default("markdown")
      .describe("Output format"),
  },
  ```
- src/index.ts:375-413 (registration): MCP tool registration via server.tool() with the name "crawl", a description, the input schema, and the handler function.

  ```typescript
  server.tool(
    "crawl",
    "Recursively crawl websites and extract content.",
    {
      url: z.string().url().describe("Starting URL"),
      limit: z
        .number()
        .optional()
        .default(5)
        .describe("Maximum number of pages to crawl"),
      depth: z.number().optional().default(2).describe("Maximum depth"),
      format: z
        .enum(["markdown", "text", "raw"])
        .optional()
        .default("markdown")
        .describe("Output format"),
    },
    async ({ url, limit, depth, format }) => {
      const apiKey = process.env.DUMPLING_API_KEY;
      if (!apiKey) throw new Error("DUMPLING_API_KEY not set");
      const response = await fetch(`${NWS_API_BASE}/api/v1/crawl`, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({ url, limit, depth, format }),
      });
      if (!response.ok)
        throw new Error(`Failed: ${response.status} ${await response.text()}`);
      const data = await response.json();
      return { content: [{ type: "text", text: JSON.stringify(data, null, 2) }] };
    }
  );
  ```
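
For context, a client built on the MCP TypeScript SDK could invoke this tool roughly as sketched below; the server entry-point path, client name, and argument values are assumptions for illustration, not taken from this repository:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Assumed build output path and environment setup; adjust to your project.
const transport = new StdioClientTransport({
  command: "node",
  args: ["build/index.js"],
  env: { DUMPLING_API_KEY: process.env.DUMPLING_API_KEY ?? "" },
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// Call the crawl tool with illustrative arguments.
const result = await client.callTool({
  name: "crawl",
  arguments: { url: "https://example.com", limit: 5, depth: 2, format: "markdown" },
});

console.log(result.content);
```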