# x402_crawl_site
Crawl websites via BFS to extract markdown, links, tables, images, and metadata from multiple pages. Configure depth, page limits, and path filters for structured data collection.
## Instructions
Crawl a website via BFS and return per-page extraction results (markdown, links, tables, images, metadata). Price: $0.10 USDC per crawl (paid mode) | Free test: returns fixture data.
Crawls up to max_pages pages starting from the seed URL, up to max_depth link hops deep. Same extraction pipeline as x402_scrape_url — each page returns markdown, links, tables, images, metadata. Optional include_paths/exclude_paths glob filters (e.g. '/blog/*') restrict which URLs are followed. Hard limits: max 15 pages, max depth 5. Response includes pages_requested, pages_crawled, pages_skipped. Without X402_PRIVATE_KEY, only the free test endpoint is available.
Returns: seed_url, pages_requested, pages_crawled, pages_skipped, reasons_skipped, results array.
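The include_paths/exclude_paths glob filtering described above can be sketched as follows. This is an illustrative matcher only, not the server's actual implementation; it assumes `'*'` matches any run of characters within the URL path, and that exclude patterns take precedence over include patterns.

```typescript
// Illustrative sketch of path-glob URL filtering (assumption: the real
// crawler's matcher may differ in edge cases).

// Escape regex metacharacters in the literal parts of a pattern.
function escapeRe(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Treat '*' as a wildcard for any run of characters within the path.
function pathMatchesGlob(pathname: string, pattern: string): boolean {
  const re = new RegExp("^" + pattern.split("*").map(escapeRe).join(".*") + "$");
  return re.test(pathname);
}

// Decide whether a discovered URL should be followed, given optional
// include/exclude glob lists as in the tool's input schema.
function shouldFollow(url: string, include?: string[], exclude?: string[]): boolean {
  const pathname = new URL(url).pathname;
  if (exclude?.some((p) => pathMatchesGlob(pathname, p))) return false;
  if (include && include.length > 0) {
    return include.some((p) => pathMatchesGlob(pathname, p));
  }
  return true;
}
```

Under this reading, `include_paths: ['/blog/*']` restricts the crawl to blog pages, while `exclude_paths: ['/admin/*']` prunes admin pages even when no include list is set.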
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Seed URL to begin crawling (http/https, max 2048 chars) | |
| max_pages | No | Maximum pages to crawl (1-15) | 10 |
| max_depth | No | Maximum link depth from seed URL (1-5) | 2 |
| include_paths | No | Only follow URLs matching these path glob patterns (e.g. '/blog/*', max 20) | |
| exclude_paths | No | Skip URLs matching these path glob patterns (e.g. '/admin/*', max 20) | |
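The constraints and defaults in the table can be mirrored client-side before paying for a crawl. The sketch below is a hypothetical pre-flight check; the names and error messages are illustrative, and the authoritative validation is the zod schema in src/index.ts.

```typescript
// Hypothetical client-side pre-flight check mirroring the x402_crawl_site
// input schema (illustrative only; the server validates with zod).
interface CrawlArgs {
  url: string;
  max_pages?: number;       // 1-15, default 10
  max_depth?: number;       // 1-5, default 2
  include_paths?: string[]; // up to 20 glob patterns
  exclude_paths?: string[]; // up to 20 glob patterns
}

function checkCrawlArgs(args: CrawlArgs): CrawlArgs & { max_pages: number; max_depth: number } {
  if (!/^https?:\/\//.test(args.url) || args.url.length > 2048) {
    throw new Error("url must be http(s) and at most 2048 chars");
  }
  // Apply the documented defaults, then enforce the documented ranges.
  const max_pages = args.max_pages ?? 10;
  const max_depth = args.max_depth ?? 2;
  if (!Number.isInteger(max_pages) || max_pages < 1 || max_pages > 15) {
    throw new Error("max_pages must be an integer in 1-15");
  }
  if (!Number.isInteger(max_depth) || max_depth < 1 || max_depth > 5) {
    throw new Error("max_depth must be an integer in 1-5");
  }
  for (const list of [args.include_paths, args.exclude_paths]) {
    if (list && list.length > 20) throw new Error("at most 20 glob patterns per list");
  }
  return { ...args, max_pages, max_depth };
}
```

A minimal paid-mode call then only needs `url`; everything else falls back to the defaults above.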
## Implementation Reference
- `src/index.ts:766-786` (handler): The handler implementation for x402_crawl_site, which delegates to either the scraping API's paid endpoint or the free test endpoint.

```ts
async (params) => {
  const base = APIS.scraping.baseUrl;
  try {
    const usePaid = !!PRIVATE_KEY;
    if (usePaid) {
      const payload: Record<string, unknown> = {
        url: params.url,
        max_pages: params.max_pages,
        max_depth: params.max_depth,
      };
      if (params.include_paths) payload.include_paths = params.include_paths;
      if (params.exclude_paths) payload.exclude_paths = params.exclude_paths;
      const data = await apiPost(base, "/crawl", payload, true);
      return textResult({ mode: "paid", cost: "$0.10", ...data });
    } else {
      const data = await apiGet(base, "/crawl/test");
      return textResult({
        mode: "free_test",
        note: "Free test — returns fixture data. Set X402_PRIVATE_KEY for live crawling.",
        ...data,
      });
      // excerpt truncated
```

- `src/index.ts:754-765` (schema): Input validation schema for x402_crawl_site.

```ts
{
  url: z.string().url()
    .describe("Seed URL to begin crawling (http/https, max 2048 chars)"),
  max_pages: z.number().int().min(1).max(15).default(10)
    .describe("Maximum pages to crawl (1-15, default: 10)"),
  max_depth: z.number().int().min(1).max(5).default(2)
    .describe("Maximum link depth from seed URL (1-5, default: 2)"),
  include_paths: z.array(z.string()).max(20).optional()
    .describe("Only follow URLs matching these path glob patterns (e.g. '/blog/*', max 20)"),
  exclude_paths: z.array(z.string()).max(20).optional()
    .describe("Skip URLs matching these path glob patterns (e.g. '/admin/*', max 20)"),
},
```

- `src/index.ts:742-753` (registration): Registration of the x402_crawl_site tool in the MCP server.

```ts
server.tool(
  "x402_crawl_site",
  `Crawl a website via BFS and return per-page extraction results (markdown, links, tables, images, metadata). Price: $0.10 USDC per crawl (paid mode) | Free test: returns fixture data.

Crawls up to max_pages pages starting from the seed URL, up to max_depth link hops deep. Same extraction pipeline as x402_scrape_url — each page returns markdown, links, tables, images, metadata. Optional include_paths/exclude_paths glob filters (e.g. '/blog/*') restrict which URLs are followed. Hard limits: max 15 pages, max depth 5. Response includes pages_requested, pages_crawled, pages_skipped. Without X402_PRIVATE_KEY, only the free test endpoint is available.

Returns: seed_url, pages_requested, pages_crawled, pages_skipped, reasons_skipped, results array.`,
  // excerpt truncated
```