# scraper_crawl_url
Crawl multiple web pages from a starting URL using breadth-first (BFS) link discovery, returning compressed markdown with 70-90% fewer tokens than raw HTML.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Starting URL to crawl | |
| maxPages | No | Max pages to crawl (1-100) | 10 |
| maxDepth | No | Max link depth (0-5) | 2 |
| mode | No | Fetch mode: 'fast' (plain HTTP), 'stealth' (TLS fingerprint), 'render' (headless browser), 'auto' (fast with fallback) | 'auto' |
| include | No | URL patterns to include (glob) | |
| exclude | No | URL patterns to exclude (glob) | |
| timeout | No | Per-page timeout in milliseconds | 10000 |
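As an illustration, a call that crawls a documentation subtree might pass arguments shaped like the following. The URL and glob patterns here are hypothetical, chosen only to show the shape the schema accepts, not taken from a real deployment:

```typescript
// Hypothetical example arguments for scraper_crawl_url; the URL and
// glob patterns are illustrative placeholders.
const args = {
  url: 'https://example.com/docs',          // required starting point
  maxPages: 25,                             // stop after 25 pages (schema max: 100)
  maxDepth: 3,                              // follow links up to 3 hops deep (schema max: 5)
  mode: 'auto' as const,                    // try plain HTTP first, fall back if blocked
  include: ['https://example.com/docs/**'], // stay within the docs subtree
  exclude: ['**/changelog/**'],             // skip changelog pages
  timeout: 15_000,                          // 15 s per page instead of the 10 s default
};
```

Omitted optional fields fall back to the defaults in the table above.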
## Implementation Reference
- `src/server.ts:132-192` (handler): the handler function for the `scraper_crawl_url` tool, which orchestrates the crawling process using the `crawl` function from `@robot-resources/scraper`.

```ts
export async function crawlUrl({
  url,
  maxPages,
  maxDepth,
  mode,
  include,
  exclude,
  timeout,
}: {
  url: string;
  maxPages?: number;
  maxDepth?: number;
  mode?: FetchMode;
  include?: string[];
  exclude?: string[];
  timeout?: number;
}) {
  try {
    const result = await crawl({
      url,
      limit: maxPages ?? 10,
      depth: maxDepth ?? 2,
      mode,
      include,
      exclude,
      timeout,
    });

    const host = new URL(url).host;
    const errorSuffix =
      result.errors.length > 0
        ? ` (${result.errors.length} error${result.errors.length > 1 ? 's' : ''})`
        : '';
    const summary = `Crawled ${result.totalCrawled} pages from ${host}${errorSuffix}`;

    const content: Array<{ type: 'text'; text: string }> = [
      { type: 'text' as const, text: summary },
    ];
    for (const page of result.pages) {
      const header = page.title ? `## ${page.title}\n\n` : '';
      content.push({
        type: 'text' as const,
        text: `${header}${page.markdown}`,
      });
    }

    return {
      content,
      structuredContent: {
        pages: result.pages,
        totalCrawled: result.totalCrawled,
        totalDiscovered: result.totalDiscovered,
        totalSkipped: result.totalSkipped,
        errors: result.errors,
        duration: result.duration,
      },
    };
  } catch (error) {
    return formatError(url, error);
  }
}
```

- `src/server.ts:50-88` (registration): registration of the `scraper_crawl_url` tool in the MCP server, including schema definition using Zod.
```ts
server.tool(
  'scraper_crawl_url',
  'Crawl multiple pages from a starting URL using BFS link discovery. Returns compressed markdown for each page with 70-90% fewer tokens than raw HTML.',
  {
    url: z.string().url().describe('Starting URL to crawl'),
    maxPages: z
      .number()
      .int()
      .min(1)
      .max(100)
      .optional()
      .describe('Max pages to crawl (default: 10)'),
    maxDepth: z
      .number()
      .int()
      .min(0)
      .max(5)
      .optional()
      .describe('Max link depth (default: 2)'),
    mode: z
      .enum(['fast', 'stealth', 'render', 'auto'])
      .optional()
      .describe(
        "Fetch mode: 'fast' (plain HTTP), 'stealth' (TLS fingerprint), 'render' (headless browser), 'auto' (fast with fallback). Default: 'auto'",
      ),
    include: z
      .array(z.string())
      .optional()
      .describe('URL patterns to include (glob)'),
    exclude: z
      .array(z.string())
      .optional()
      .describe('URL patterns to exclude (glob)'),
    timeout: z
      .number()
      .positive()
      .optional()
      .describe('Per-page timeout in milliseconds (default: 10000)'),
  },
  async (args) => crawlUrl(args),
);
```
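The BFS link discovery that `crawl` performs can be sketched as a level-by-level frontier walk bounded by `maxPages` and `maxDepth`. This is a minimal illustration, not the library's implementation: the hypothetical in-memory `links` graph stands in for fetching a page and extracting its outgoing links.

```typescript
// Minimal sketch of BFS link discovery with maxPages / maxDepth limits.
// The `links` graph is a stand-in for real fetches; the actual crawl()
// in @robot-resources/scraper fetches pages and converts them to markdown.
const links: Record<string, string[]> = {
  '/': ['/a', '/b'],
  '/a': ['/c', '/'],
  '/b': ['/c'],
  '/c': ['/d'],
};

function bfsCrawl(start: string, maxPages: number, maxDepth: number): string[] {
  const visited = new Set<string>([start]); // dedupe discovered URLs
  const order: string[] = [];               // pages "crawled", in BFS order
  let frontier: string[] = [start];
  let depth = 0;

  while (frontier.length > 0 && order.length < maxPages) {
    const next: string[] = [];
    for (const url of frontier) {
      if (order.length >= maxPages) break;
      order.push(url); // "fetch" the page
      if (depth < maxDepth) {
        for (const link of links[url] ?? []) {
          if (!visited.has(link)) {
            visited.add(link);
            next.push(link); // queued for the next depth level
          }
        }
      }
    }
    frontier = next;
    depth += 1;
  }
  return order;
}
```

With the graph above, `bfsCrawl('/', 10, 2)` visits `/`, then both depth-1 pages, then `/c`, and never reaches `/d` because it sits at depth 3; lowering `maxPages` truncates the same ordering.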