# crawl
Initiate bulk web crawling with structured data extraction. Specify a starting URL, page limit, and optional JSON schema to gather web content programmatically.
## Instructions
Start an async bulk crawl with optional extraction. Returns a job ID to poll with job_status. Costs 1 credit per page.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | Starting URL to crawl | |
| max_pages | No | Maximum pages to crawl | 10 |
| schema | No | JSON schema for structured extraction per page | |
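The inputs above map directly onto a tool call. A minimal sketch of how a client might assemble the arguments, mirroring the handler's body construction shown below; the URL, page count, and extraction schema are illustrative values, not from the source:

```typescript
// Hypothetical arguments for the crawl tool; values are illustrative only.
const args = {
  url: "https://example.com/products",
  max_pages: 25,
  schema: {
    type: "object",
    properties: {
      title: { type: "string" },
      price: { type: "number" },
    },
  },
};

// Mirrors the handler: schema is only included in the request body when provided.
const body: Record<string, unknown> = { url: args.url, max_pages: args.max_pages };
if (args.schema) body.schema = args.schema;
```

Omitting `schema` yields a plain crawl; omitting `max_pages` lets the server-side default of 10 apply.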
## Implementation Reference
- src/index.ts:152-156 (handler): The handler function for the 'crawl' tool. It constructs the request body from url, max_pages, and the optional schema, then calls apiPost("/crawl", body) to start the async bulk crawl job and returns the result formatted as JSON.

```typescript
async ({ url, max_pages, schema }) => {
  const body: Record<string, unknown> = { url, max_pages };
  if (schema) body.schema = schema;
  return jsonResult(await apiPost("/crawl", body));
}
```

- src/index.ts:147-151 (schema): Zod schema definition for the 'crawl' tool inputs: url (string, required), max_pages (number, optional, default 10), and schema (record, optional) for structured extraction per page.

```typescript
{
  url: z.string().describe("Starting URL to crawl"),
  max_pages: z.number().optional().default(10).describe("Maximum pages to crawl (default: 10)"),
  schema: z.record(z.unknown()).optional().describe("JSON schema for structured extraction per page"),
},
```

- src/index.ts:144-157 (registration): Registration of the 'crawl' tool with the MCP server via server.tool(), wiring together the tool name, description, input schema, and handler.

```typescript
server.tool(
  "crawl",
  "Start an async bulk crawl with optional extraction. Returns a job ID to poll with job_status. Costs 1 credit per page.",
  {
    url: z.string().describe("Starting URL to crawl"),
    max_pages: z.number().optional().default(10).describe("Maximum pages to crawl (default: 10)"),
    schema: z.record(z.unknown()).optional().describe("JSON schema for structured extraction per page"),
  },
  async ({ url, max_pages, schema }) => {
    const body: Record<string, unknown> = { url, max_pages };
    if (schema) body.schema = schema;
    return jsonResult(await apiPost("/crawl", body));
  }
);
```

- src/index.ts:41-59 (helper): apiPost makes HTTP POST requests to the SearchClaw API, handling JSON serialization, headers, a 30-second timeout via AbortController, and error handling for non-OK responses.

```typescript
async function apiPost(path: string, body: Record<string, unknown>) {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 30000);
  try {
    const response = await fetch(`${API_BASE}${path}`, {
      method: "POST",
      headers: { ...headers, "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (!response.ok) {
      const text = await response.text();
      throw new Error(`SearchClaw API error ${response.status}: ${text}`);
    }
    return response.json();
  } finally {
    clearTimeout(timeout);
  }
}
```

- src/index.ts:61-63 (helper): jsonResult wraps API response data in the MCP content format: a single text item containing the data as JSON with 2-space indentation.

```typescript
function jsonResult(data: unknown) {
  return { content: [{ type: "text" as const, text: JSON.stringify(data, null, 2) }] };
}
```
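Because crawl returns a job ID rather than results, clients are expected to poll with job_status until the job finishes. A hedged sketch of such a polling loop; the JobStatus type, the 'completed'/'failed' status values, and the injected check callback are assumptions for illustration, not part of the source:

```typescript
// Hypothetical job lifecycle; the real SearchClaw status values may differ.
type JobStatus = { status: "queued" | "running" | "completed" | "failed"; result?: unknown };

// Poll until the job reaches a terminal state. `check` stands in for a call to
// the job_status tool; injecting it keeps the loop itself testable offline.
async function pollJob(
  check: () => Promise<JobStatus>,
  intervalMs = 1000,
  maxAttempts = 30,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const s = await check();
    if (s.status === "completed" || s.status === "failed") return s;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("crawl job did not finish within the polling budget");
}
```

In practice `check` would wrap the job_status tool with the job ID returned by crawl, and intervalMs would be tuned to the expected crawl duration.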