crawl

Initiate bulk web crawling with structured data extraction. Specify a starting URL, page limit, and optional JSON schema to gather web content programmatically.

Instructions

Start an async bulk crawl with optional extraction. Returns a job ID to poll with job_status. Costs 1 credit per page.
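The crawl-then-poll handshake described above can be sketched as follows. This is a client-side sketch only: the response shapes (`{ job_id }`, `{ status }`) and the `job_status` argument name are assumptions for illustration, not shapes documented here, and `callTool` stands in for however your MCP client invokes tools.

```typescript
// Hypothetical response shapes; the real API may differ.
type CrawlStart = { job_id: string };
type JobStatus = { status: "pending" | "running" | "done" | "failed"; pages?: unknown[] };

// Start a crawl, then poll job_status until the job finishes.
async function runCrawl(
  callTool: (name: string, args: Record<string, unknown>) => Promise<unknown>
): Promise<JobStatus> {
  const start = (await callTool("crawl", {
    url: "https://example.com",
    max_pages: 5,
  })) as CrawlStart;

  for (;;) {
    const job = (await callTool("job_status", { job_id: start.job_id })) as JobStatus;
    if (job.status === "done" || job.status === "failed") return job;
    await new Promise((r) => setTimeout(r, 500)); // back off between polls
  }
}
```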

Input Schema

Name       Required  Description                                      Default
url        Yes       Starting URL to crawl                            -
max_pages  No        Maximum pages to crawl                           10
schema     No        JSON schema for structured extraction per page   -
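An example set of arguments matching this schema might look like the following. The `schema` value is an illustrative JSON-Schema-style extraction spec; the field names inside it (`title`, `published_at`) are hypothetical, not part of the tool's documented contract.

```typescript
// Hypothetical invocation arguments for the crawl tool.
const crawlArgs = {
  url: "https://example.com/blog", // required: starting URL
  max_pages: 25,                   // optional: defaults to 10
  schema: {                        // optional: per-page extraction schema
    type: "object",
    properties: {
      title: { type: "string" },
      published_at: { type: "string" },
    },
  },
};
```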

Implementation Reference

  • The handler function for the 'crawl' tool. It constructs the request body with url, max_pages, and optional schema, then calls apiPost('/crawl', body) to start an async bulk crawl job and returns the result formatted as JSON.
    async ({ url, max_pages, schema }) => {
      const body: Record<string, unknown> = { url, max_pages };
      if (schema) body.schema = schema;
      return jsonResult(await apiPost("/crawl", body));
    }
  • Zod schema definition for the 'crawl' tool inputs: url (string, required), max_pages (number, optional with default 10), and schema (record, optional) for structured extraction per page.
    {
      url: z.string().describe("Starting URL to crawl"),
      max_pages: z.number().optional().default(10).describe("Maximum pages to crawl (default: 10)"),
      schema: z.record(z.unknown()).optional().describe("JSON schema for structured extraction per page"),
    },
  • src/index.ts:144-157 (registration)
    Registration of the 'crawl' tool with the MCP server using server.tool(). Defines the tool name, description, input schema, and handler function.
    server.tool(
      "crawl",
      "Start an async bulk crawl with optional extraction. Returns a job ID to poll with job_status. Costs 1 credit per page.",
      {
        url: z.string().describe("Starting URL to crawl"),
        max_pages: z.number().optional().default(10).describe("Maximum pages to crawl (default: 10)"),
        schema: z.record(z.unknown()).optional().describe("JSON schema for structured extraction per page"),
      },
      async ({ url, max_pages, schema }) => {
        const body: Record<string, unknown> = { url, max_pages };
        if (schema) body.schema = schema;
        return jsonResult(await apiPost("/crawl", body));
      }
    );
  • Helper function apiPost that makes HTTP POST requests to the SearchClaw API. Handles JSON serialization, headers, 30-second timeout, and error handling for non-OK responses.
    async function apiPost(path: string, body: Record<string, unknown>) {
      const controller = new AbortController();
      const timeout = setTimeout(() => controller.abort(), 30000);
      try {
        const response = await fetch(`${API_BASE}${path}`, {
          method: "POST",
          headers: { ...headers, "Content-Type": "application/json" },
          body: JSON.stringify(body),
          signal: controller.signal,
        });
        if (!response.ok) {
          const text = await response.text();
          throw new Error(`SearchClaw API error ${response.status}: ${text}`);
        }
        return response.json();
      } finally {
        clearTimeout(timeout);
      }
    }
  • Helper function jsonResult that formats API response data into MCP-compliant content format with type 'text' and JSON stringification with 2-space indentation.
    function jsonResult(data: unknown) {
      return { content: [{ type: "text" as const, text: JSON.stringify(data, null, 2) }] };
    }
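To make the MCP content shape concrete, here is `jsonResult` as defined above, run on a sample payload (the `job_id` value is illustrative):

```typescript
// jsonResult wraps arbitrary data in MCP text content, pretty-printed.
function jsonResult(data: unknown) {
  return { content: [{ type: "text" as const, text: JSON.stringify(data, null, 2) }] };
}

const result = jsonResult({ job_id: "abc123" });
// result.content[0].text is the pretty-printed JSON string:
// {
//   "job_id": "abc123"
// }
```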
