Skip to main content
Glama

create_crawl

Discover and scrape entire websites by following links from a starting URL to extract content in various formats for data collection.

Instructions

Autonomously discover and scrape entire websites by following links from a start URL.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
start_urlYesStarting URL for the crawl.
max_pagesNoMaximum number of pages to crawl.
follow_linksNoWhether to follow links found on pages.
output_formatNoFormat for scraped content. Default: "markdown".markdown
countryNoOptional country code for location-specific crawling.
parserNoOptional parser ID for specialized content extraction.

Implementation Reference

  • The async handler function that sends a POST request to Olostep API to create a crawl job based on input parameters like start_url, max_pages, etc., handles errors, and returns the response.
    handler: async ( { start_url, max_pages, follow_links, output_format, country, parser, }: { start_url: string; max_pages?: number; follow_links?: boolean; output_format: "markdown" | "html" | "json" | "text"; country?: string; parser?: string; }, apiKey: string, orbitKey?: string, ) => { try { const headers = new Headers({ "Content-Type": "application/json", Authorization: `Bearer ${apiKey}`, }); const formats: string[] = [output_format]; const payload: Record<string, unknown> = { start_url, max_pages: max_pages ?? 10, follow_links: follow_links ?? true, formats, }; if (country) payload.country = country; if (orbitKey) payload.force_connection_id = orbitKey; if (parser) payload.parser_extract = { parser_id: parser }; const response = await fetch(OLOSTEP_CRAWL_API_URL, { method: "POST", headers, body: JSON.stringify(payload), }); if (!response.ok) { let errorDetails: unknown = null; try { errorDetails = await response.json(); } catch { // ignore } return { isError: true, content: [ { type: "text", text: `Olostep API Error: ${response.status} ${response.statusText}. Details: ${JSON.stringify( errorDetails, )}`, }, ], }; } const data = (await response.json()) as OlostepCrawlResponse; return { content: [ { type: "text", text: JSON.stringify(data, null, 2), }, ], }; } catch (error: unknown) { return { isError: true, content: [ { type: "text", text: `Error: Failed to create crawl. ${error instanceof Error ? error.message : String(error)}`, }, ], }; } },
  • Zod-based input schema defining parameters for the create_crawl tool: start_url (required URL), max_pages, follow_links, output_format, country, parser.
    schema: { start_url: z.string().url().describe("Starting URL for the crawl."), max_pages: z.number().int().min(1).default(10).describe("Maximum number of pages to crawl."), follow_links: z.boolean().default(true).describe("Whether to follow links found on pages."), output_format: z .enum(["markdown", "html", "json", "text"]) .default("markdown") .describe('Format for scraped content. Default: "markdown".'), country: z.string().optional().describe("Optional country code for location-specific crawling."), parser: z.string().optional().describe("Optional parser ID for specialized content extraction."), },
  • src/index.ts:58-71 (registration)
    MCP server registration of the create_crawl tool using server.tool(name, description, schema, handlerWrapper), which checks for API key and calls the tool handler.
    // Register Create Crawl tool server.tool( createCrawl.name, createCrawl.description, createCrawl.schema, async (params) => { if (!OLOSTEP_API_KEY) return missingApiKeyError; const result = await createCrawl.handler(params, OLOSTEP_API_KEY, ORBIT_KEY); return { ...result, content: result.content.map(item => ({ ...item, type: item.type as "text" })) }; } );
  • TypeScript interface defining the structure of the Olostep crawl API response.
    export interface OlostepCrawlResponse { crawl_id?: string; object?: string; status?: string; start_url?: string; max_pages?: number; follow_links?: boolean; created?: string; formats?: string[]; country?: string; parser?: string; }

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/olostep/olostep-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server