
create_crawl

Discovers and scrapes entire websites by following links from a starting URL, extracting page content in a chosen format for data collection.

Instructions

Autonomously discover and scrape entire websites by following links from a start URL.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| start_url | Yes | Starting URL for the crawl. | — |
| max_pages | No | Maximum number of pages to crawl. | 10 |
| follow_links | No | Whether to follow links found on pages. | true |
| output_format | No | Format for scraped content. | "markdown" |
| country | No | Optional country code for location-specific crawling. | — |
| parser | No | Optional parser ID for specialized content extraction. | — |
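For illustration, a minimal arguments object for this tool might look like the following sketch; the URL and values are placeholders, and only start_url is required:

```typescript
// Example create_crawl arguments (placeholder values).
// Only start_url is required; omitted fields fall back to the
// defaults in the schema (max_pages: 10, follow_links: true,
// output_format: "markdown").
const args = {
  start_url: "https://example.com", // required starting point
  max_pages: 25,                    // optional override of the default 10
  output_format: "markdown" as const, // "markdown" | "html" | "json" | "text"
};

console.log(JSON.stringify(args));
```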

Implementation Reference

  • Async handler that POSTs to the Olostep crawl API to create a crawl job from the input parameters (start_url, max_pages, and so on), handles request and API errors, and returns the API response.
    handler: async (
    	{
    		start_url,
    		max_pages,
    		follow_links,
    		output_format,
    		country,
    		parser,
    	}: {
    		start_url: string;
    		max_pages?: number;
    		follow_links?: boolean;
    		output_format: "markdown" | "html" | "json" | "text";
    		country?: string;
    		parser?: string;
    	},
    	apiKey: string,
    	orbitKey?: string,
    ) => {
    	try {
    		const headers = new Headers({
    			"Content-Type": "application/json",
    			Authorization: `Bearer ${apiKey}`,
    		});
    
    		const formats: string[] = [output_format];
    		const payload: Record<string, unknown> = {
    			start_url,
    			max_pages: max_pages ?? 10,
    			follow_links: follow_links ?? true,
    			formats,
    		};
    		if (country) payload.country = country;
    		if (orbitKey) payload.force_connection_id = orbitKey;
    		if (parser) payload.parser_extract = { parser_id: parser };
    
    		const response = await fetch(OLOSTEP_CRAWL_API_URL, {
    			method: "POST",
    			headers,
    			body: JSON.stringify(payload),
    		});
    
    		if (!response.ok) {
    			let errorDetails: unknown = null;
    			try {
    				errorDetails = await response.json();
    			} catch {
    				// ignore
    			}
    			return {
    				isError: true,
    				content: [
    					{
    						type: "text",
    						text: `Olostep API Error: ${response.status} ${response.statusText}. Details: ${JSON.stringify(
    							errorDetails,
    						)}`,
    					},
    				],
    			};
    		}
    
    		const data = (await response.json()) as OlostepCrawlResponse;
    		return {
    			content: [
    				{
    					type: "text",
    					text: JSON.stringify(data, null, 2),
    				},
    			],
    		};
    	} catch (error: unknown) {
    		return {
    			isError: true,
    			content: [
    				{
    					type: "text",
    					text: `Error: Failed to create crawl. ${error instanceof Error ? error.message : String(error)}`,
    				},
    			],
    		};
    	}
    },
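The handler's payload construction (defaults plus conditionally attached optional fields) can be isolated as a small helper. This is a sketch; buildCrawlPayload and CrawlArgs are hypothetical names, not part of the source:

```typescript
// Sketch of the payload logic from the handler above.
// buildCrawlPayload is a hypothetical helper name.
interface CrawlArgs {
  start_url: string;
  max_pages?: number;
  follow_links?: boolean;
  output_format: "markdown" | "html" | "json" | "text";
  country?: string;
  parser?: string;
}

function buildCrawlPayload(args: CrawlArgs, orbitKey?: string): Record<string, unknown> {
  const payload: Record<string, unknown> = {
    start_url: args.start_url,
    max_pages: args.max_pages ?? 10,         // default mirrors the schema
    follow_links: args.follow_links ?? true, // default mirrors the schema
    formats: [args.output_format],           // API expects an array of formats
  };
  // Optional fields are only attached when provided.
  if (args.country) payload.country = args.country;
  if (orbitKey) payload.force_connection_id = orbitKey;
  if (args.parser) payload.parser_extract = { parser_id: args.parser };
  return payload;
}
```

Keeping the defaults here in sync with the Zod schema's `.default(...)` values avoids surprises when the handler is called outside the MCP wrapper.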
  • Zod-based input schema defining parameters for the create_crawl tool: start_url (required URL), max_pages, follow_links, output_format, country, parser.
    schema: {
    	start_url: z.string().url().describe("Starting URL for the crawl."),
    	max_pages: z.number().int().min(1).default(10).describe("Maximum number of pages to crawl."),
    	follow_links: z.boolean().default(true).describe("Whether to follow links found on pages."),
    	output_format: z
    		.enum(["markdown", "html", "json", "text"])
    		.default("markdown")
    		.describe('Format for scraped content. Default: "markdown".'),
    	country: z.string().optional().describe("Optional country code for location-specific crawling."),
    	parser: z.string().optional().describe("Optional parser ID for specialized content extraction."),
    },
  • src/index.ts:58-71 (registration)
    MCP server registration of the create_crawl tool via server.tool(name, description, schema, handler); the wrapper checks for the API key before delegating to the tool handler.
    // Register Create Crawl tool
    server.tool(
        createCrawl.name,
        createCrawl.description,
        createCrawl.schema,
        async (params) => {
            if (!OLOSTEP_API_KEY) return missingApiKeyError;
            const result = await createCrawl.handler(params, OLOSTEP_API_KEY, ORBIT_KEY);
            return {
                ...result,
                content: result.content.map(item => ({ ...item, type: item.type as "text" }))
            };
        }
    );
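The registration wrapper short-circuits with missingApiKeyError when OLOSTEP_API_KEY is unset. The source does not show that error object; the sketch below assumes the usual MCP tool-result shape (isError flag plus text content):

```typescript
// Minimal sketch of the API-key guard used by the registration wrapper.
// The exact missingApiKeyError in the source is not shown; this shape
// is assumed from the MCP tool-result convention.
type ToolResult = {
  isError?: boolean;
  content: { type: "text"; text: string }[];
};

const missingApiKeyError: ToolResult = {
  isError: true,
  content: [{ type: "text", text: "OLOSTEP_API_KEY is not set." }],
};

function guard(apiKey: string | undefined, run: () => ToolResult): ToolResult {
  return apiKey ? run() : missingApiKeyError;
}
```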
  • TypeScript interface defining the structure of the Olostep crawl API response.
    export interface OlostepCrawlResponse {
    	crawl_id?: string;
    	object?: string;
    	status?: string;
    	start_url?: string;
    	max_pages?: number;
    	follow_links?: boolean;
    	created?: string;
    	formats?: string[];
    	country?: string;
    	parser?: string;
    }
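Because every field of OlostepCrawlResponse is optional, consumers should handle missing values explicitly. A hypothetical helper (summarizeCrawl is not part of the source) might read:

```typescript
// Subset of the response interface above; all fields optional.
interface OlostepCrawlResponse {
  crawl_id?: string;
  status?: string;
  start_url?: string;
}

// summarizeCrawl is a hypothetical helper, shown only to illustrate
// defensive access to the optional fields.
function summarizeCrawl(r: OlostepCrawlResponse): string {
  return `crawl ${r.crawl_id ?? "?"} (${r.status ?? "unknown"}) from ${r.start_url ?? "?"}`;
}
```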
