batch_scrape_urls
Extract data from multiple websites simultaneously by scraping up to 10,000 URLs in a single operation for large-scale content collection.
Instructions
Scrape up to 10k URLs at the same time. Perfect for large-scale data extraction.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls_to_scrape | Yes | JSON array of objects with "url" and optional "custom_id". | |
| output_format | No | Choose format for all URLs. Default: "markdown". | markdown |
| country | No | Optional country code for location-specific scraping. | |
| wait_before_scraping | No | Wait time in milliseconds before scraping each URL (0–10000). | 0 |
| parser | No | Optional parser ID for specialized extraction. | |
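As a hedged illustration of the schema above, a valid arguments object might look like the following (the URLs and `custom_id` values are made up for the example):

```typescript
// Hypothetical example arguments for batch_scrape_urls, matching the input schema above.
const args = {
  urls_to_scrape: [
    { url: "https://example.com", custom_id: "home" },
    { url: "https://example.com/about" }, // custom_id is optional
  ],
  output_format: "markdown" as const, // "markdown" | "html" | "json" | "text"
  country: "US",                      // optional country code
  wait_before_scraping: 500,          // milliseconds, 0–10000
};

console.log(args.urls_to_scrape.length); // 2
```

Note that `output_format`, `country`, `wait_before_scraping`, and `parser` can all be omitted; the schema falls back to `"markdown"` and a wait of `0`.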
Implementation Reference
- src/tools/batchScrapeUrls.ts:55-136 (handler): The main execution logic for the `batch_scrape_urls` tool. It constructs a payload from the provided URLs and options, sends a POST request to the Olostep batch API, handles the response or errors, and returns formatted content.

  ```ts
  handler: async (
    {
      urls_to_scrape,
      output_format,
      country,
      wait_before_scraping,
      parser,
    }: {
      urls_to_scrape: BatchScrapeRequestUrl[];
      output_format: "markdown" | "html" | "json" | "text";
      country?: string;
      wait_before_scraping?: number;
      parser?: string;
    },
    apiKey: string,
    orbitKey?: string,
  ) => {
    try {
      const headers = new Headers({
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      });

      const formats: string[] = [output_format];
      const payload: Record<string, unknown> = {
        urls: urls_to_scrape,
        formats,
        wait_before_scraping: wait_before_scraping ?? 0,
      };
      if (country) payload.country = country;
      if (orbitKey) payload.force_connection_id = orbitKey;
      if (parser) payload.parser_extract = { parser_id: parser };

      const response = await fetch(OLOSTEP_BATCH_API_URL, {
        method: "POST",
        headers,
        body: JSON.stringify(payload),
      });

      if (!response.ok) {
        let errorDetails: unknown = null;
        try {
          errorDetails = await response.json();
        } catch {
          // ignore
        }
        return {
          isError: true,
          content: [
            {
              type: "text",
              text: `Olostep API Error: ${response.status} ${response.statusText}. Details: ${JSON.stringify(
                errorDetails,
              )}`,
            },
          ],
        };
      }

      const data = (await response.json()) as OlostepBatchResponse;
      return {
        content: [
          {
            type: "text",
            text: JSON.stringify(data, null, 2),
          },
        ],
      };
    } catch (error: unknown) {
      return {
        isError: true,
        content: [
          {
            type: "text",
            text: `Error: Failed to create batch scrape. ${
              error instanceof Error ? error.message : String(error)
            }`,
          },
        ],
      };
    }
  },
  ```
- src/tools/batchScrapeUrls.ts:27-54 (schema): Zod schema defining the input parameters, including `urls_to_scrape` (an array of up to 10,000 URLs), `output_format`, `country`, `wait_before_scraping`, and `parser`.

  ```ts
  schema: {
    urls_to_scrape: z
      .array(
        z.object({
          url: z.string().url(),
          custom_id: z.string().optional(),
        }),
      )
      .min(1)
      .max(10000)
      .describe('JSON array of objects with "url" and optional "custom_id".'),
    output_format: z
      .enum(["markdown", "html", "json", "text"])
      .default("markdown")
      .describe('Choose format for all URLs. Default: "markdown".'),
    country: z
      .string()
      .optional()
      .describe("Optional country code for location-specific scraping."),
    wait_before_scraping: z
      .number()
      .int()
      .min(0)
      .max(10000)
      .default(0)
      .describe("Wait time in milliseconds before scraping each URL."),
    parser: z
      .string()
      .optional()
      .describe("Optional parser ID for specialized extraction."),
  },
  ```
- src/index.ts:74-86 (registration): Registration of the `batch_scrape_urls` tool with the MCP server via `server.tool()`, including the API key check and a wrapper around the tool's handler.

  ```ts
  server.tool(
    batchScrapeUrls.name,
    batchScrapeUrls.description,
    batchScrapeUrls.schema,
    async (params) => {
      if (!OLOSTEP_API_KEY) return missingApiKeyError;
      const result = await batchScrapeUrls.handler(params, OLOSTEP_API_KEY, ORBIT_KEY);
      return {
        ...result,
        content: result.content.map((item) => ({
          ...item,
          type: item.type as "text",
        })),
      };
    },
  );
  ```
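On success, the handler returns the batch response JSON serialized inside a single text content item. A minimal sketch of how a caller might unpack such a result (the response fields `id` and `status` here are illustrative, not the full Olostep API shape):

```typescript
// Sketch: unpacking a handler result of the shape returned above.
// The batch response body here is illustrative sample data.
const result = {
  content: [
    {
      type: "text" as const,
      text: JSON.stringify({ id: "batch_123", status: "in_progress" }, null, 2),
    },
  ],
};

// The handler serializes the API response into the first text item,
// so callers parse it back out to work with structured data.
const data = JSON.parse(result.content[0].text) as { id: string; status: string };
console.log(data.id); // "batch_123"
```

Error results instead carry `isError: true` alongside a human-readable message in the same `content` array, so callers should check that flag before parsing.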