
batch_scrape_urls

Extract data from multiple websites simultaneously by scraping up to 10,000 URLs in a single operation for large-scale content collection.

Instructions

Scrape up to 10,000 URLs at the same time, ideal for large-scale data extraction.

Input Schema

Name | Required | Description | Default
urls_to_scrape | Yes | JSON array of objects with "url" and optional "custom_id". | —
output_format | No | Output format for all URLs: "markdown", "html", "json", or "text". | markdown
country | No | Optional country code for location-specific scraping. | —
wait_before_scraping | No | Wait time in milliseconds before scraping each URL. | 0
parser | No | Optional parser ID for specialized extraction. | —
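
A minimal sketch of an input matching this schema (the URLs and custom IDs here are hypothetical, chosen only for illustration):

```typescript
// Hypothetical input for batch_scrape_urls; field names follow the schema above.
const batchInput = {
  urls_to_scrape: [
    { url: "https://example.com/pricing", custom_id: "pricing-page" },
    { url: "https://example.com/docs" }, // custom_id is optional
  ],
  output_format: "markdown" as const, // default when omitted
  country: "US",                      // optional country code
  wait_before_scraping: 1000,         // milliseconds, 0–10000
};

// The schema caps each batch at 10,000 URLs and requires at least one.
const withinLimit =
  batchInput.urls_to_scrape.length >= 1 &&
  batchInput.urls_to_scrape.length <= 10000;
```

The optional `custom_id` lets callers correlate each result in the batch response back to the URL that produced it.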

Implementation Reference

  • The main execution logic for the batch_scrape_urls tool. It constructs a payload with the provided URLs and options, sends a POST request to the Olostep batch API, handles the response or errors, and returns formatted content.
    handler: async (
    	{
    		urls_to_scrape,
    		output_format,
    		country,
    		wait_before_scraping,
    		parser,
    	}: {
    		urls_to_scrape: BatchScrapeRequestUrl[];
    		output_format: "markdown" | "html" | "json" | "text";
    		country?: string;
    		wait_before_scraping?: number;
    		parser?: string;
    	},
    	apiKey: string,
    	orbitKey?: string,
    ) => {
    	try {
    		const headers = new Headers({
    			"Content-Type": "application/json",
    			Authorization: `Bearer ${apiKey}`,
    		});
    
    		const formats: string[] = [output_format];
    		const payload: Record<string, unknown> = {
    			urls: urls_to_scrape,
    			formats,
    			wait_before_scraping: wait_before_scraping ?? 0,
    		};
    		if (country) payload.country = country;
    		if (orbitKey) payload.force_connection_id = orbitKey;
    		if (parser) payload.parser_extract = { parser_id: parser };
    
    		const response = await fetch(OLOSTEP_BATCH_API_URL, {
    			method: "POST",
    			headers,
    			body: JSON.stringify(payload),
    		});
    
    		if (!response.ok) {
    			let errorDetails: unknown = null;
    			try {
    				errorDetails = await response.json();
    			} catch {
    				// ignore
    			}
    			return {
    				isError: true,
    				content: [
    					{
    						type: "text",
    						text: `Olostep API Error: ${response.status} ${response.statusText}. Details: ${JSON.stringify(
    							errorDetails,
    						)}`,
    					},
    				],
    			};
    		}
    
    		const data = (await response.json()) as OlostepBatchResponse;
    		return {
    			content: [
    				{
    					type: "text",
    					text: JSON.stringify(data, null, 2),
    				},
    			],
    		};
    	} catch (error: unknown) {
    		return {
    			isError: true,
    			content: [
    				{
    					type: "text",
    					text: `Error: Failed to create batch scrape. ${
    						error instanceof Error ? error.message : String(error)
    					}`,
    				},
    			],
    		};
    	}
    },
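
The payload assembly inside the handler can be isolated as a pure function. This sketch (not part of the source; `buildBatchPayload` is a hypothetical name) mirrors how the required fields are always present while `country`, `force_connection_id`, and `parser_extract` are included only when set:

```typescript
// Mirrors the handler's payload assembly: required fields always present,
// optional fields attached only when their inputs are provided.
type BatchUrl = { url: string; custom_id?: string };

function buildBatchPayload(
  urls: BatchUrl[],
  outputFormat: string,
  waitBeforeScraping?: number,
  country?: string,
  orbitKey?: string,
  parser?: string,
): Record<string, unknown> {
  const payload: Record<string, unknown> = {
    urls,
    formats: [outputFormat],
    wait_before_scraping: waitBeforeScraping ?? 0, // defaults to no wait
  };
  if (country) payload.country = country;
  if (orbitKey) payload.force_connection_id = orbitKey;
  if (parser) payload.parser_extract = { parser_id: parser };
  return payload;
}

const payload = buildBatchPayload(
  [{ url: "https://example.com" }],
  "markdown",
  undefined,
  "US",
);
```

Keeping the optional keys out of the payload entirely, rather than sending them as `null`, avoids tripping server-side validation on fields the caller never specified.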
  • Zod schema defining the input parameters including urls_to_scrape (array of up to 10k URLs), output_format, country, wait_before_scraping, and parser.
    schema: {
    	urls_to_scrape: z
    		.array(
    			z.object({
    				url: z.string().url(),
    				custom_id: z.string().optional(),
    			}),
    		)
    		.min(1)
    		.max(10000)
    		.describe('JSON array of objects with "url" and optional "custom_id".'),
    	output_format: z
    		.enum(["markdown", "html", "json", "text"])
    		.default("markdown")
    		.describe('Choose format for all URLs. Default: "markdown".'),
    	country: z
    		.string()
    		.optional()
    		.describe("Optional country code for location-specific scraping."),
    	wait_before_scraping: z
    		.number()
    		.int()
    		.min(0)
    		.max(10000)
    		.default(0)
    		.describe("Wait time in milliseconds before scraping each URL."),
    	parser: z.string().optional().describe("Optional parser ID for specialized extraction."),
    },
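
For illustration only (the source validates with Zod), the same constraints can be expressed as a dependency-free predicate; `isValidBatchInput` is a hypothetical helper, not part of the source:

```typescript
// Dependency-free check mirroring the Zod schema's constraints above.
type BatchScrapeInput = {
  urls_to_scrape: { url: string; custom_id?: string }[];
  output_format?: "markdown" | "html" | "json" | "text";
  country?: string;
  wait_before_scraping?: number;
  parser?: string;
};

function isValidBatchInput(input: BatchScrapeInput): boolean {
  const { urls_to_scrape, wait_before_scraping } = input;
  // .min(1).max(10000) on the array
  if (urls_to_scrape.length < 1 || urls_to_scrape.length > 10000) return false;
  // z.string().url() roughly corresponds to URL-parseability
  for (const item of urls_to_scrape) {
    try {
      new URL(item.url);
    } catch {
      return false;
    }
  }
  // integer in [0, 10000], defaulting to 0 as the schema does
  const wait = wait_before_scraping ?? 0;
  return Number.isInteger(wait) && wait >= 0 && wait <= 10000;
}
```
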
  • src/index.ts:74-86 (registration)
    Registration of the batch_scrape_urls tool with the MCP server using server.tool(), including API key check and wrapper around the tool's handler.
    server.tool(
        batchScrapeUrls.name,
        batchScrapeUrls.description,
        batchScrapeUrls.schema,
        async (params) => {
            if (!OLOSTEP_API_KEY) return missingApiKeyError;
            const result = await batchScrapeUrls.handler(params, OLOSTEP_API_KEY, ORBIT_KEY);
            return {
                ...result,
                content: result.content.map(item => ({ ...item, type: item.type as "text" }))
            };
        }
    );

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/olostep/olostep-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.