
batch_scrape_urls

Extract data from multiple websites simultaneously by scraping up to 10,000 URLs in a single operation for large-scale content collection.

Instructions

Scrape up to 10k URLs at the same time. Perfect for large-scale data extraction.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `urls_to_scrape` | Yes | JSON array of objects with `"url"` and optional `"custom_id"`. | |
| `output_format` | No | Output format applied to all URLs: `"markdown"`, `"html"`, `"json"`, or `"text"`. | `markdown` |
| `country` | No | Optional country code for location-specific scraping. | |
| `wait_before_scraping` | No | Wait time in milliseconds before scraping each URL. | `0` |
| `parser` | No | Optional parser ID for specialized extraction. | |
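For reference, a minimal arguments object for this tool might look like the following sketch; the URLs and custom IDs are illustrative:

```typescript
// Illustrative arguments for a batch_scrape_urls call.
const args = {
  urls_to_scrape: [
    { url: "https://example.com/pricing", custom_id: "pricing-page" },
    { url: "https://example.com/docs" }, // custom_id is optional
  ],
  output_format: "markdown",  // "markdown" | "html" | "json" | "text"
  country: "US",              // optional country code
  wait_before_scraping: 1000, // wait 1 second before scraping each URL
};
```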

Implementation Reference

  • The main execution logic for the batch_scrape_urls tool. It constructs a payload with the provided URLs and options, sends a POST request to the Olostep batch API, handles the response or errors, and returns formatted content.
```typescript
handler: async (
  {
    urls_to_scrape,
    output_format,
    country,
    wait_before_scraping,
    parser,
  }: {
    urls_to_scrape: BatchScrapeRequestUrl[];
    output_format: "markdown" | "html" | "json" | "text";
    country?: string;
    wait_before_scraping?: number;
    parser?: string;
  },
  apiKey: string,
  orbitKey?: string,
) => {
  try {
    const headers = new Headers({
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    });

    // Build the batch payload; optional fields are added only when provided.
    const formats: string[] = [output_format];
    const payload: Record<string, unknown> = {
      urls: urls_to_scrape,
      formats,
      wait_before_scraping: wait_before_scraping ?? 0,
    };
    if (country) payload.country = country;
    if (orbitKey) payload.force_connection_id = orbitKey;
    if (parser) payload.parser_extract = { parser_id: parser };

    const response = await fetch(OLOSTEP_BATCH_API_URL, {
      method: "POST",
      headers,
      body: JSON.stringify(payload),
    });

    if (!response.ok) {
      // Include the API's error body in the message when it can be parsed.
      let errorDetails: unknown = null;
      try {
        errorDetails = await response.json();
      } catch {
        // ignore non-JSON error bodies
      }
      return {
        isError: true,
        content: [
          {
            type: "text",
            text: `Olostep API Error: ${response.status} ${response.statusText}. Details: ${JSON.stringify(
              errorDetails,
            )}`,
          },
        ],
      };
    }

    const data = (await response.json()) as OlostepBatchResponse;
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify(data, null, 2),
        },
      ],
    };
  } catch (error: unknown) {
    return {
      isError: true,
      content: [
        {
          type: "text",
          text: `Error: Failed to create batch scrape. ${
            error instanceof Error ? error.message : String(error)
          }`,
        },
      ],
    };
  }
},
```
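Given the logic above, a call with two URLs, the default format, and no optional parameters would POST a request body along these lines (a sketch with illustrative URLs):

```typescript
// Sketch of the body the handler sends to OLOSTEP_BATCH_API_URL.
const exampleBody = {
  urls: [
    { url: "https://example.com/a", custom_id: "a" },
    { url: "https://example.com/b" },
  ],
  formats: ["markdown"],
  wait_before_scraping: 0,
  // country, force_connection_id, and parser_extract are omitted
  // because the corresponding options were not provided.
};
```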
  • Zod schema defining the input parameters including urls_to_scrape (array of up to 10k URLs), output_format, country, wait_before_scraping, and parser.
```typescript
schema: {
  urls_to_scrape: z
    .array(
      z.object({
        url: z.string().url(),
        custom_id: z.string().optional(),
      }),
    )
    .min(1)
    .max(10000)
    .describe('JSON array of objects with "url" and optional "custom_id".'),
  output_format: z
    .enum(["markdown", "html", "json", "text"])
    .default("markdown")
    .describe('Choose format for all URLs. Default: "markdown".'),
  country: z
    .string()
    .optional()
    .describe("Optional country code for location-specific scraping."),
  wait_before_scraping: z
    .number()
    .int()
    .min(0)
    .max(10000)
    .default(0)
    .describe("Wait time in milliseconds before scraping each URL."),
  parser: z
    .string()
    .optional()
    .describe("Optional parser ID for specialized extraction."),
},
```
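Because the schema is a plain map of Zod validators, it can be exercised directly, for example in a unit test. A minimal sketch, assuming `zod` is installed and the tool object is exported as `batchScrapeUrls` (the import path is an assumption):

```typescript
import { z } from "zod";
import { batchScrapeUrls } from "./tools"; // assumed module path

// Wrap the field map in z.object() to validate a full arguments payload.
const argsSchema = z.object(batchScrapeUrls.schema);

// parse() throws on invalid input, e.g. a malformed URL or more than
// 10,000 entries, and fills in the declared defaults.
const parsed = argsSchema.parse({
  urls_to_scrape: [{ url: "https://example.com" }],
});

console.log(parsed.output_format);        // "markdown" (default applied)
console.log(parsed.wait_before_scraping); // 0 (default applied)
```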
  • src/index.ts:74-86 (registration)
    Registration of the batch_scrape_urls tool with the MCP server using server.tool(), including API key check and wrapper around the tool's handler.
```typescript
server.tool(
  batchScrapeUrls.name,
  batchScrapeUrls.description,
  batchScrapeUrls.schema,
  async (params) => {
    if (!OLOSTEP_API_KEY) return missingApiKeyError;
    const result = await batchScrapeUrls.handler(params, OLOSTEP_API_KEY, ORBIT_KEY);
    // Narrow each content item's type to the literal "text" expected by the SDK.
    return {
      ...result,
      content: result.content.map(item => ({ ...item, type: item.type as "text" })),
    };
  },
);
```
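From the client side, the registered tool is invoked by name. A minimal sketch using the official MCP TypeScript SDK; the server launch command and package name are assumptions, and the API key is a placeholder:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const client = new Client({ name: "example-client", version: "1.0.0" });

// Launch the server over stdio (package name assumed; adjust as needed).
await client.connect(
  new StdioClientTransport({
    command: "npx",
    args: ["-y", "olostep-mcp-server"],
    // Preserve the existing environment so npx can resolve PATH.
    env: { ...(process.env as Record<string, string>), OLOSTEP_API_KEY: "YOUR_API_KEY" },
  }),
);

const result = await client.callTool({
  name: "batch_scrape_urls",
  arguments: {
    urls_to_scrape: [{ url: "https://example.com" }],
    output_format: "markdown",
  },
});
console.log(result.content);
```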
