scrape_website
Extract content from any website URL in multiple formats including markdown, HTML, JSON, or text. Supports JavaScript rendering and location-specific scraping for dynamic content.
Instructions
Extract content from a single URL. Supports multiple formats and JavaScript rendering.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url_to_scrape | Yes | The URL of the website you want to scrape. | |
| output_format | No | Choose format ("html", "markdown", "json", or "text"). Default: "markdown" | markdown |
| country | No | Optional country code (e.g., US, GB, CA) for location-specific scraping. | |
| wait_before_scraping | No | Wait time in milliseconds before scraping (0-10000). Useful for dynamic content. | |
| parser | No | Optional parser ID for specialized extraction (e.g., "@olostep/amazon-product"). |
Implementation Reference
- src/tools/scrapeWebsite.ts:48-133 (handler)The main handler function that implements the scrape_website tool logic by calling the Olostep API to scrape the specified URL.handler: async ( { url_to_scrape, output_format, country, wait_before_scraping, parser, }: { url_to_scrape: string; output_format: "markdown" | "html" | "json" | "text"; country?: string; wait_before_scraping?: number; parser?: string; }, apiKey: string, orbitKey?: string, ) => { try { const headers = new Headers({ "Content-Type": "application/json", Authorization: `Bearer ${apiKey}`, }); const formats: string[] = [output_format]; const body: Record<string, unknown> = { url_to_scrape, formats, wait_before_scraping: wait_before_scraping ?? 0, }; if (country) body.country = country; if (orbitKey) body.force_connection_id = orbitKey; if (parser) { // Include parser-based extraction alongside selected format body.parser_extract = { parser_id: parser }; } const response = await fetch(OLOSTEP_SCRAPE_API_URL, { method: "POST", headers, body: JSON.stringify(body), }); if (!response.ok) { let errorDetails: unknown = null; try { errorDetails = await response.json(); } catch { // ignore JSON parse error for non-JSON error bodies } return { isError: true, content: [ { type: "text", text: `Olostep API Error: ${response.status} ${response.statusText}. Details: ${JSON.stringify( errorDetails, )}`, }, ], }; } const data = (await response.json()) as OlostepScrapeResponse; // Return the full result object so callers can access id/url/status/contents return { content: [ { type: "text", text: JSON.stringify(data.result ?? {}, null, 2), }, ], }; } catch (error: unknown) { return { isError: true, content: [ { type: "text", text: `Error: Failed to scrape website. ${ error instanceof Error ? error.message : String(error) }`, }, ], }; } },
- src/tools/scrapeWebsite.ts:26-47 (schema)Input schema using Zod for validating parameters of the scrape_website tool.schema: { url_to_scrape: z.string().url().describe("The URL of the website you want to scrape."), output_format: z .enum(["markdown", "html", "json", "text"]) .default("markdown") .describe('Choose format ("html", "markdown", "json", or "text"). Default: "markdown"'), country: z .string() .optional() .describe("Optional country code (e.g., US, GB, CA) for location-specific scraping."), wait_before_scraping: z .number() .int() .min(0) .max(10000) .default(0) .describe("Wait time in milliseconds before scraping (0-10000). Useful for dynamic content."), parser: z .string() .optional() .describe('Optional parser ID for specialized extraction (e.g., "@olostep/amazon-product").'), },
- src/index.ts:119-131 (registration)Registration of the scrape_website tool with the MCP server using server.tool().server.tool( scrapeWebsite.name, scrapeWebsite.description, scrapeWebsite.schema, async (params) => { if (!OLOSTEP_API_KEY) return missingApiKeyError; const result = await scrapeWebsite.handler(params, OLOSTEP_API_KEY, ORBIT_KEY); return { ...result, content: result.content.map(item => ({ ...item, type: item.type as "text" })) }; } );