Skip to main content
Glama

scrape

Read-only

Extract webpage content for analysis by fetching clean markdown, HTML, or structured JSON. Handles JavaScript-rendered pages and bypasses anti-bot protection.

Instructions

Scrape any webpage and return its content using ZenRows.

Use this tool to fetch webpage content for analysis. By default it returns clean markdown, which is ideal for LLM processing.

When to enable options:

  • js_render: page uses React/Vue/Angular, loads content dynamically, or content appears missing on the first attempt

  • premium_proxy: site returns 403/blocked errors even with js_render enabled

  • wait_for: specific content loads after initial render (requires js_render)

  • css_extractor: you only need specific elements, not the whole page

  • autoparse: structured data pages like products or articles

Examples: Basic: { url: "https://example.com" } Dynamic: { url: "https://spa.com", js_render: true } Protected:{ url: "https://protected.com", js_render: true, premium_proxy: true } Extract: { url: "https://shop.com", css_extractor: '{"title":"h1","price":".price"}' }

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe webpage URL to scrape
js_renderNoEnable JavaScript rendering via headless browser. Required for SPAs (React, Vue, Angular) and pages that load content dynamically.
premium_proxyNoUse premium residential proxies to bypass anti-bot protection. Required for heavily protected sites. Implies higher credit cost.
proxy_countryNoCountry for geo-targeted scraping. ISO 3166-1 alpha-2 code (e.g. 'US', 'GB', 'DE'). Requires premium_proxy=true.
response_typeNoOutput format. 'markdown' (default) preserves structure and is ideal for LLMs. 'plaintext' strips all formatting for pure text extraction. 'pdf' returns a PDF of the page. 'html' returns the raw HTML source (omits the response_type param; ZenRows default). Ignored when autoparse, css_extractor, outputs, or screenshot params are set.markdown
autoparseNoAutomatically extract structured data from the page into JSON. Best for product pages, articles, and listings.
css_extractorNoExtract specific elements using CSS selectors. JSON object mapping names to selectors, e.g. '{"title":"h1","price":".price-tag"}'. Returns JSON instead of full page content.
wait_forNoCSS selector to wait for before capturing. Use when key content loads after the initial page render. Requires js_render=true.
waitNoMilliseconds to wait after page load before capturing content. Max 30000 (30s). Requires js_render=true.
js_instructionsNoJSON array of browser interactions to run before scraping. Requires js_render=true. Example: [{"click":"#load-more"},{"wait":1000},{"wait_for":".results"}]
outputsNoComma-separated list of data types to extract as structured JSON. Available: emails, headings, links, menus, images, videos, audios. Use '*' for all types. Returns JSON instead of full page content.
screenshotNoCapture an above-the-fold screenshot of the page. Returns an image instead of text content. Useful for visual verification or debugging.
screenshot_fullpageNoCapture a full-page screenshot including content below the fold. Returns an image instead of text content.
screenshot_selectorNoCapture a screenshot of a specific element using a CSS selector. Example: ".product-card". Returns an image instead of text content.

Implementation Reference

  • The implementation of the 'scrape' tool handler, which constructs API parameters, performs a fetch request to the ZenRows API, and processes the response (text or image).
    async (params) => {
      const searchParams = new URLSearchParams({
        apikey: apiKey,
        url: params.url,
      });
    
      if (
        params.js_render ||
        params.screenshot ||
        params.screenshot_fullpage ||
        params.screenshot_selector
      )
        searchParams.set("js_render", "true");
      if (params.premium_proxy) searchParams.set("premium_proxy", "true");
      if (params.proxy_country)
        searchParams.set("proxy_country", params.proxy_country.toUpperCase());
      if (params.autoparse) searchParams.set("autoparse", "true");
      if (params.css_extractor) searchParams.set("css_extractor", params.css_extractor);
      if (params.wait_for) searchParams.set("wait_for", params.wait_for);
      if (params.wait != null) searchParams.set("wait", String(params.wait));
      if (params.js_instructions) searchParams.set("js_instructions", params.js_instructions);
      if (params.outputs) searchParams.set("outputs", params.outputs);
      if (params.screenshot || params.screenshot_fullpage || params.screenshot_selector)
        searchParams.set("screenshot", "true");
      if (params.screenshot_fullpage) searchParams.set("screenshot_fullpage", "true");
      if (params.screenshot_selector)
        searchParams.set("screenshot_selector", params.screenshot_selector);
    
      // response_type is mutually exclusive with autoparse, css_extractor, outputs, and screenshot params.
      // 'html' is the ZenRows default (no param); all other values are passed through.
      const isScreenshot =
        params.screenshot || params.screenshot_fullpage || params.screenshot_selector;
      const effectiveType = params.response_type ?? "markdown";
      if (
        !params.autoparse &&
        !params.css_extractor &&
        !params.outputs &&
        !isScreenshot &&
        effectiveType !== "html"
      ) {
        searchParams.set("response_type", effectiveType);
      }
    
      let response: Response;
      try {
        response = await fetch(`${ZENROWS_API_URL}?${searchParams}`, {
          headers: { "User-Agent": `zenrows/mcp ${pkg.version}` },
        });
      } catch (err) {
        return {
          content: [
            {
              type: "text" as const,
              text: `Network error contacting ZenRows: ${err instanceof Error ? err.message : String(err)}`,
            },
          ],
          isError: true,
        };
      }
    
      if (!response.ok) {
        const body = await response.text();
        return {
          content: [{ type: "text" as const, text: `ZenRows error ${response.status}: ${body}` }],
          isError: true,
        };
      }
    
      const contentType = response.headers.get("content-type") ?? "";
      const buffer = await response.arrayBuffer();
      const bytes = new Uint8Array(buffer);
      const isPng =
        bytes[0] === 0x89 && bytes[1] === 0x50 && bytes[2] === 0x4e && bytes[3] === 0x47;
      const isJpeg = bytes[0] === 0xff && bytes[1] === 0xd8;
      if (contentType.startsWith("image/") || isPng || isJpeg) {
        const mimeType = isPng
          ? "image/png"
          : isJpeg
            ? "image/jpeg"
            : (contentType.split(";")[0].trim() as "image/png" | "image/jpeg");
        const base64 = Buffer.from(buffer).toString("base64");
        return {
          content: [{ type: "image" as const, data: base64, mimeType }],
        };
      }
    
      return {
        content: [{ type: "text" as const, text: new TextDecoder().decode(buffer) }],
      };
    }
  • src/server.ts:18-162 (registration)
    Registration of the 'scrape' tool with its input schema and configuration.
      server.registerTool(
        "scrape",
        {
          annotations: {
            title: "Scrape Webpage",
            readOnlyHint: true,
            destructiveHint: false,
          },
          description: `Scrape any webpage and return its content using ZenRows.
    
    Use this tool to fetch webpage content for analysis. By default it returns clean
    markdown, which is ideal for LLM processing.
    
    When to enable options:
    - js_render: page uses React/Vue/Angular, loads content dynamically, or content
      appears missing on the first attempt
    - premium_proxy: site returns 403/blocked errors even with js_render enabled
    - wait_for: specific content loads after initial render (requires js_render)
    - css_extractor: you only need specific elements, not the whole page
    - autoparse: structured data pages like products or articles
    
    Examples:
      Basic:    { url: "https://example.com" }
      Dynamic:  { url: "https://spa.com", js_render: true }
      Protected:{ url: "https://protected.com", js_render: true, premium_proxy: true }
      Extract:  { url: "https://shop.com", css_extractor: '{"title":"h1","price":".price"}' }`,
          inputSchema: {
            url: z.string().url().describe("The webpage URL to scrape"),
    
            js_render: z
              .boolean()
              .optional()
              .default(false)
              .describe(
                "Enable JavaScript rendering via headless browser. Required for SPAs " +
                  "(React, Vue, Angular) and pages that load content dynamically."
              ),
    
            premium_proxy: z
              .boolean()
              .optional()
              .default(false)
              .describe(
                "Use premium residential proxies to bypass anti-bot protection. " +
                  "Required for heavily protected sites. Implies higher credit cost."
              ),
    
            proxy_country: z
              .string()
              .optional()
              .describe(
                "Country for geo-targeted scraping. ISO 3166-1 alpha-2 code (e.g. 'US', 'GB', 'DE'). " +
                  "Requires premium_proxy=true."
              ),
    
            response_type: z
              .enum(["markdown", "plaintext", "pdf", "html"])
              .optional()
              .default("markdown")
              .describe(
                "Output format. 'markdown' (default) preserves structure and is ideal for LLMs. " +
                  "'plaintext' strips all formatting for pure text extraction. " +
                  "'pdf' returns a PDF of the page. " +
                  "'html' returns the raw HTML source (omits the response_type param; ZenRows default). " +
                  "Ignored when autoparse, css_extractor, outputs, or screenshot params are set."
              ),
    
            autoparse: z
              .boolean()
              .optional()
              .describe(
                "Automatically extract structured data from the page into JSON. " +
                  "Best for product pages, articles, and listings."
              ),
    
            css_extractor: z
              .string()
              .optional()
              .describe(
                "Extract specific elements using CSS selectors. " +
                  'JSON object mapping names to selectors, e.g. \'{"title":"h1","price":".price-tag"}\'. ' +
                  "Returns JSON instead of full page content."
              ),
    
            wait_for: z
              .string()
              .optional()
              .describe(
                "CSS selector to wait for before capturing. Use when key content loads " +
                  "after the initial page render. Requires js_render=true."
              ),
    
            wait: z
              .number()
              .int()
              .min(0)
              .max(30000)
              .optional()
              .describe(
                "Milliseconds to wait after page load before capturing content. " +
                  "Max 30000 (30s). Requires js_render=true."
              ),
    
            js_instructions: z
              .string()
              .optional()
              .describe(
                "JSON array of browser interactions to run before scraping. Requires js_render=true. " +
                  'Example: [{"click":"#load-more"},{"wait":1000},{"wait_for":".results"}]'
              ),
    
            outputs: z
              .string()
              .optional()
              .describe(
                "Comma-separated list of data types to extract as structured JSON. " +
                  "Available: emails, headings, links, menus, images, videos, audios. " +
                  "Use '*' for all types. Returns JSON instead of full page content."
              ),
    
            screenshot: z
              .boolean()
              .optional()
              .describe(
                "Capture an above-the-fold screenshot of the page. " +
                  "Returns an image instead of text content. Useful for visual verification or debugging."
              ),
    
            screenshot_fullpage: z
              .boolean()
              .optional()
              .describe(
                "Capture a full-page screenshot including content below the fold. " +
                  "Returns an image instead of text content."
              ),
    
            screenshot_selector: z
              .string()
              .optional()
              .describe(
                "Capture a screenshot of a specific element using a CSS selector. " +
                  'Example: ".product-card". Returns an image instead of text content.'
              ),
          },
        },

Tool Definition Quality

Score is being calculated. Check back soon.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ZenRows/zenrows-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server