scrape_images
Extract image metadata from web pages and filter by size, format, or keywords. Handles lazy-loaded galleries and batch URLs for asset audits and gallery scraping.
Instructions
Call ping first
New session or unsure the extension is online: ping first, then any scrape*. If EXTENSION_NOT_CONNECTED: ping again, then fix WebSocket using error.details.bridge, MCP stderr, and ~/.lionscraper/port, then retry.
lang (optional)
en-US | zh-CN: human-readable errors for this call; omitted → English; Chinese users pass lang: "zh-CN" on each call.
Do not substitute raw HTTP
When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static and the user clearly wants a trivial GET of raw HTML may you consider a plain HTTP client.
Purpose
List images (src, alt, size, format, …) with optional filters.
When to use
Asset audits, galleries, lazy-loaded pages (use delay / waitForScroll).
Returns
MultiUrlResult; success data is an array of image records (content script path; differs from email/phone).
Parameters
filter: minWidth, minHeight, format, keyword, limit. Top-level scrollSpeed ≠ waitForScroll.scrollSpeed.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | One http(s) URL or string[] (batch). Max 50 URLs per request (extension-enforced). Server forwards as-is. | |
| lang | No | BCP 47 for human-readable errors this call: en-US | zh-CN. Omitted → English. Pass zh-CN when the user works in Chinese; the Server cannot infer chat language. | |
| delay | No | Milliseconds to wait after load before extraction (default 0). Use for late-rendered DOM. | |
| waitForScroll | No | Scroll the page (or a container) before extraction to trigger lazy-loaded content. Distinct from top-level scrollSpeed. | |
| timeoutMs | No | Per-URL task timeout for the extension (default 60000). Not the MCP WebSocket wait; see bridgeTimeoutMs. | |
| bridgeTimeoutMs | No | MCP Server only: max ms to wait for one tool call on the WebSocket bridge (capped). Omitted → derived from URL count, maxPages, timeoutMs, scrapeInterval. Stripped before forwarding to the extension. | |
| includeHtml | No | If true, include document.documentElement.outerHTML in result meta for that URL. | |
| includeText | No | If true, include document.body.innerText in result meta for that URL. | |
| scrapeInterval | No | Ms between starting tasks in a multi-URL run. Omitted → extension default; extension enforces min 500ms. | |
| concurrency | No | Parallel tabs for multi-URL runs. Omitted → extension default; extension enforces max 3. | |
| scrollSpeed | No | Optional global scroll speed (px) for batch tuning—not the same as waitForScroll.scrollSpeed. | |
| filter | No | Post-extraction filter for image list. |