
scrape_images

Extract image metadata from web pages and filter by size, format, or keywords. Handles lazy-loaded galleries and batch URLs for asset audits and gallery scraping.

Instructions

Call ping first

In a new session, or whenever you are unsure the extension is online, call ping first, then any scrape* tool. On EXTENSION_NOT_CONNECTED: ping again, fix the WebSocket connection using error.details.bridge, MCP stderr, and ~/.lionscraper/port, then retry.
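The ping-then-scrape order with a single retry might be sketched as follows. The `call_tool` callable, the empty argument object for ping, and the `{"error": {"code": ...}}` result shape are assumptions for illustration; the page names only the `EXTENSION_NOT_CONNECTED` code and `error.details.bridge`, not a full error schema.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: any callable that sends one MCP tool call and
# returns the decoded result as a dict. Not part of the documented API.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def is_not_connected(result: Dict[str, Any]) -> bool:
    """Assumed error shape: {"error": {"code": "EXTENSION_NOT_CONNECTED", ...}}."""
    error = result.get("error") or {}
    return error.get("code") == "EXTENSION_NOT_CONNECTED"


def scrape_with_ping(
    call_tool: ToolCaller, tool: str, args: Dict[str, Any]
) -> Dict[str, Any]:
    """Ping first, then scrape; on EXTENSION_NOT_CONNECTED, ping again and retry once."""
    call_tool("ping", {})  # assumption: ping accepts an empty argument object
    result = call_tool(tool, args)
    if is_not_connected(result):
        # Repairing the WebSocket (error.details.bridge, MCP stderr,
        # ~/.lionscraper/port) is a manual step and is not automated here.
        second_ping = call_tool("ping", {})
        if is_not_connected(second_ping):
            return result  # still offline: hand the original error back
        result = call_tool(tool, args)
    return result
```

A caller would pass its own MCP client wrapper as `call_tool`; if the second ping still fails, the original error (including any bridge details) is returned unchanged for inspection.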

lang (optional)

en-US | zh-CN: human-readable errors for this call; omitted → English; Chinese users pass lang: "zh-CN" on each call.

Do not substitute raw HTTP

When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static and the user clearly wants a trivial GET of raw HTML may you consider a plain HTTP client.

Purpose

List images (src, alt, size, format, …) with optional filters.

When to use

Asset audits, galleries, lazy-loaded pages (use delay / waitForScroll).

Returns

MultiUrlResult; success data is an array of image records (content script path; differs from email/phone).

Parameters

filter: minWidth, minHeight, format, keyword, limit. Top-level scrollSpeed is not the same as waitForScroll.scrollSpeed.
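As an illustration of how the filter fields fit into a call, the sketch below assembles an argument object. The helper name `build_image_args` is hypothetical, and the value types (integer pixel sizes, a format string such as "png") are assumptions; the page lists the field names but not their types.

```python
from typing import Any, Dict, List, Optional, Union


def build_image_args(
    url: Union[str, List[str]],
    min_width: Optional[int] = None,
    min_height: Optional[int] = None,
    image_format: Optional[str] = None,
    keyword: Optional[str] = None,
    limit: Optional[int] = None,
    delay_ms: Optional[int] = None,
    lang: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble scrape_images arguments, leaving out anything not set."""
    args: Dict[str, Any] = {"url": url}

    # Field names come from the Parameters line; value types are assumed.
    candidate_filter = {
        "minWidth": min_width,
        "minHeight": min_height,
        "format": image_format,
        "keyword": keyword,
        "limit": limit,
    }
    image_filter = {k: v for k, v in candidate_filter.items() if v is not None}
    if image_filter:
        args["filter"] = image_filter

    if delay_ms is not None:
        args["delay"] = delay_ms  # ms to wait after load, for late-rendered DOM
    if lang is not None:
        args["lang"] = lang  # "en-US" or "zh-CN"; omitted means English errors
    return args
```

Omitted options are simply not sent, so the extension's own defaults apply. waitForScroll is left out because its nested shape is not spelled out here beyond scrollSpeed.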

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | One http(s) URL or string[] (batch). Max 50 URLs per request (extension-enforced). Server forwards as-is. | |
| lang | No | BCP 47 for human-readable errors this call: en-US or zh-CN. Omitted → English. Pass zh-CN when the user works in Chinese; the Server cannot infer chat language. | |
| delay | No | Milliseconds to wait after load before extraction (default 0). Use for late-rendered DOM. | |
| waitForScroll | No | Scroll the page (or a container) before extraction to trigger lazy-loaded content. Distinct from top-level scrollSpeed. | |
| timeoutMs | No | Per-URL task timeout for the extension (default 60000). Not the MCP WebSocket wait; see bridgeTimeoutMs. | |
| bridgeTimeoutMs | No | MCP Server only: max ms to wait for one tool call on the WebSocket bridge (capped). Omitted → derived from URL count, maxPages, timeoutMs, scrapeInterval. Stripped before forwarding to the extension. | |
| includeHtml | No | If true, include document.documentElement.outerHTML in result meta for that URL. | |
| includeText | No | If true, include document.body.innerText in result meta for that URL. | |
| scrapeInterval | No | Ms between starting tasks in a multi-URL run. Omitted → extension default; extension enforces min 500ms. | |
| concurrency | No | Parallel tabs for multi-URL runs. Omitted → extension default; extension enforces max 3. | |
| scrollSpeed | No | Optional global scroll speed (px) for batch tuning; not the same as waitForScroll.scrollSpeed. | |
| filter | No | Post-extraction filter for image list. | |

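
The extension-enforced limits in the schema (at most 50 URLs, scrapeInterval of at least 500 ms, concurrency of at most 3) can be checked on the client before a call is sent. The sketch below is one way to do that; `check_limits` is a hypothetical helper, and whether the extension rejects or silently clamps out-of-range values is not stated, so this check simply reports them.

```python
from typing import Any, Dict, List

# Limits as stated in the schema descriptions (extension-enforced).
MAX_URLS = 50
MIN_SCRAPE_INTERVAL_MS = 500
MAX_CONCURRENCY = 3


def check_limits(args: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []

    url = args.get("url")
    urls = [url] if isinstance(url, str) else list(url or [])
    if not urls:
        problems.append("url is required")
    if len(urls) > MAX_URLS:
        problems.append(f"{len(urls)} URLs given; max is {MAX_URLS} per request")
    if any(not str(u).startswith(("http://", "https://")) for u in urls):
        problems.append("every url must be http(s)")

    interval = args.get("scrapeInterval")
    if interval is not None and interval < MIN_SCRAPE_INTERVAL_MS:
        problems.append(f"scrapeInterval {interval} is below {MIN_SCRAPE_INTERVAL_MS} ms")

    concurrency = args.get("concurrency")
    if concurrency is not None and concurrency > MAX_CONCURRENCY:
        problems.append(f"concurrency {concurrency} exceeds {MAX_CONCURRENCY}")

    return problems
```

Running the check before each batch call keeps oversized requests from reaching the bridge at all; larger URL lists would need to be split into chunks of 50 or fewer by the caller.
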
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries substantial behavioral disclosure: it explains the WebSocket bridge infrastructure, server-side parameter stripping (bridgeTimeoutMs), extension-side constraints (max 50 URLs, min 500ms interval, max 3 concurrency), and content script execution differences from email/phone scrapers. It lacks only an explicit declaration of the read-only/non-destructive nature of the operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

While organized with clear markdown headers, the description front-loads operational warnings ('### Call ping first') before stating purpose, and the total length is substantial. Every section contains valuable information for a complex 12-parameter tool requiring browser extension infrastructure, though tighter integration could reduce the prerequisite verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of output schema and annotations for a complex tool with nested objects and infrastructure dependencies, the description appropriately covers return types (MultiUrlResult with image record arrays), error handling patterns, retry logic, and BCP-47 language support. It provides sufficient context for successful invocation despite missing explicit safety classifications.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage establishing a baseline of 3, the description adds significant semantic value by clarifying the distinction between top-level scrollSpeed and waitForScroll.scrollSpeed, explaining server-only parameters removed before forwarding, and providing usage context like 'drops small icons' for minWidth and 'late-rendered DOM' for delay.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose, 'List images (src, alt, size, format, …) with optional filters,' specifying both the action and the extracted fields. It effectively distinguishes the tool from siblings like scrape_emails and scrape_article by targeting image assets specifically. However, this purpose statement is buried beneath extensive operational prerequisites, which diminishes immediate clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Excellent explicit guidance provided: mandates calling ping first with detailed error handling for EXTENSION_NOT_CONNECTED, specifies exact use cases ('Asset audits, galleries, lazy-loaded pages'), and clearly defines when NOT to use the tool versus alternatives ('Do not substitute raw HTTP...Only if the page is fully public and mostly static').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dowant/lionscraper-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.