Glama

scrape_emails

Instructions

Call ping first

On a new session, or when unsure the extension is online: call ping first, then any scrape* tool. If a call fails with EXTENSION_NOT_CONNECTED: ping again, then repair the WebSocket bridge using error.details.bridge, the MCP stderr log, and ~/.lionscraper/port, and retry.
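The ping-then-retry workflow can be sketched as follows. `call_tool` is a hypothetical stand-in for whatever MCP client helper your agent framework provides, and the error shape mirrors the EXTENSION_NOT_CONNECTED case described above; neither is part of this server's documented API.

```python
# Sketch of the "ping first, retry on EXTENSION_NOT_CONNECTED" workflow.
# call_tool(name, args) is a hypothetical MCP client helper, not this server's API.

def scrape_with_ping(call_tool, url):
    call_tool("ping", {})  # confirm the extension is online before any scrape*
    result = call_tool("scrape_emails", {"url": url})
    if result.get("error", {}).get("code") == "EXTENSION_NOT_CONNECTED":
        # Re-ping and retry once; in practice you would also inspect
        # error.details.bridge, MCP stderr, and ~/.lionscraper/port here.
        call_tool("ping", {})
        result = call_tool("scrape_emails", {"url": url})
    return result
```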

lang (optional)

en-US | zh-CN: selects the language of human-readable errors for this call. Omitted → English. Chinese-speaking users should pass lang: "zh-CN" on each call.

Do not substitute raw HTTP

When you need a real browser DOM (logged-in session cookies, JS-rendered content in SPAs, extension-side pagination or multi-URL scheduling, or structured field extraction), do not use WebFetch, curl, wget, or cookie-less IDE fetch tools in place of this server's ping + scrape*. Consider a plain HTTP client only if the page is fully public, mostly static, and the user clearly wants a trivial GET of raw HTML.

Purpose

Scan HTML for email addresses, dedupe, optional domain/keyword/limit filters.
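The extension's actual regex and filter ordering are not documented, so the following is only a rough sketch of the scan → dedupe → domain/keyword/limit pipeline the purpose statement describes:

```python
import re

# Simplistic email pattern for illustration; the extension's real matcher may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(html, domain=None, keyword=None, limit=None):
    # Scan, then dedupe case-insensitively while preserving first-seen order.
    seen, emails = set(), []
    for match in EMAIL_RE.findall(html):
        addr = match.lower()
        if addr not in seen:
            seen.add(addr)
            emails.append(addr)
    # Optional post-extraction filters, mirroring the filter parameter.
    if domain:
        emails = [e for e in emails if e.endswith("@" + domain)]
    if keyword:
        emails = [e for e in emails if keyword in e]
    if limit is not None:
        emails = emails[:limit]
    return emails
```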

When to use

Contact/about/footer pages when the user wants an email list.

Returns

MultiUrlResult; success data is string[] (filtered in the extension after extraction).

Parameters

filter: optional domain, keyword, and limit. For multiple URLs, the extension enforces a minimum of 500 ms between task starts and at most 3 concurrent tabs (see the property descriptions).

Often chained after scrape_urls.
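The chaining pattern can be sketched like this: harvest links from a hypothetical scrape_urls result, then build a batched scrape_emails call that honors the 50-URL cap. The field names on the result object are assumptions, not the server's documented shape.

```python
def build_email_batch(scrape_urls_result, max_urls=50):
    # Collect candidate pages from a (hypothetical) scrape_urls result,
    # keeping only http(s) links and respecting the 50-URL per-request cap.
    urls = [u for u in scrape_urls_result.get("data", [])
            if u.startswith(("http://", "https://"))]
    return {
        "url": urls[:max_urls],
        "scrapeInterval": 500,   # extension-enforced minimum between starts
        "concurrency": 3,        # extension-enforced maximum parallel tabs
        "filter": {"keyword": "contact"},
    }
```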

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | One http(s) URL or string[] (batch). Max 50 URLs per request (extension-enforced). Server forwards as-is. | |
| lang | No | BCP 47 for human-readable errors this call: en-US \| zh-CN. Pass zh-CN when the user works in Chinese; the server cannot infer chat language. | English |
| delay | No | Milliseconds to wait after load before extraction. Use for late-rendered DOM. | 0 |
| waitForScroll | No | Scroll the page (or a container) before extraction to trigger lazy-loaded content. Distinct from top-level scrollSpeed. | |
| timeoutMs | No | Per-URL task timeout for the extension. Not the MCP WebSocket wait; see bridgeTimeoutMs. | 60000 |
| bridgeTimeoutMs | No | MCP server only: max ms to wait for one tool call on the WebSocket bridge (capped). Stripped before forwarding to the extension. | Derived from URL count, maxPages, timeoutMs, scrapeInterval |
| includeHtml | No | If true, include document.documentElement.outerHTML in result meta for that URL. | false |
| includeText | No | If true, include document.body.innerText in result meta for that URL. | false |
| scrapeInterval | No | Ms between starting tasks in a multi-URL run; extension enforces a 500 ms minimum. | Extension default |
| concurrency | No | Parallel tabs for multi-URL runs; extension enforces a maximum of 3. | Extension default |
| scrollSpeed | No | Optional global scroll speed (px) for batch tuning; not the same as waitForScroll.scrollSpeed. | |
| filter | No | Post-extraction filter for the email list (domain, keyword, limit). | |
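Putting the schema together, a batch request exercising the timing and filter parameters might look like the following. The values are illustrative only, not defaults, and the sanity checks simply mirror the extension-enforced limits stated above:

```python
# Illustrative scrape_emails batch request; values are examples, not defaults.
request = {
    "url": ["https://example.com/contact", "https://example.com/about"],
    "lang": "en-US",
    "delay": 1000,            # wait 1 s after load for late-rendered DOM
    "timeoutMs": 60000,       # per-URL task timeout in the extension
    "scrapeInterval": 750,    # at or above the extension minimum of 500 ms
    "concurrency": 2,         # at or below the extension maximum of 3 tabs
    "filter": {"domain": "example.com", "limit": 20},
}

# Client-side sanity checks mirroring the extension-enforced limits.
assert len(request["url"]) <= 50
assert request["scrapeInterval"] >= 500
assert request["concurrency"] <= 3
```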
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full behavioral-disclosure burden and succeeds well. It reveals extension-side rate limiting (500 ms minimum, max 3 concurrent tabs), WebSocket bridge mechanics ('bridgeTimeoutMs', 'EXTENSION_NOT_CONNECTED' handling), the processing pipeline (deduplication and filtering happen in the extension), and DOM requirements ('logged-in session cookies, JS-rendered content'). Minor gap: it doesn't explicitly state the tool's read-only nature, though 'scan' implies it.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Uses clear markdown headers (### Purpose, ### When to use, etc.) to create a scannable structure, and is front-loaded with critical operational prerequisites (ping, lang). While lengthy, owing to its 12 complex parameters and the WebSocket architecture explanations, every section earns its place by providing necessary operational or behavioral context not found elsewhere.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's high complexity (12 parameters, nested objects, a WebSocket extension) and the lack of an output schema, the description adequately covers return values ('MultiUrlResult; success data is string[]'), error-handling patterns (ping/retry logic), and parameter interactions. It could marginally improve by enumerating specific error cases in the return structure, but it is sufficiently complete for agent operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, establishing the baseline score of 3. The description adds usage context beyond the schema: specifically when to use 'zh-CN' ('Chinese users pass lang: zh-CN'), the relationship between concurrency/scrapeInterval and extension limits ('extension enforces...'), and the fact that bridgeTimeoutMs is server-side only ('Stripped before forwarding'). This lifts it above baseline schema repetition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Scan HTML for email addresses, dedupe, optional domain/keyword/limit filters,' providing a specific verb (scan), resource (HTML/email addresses), and scope (deduplication/filters). It clearly distinguishes from siblings like scrape_article or scrape_images by specifying the email extraction use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Contains explicit 'When to use' section ('Contact/about/footer pages when the user wants an email list') and strong negative guidance against alternatives ('Do not substitute raw HTTP... do not use WebFetch, curl, wget'). It also specifies prerequisite workflow ('Call ping first') and chaining patterns ('Often chained after scrape_urls'), giving complete decision-making context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dowant/lionscraper-mcp'
