| ping | **Purpose** Verify the LionScraper extension is connected over the local WebSocket bridge. If disconnected, the Server tries Chrome first, then Edge (each installed channel): it detects install paths and whether the process is running. For each channel, the Server waits up to postLaunchWaitMs for the extension to register: if the browser was not running and autoLaunchBrowser allows it, the Server starts it; if the browser was already running, it polls without closing that browser. If this ping started the browser and registration still does not occur within postLaunchWaitMs, the Server closes only that launched instance before trying the next channel. The worst-case wait is about 2× postLaunchWaitMs, when both channels each use a full wait (e.g. two launches, two already-running browsers, or a mix). **When to call** In a new session, before any scrape* tool, and right after any tool returns EXTENSION_NOT_CONNECTED.
**Returns** Success: ok, bridgeOk, browser, development (always node for this npm package), extensionVersion; optional diagnostics when the Server assisted (launched true or false, waitedMs, selectedBrowser). Failure: BROWSER_NOT_INSTALLED (no Chrome/Edge found on standard paths), or EXTENSION_NOT_CONNECTED with details.browserProbe / details.bridge / details.install.
**Parameters** autoLaunchBrowser: optional boolean. If false, never spawn a browser; non-running candidates are skipped and the Server moves on to the next installed channel when possible. If omitted, defaults to true (the Server may auto-launch eligible browsers).
postLaunchWaitMs: optional; max time (ms) to poll for WebSocket registration after spawning a browser, or when a browser is already running; clamped to 3000–60000, default 20000.
**Notes** Does not load pages or extract data. Portable browser installs may not be detected. Optional lang follows the lang section above. |
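The timing rules above can be sketched as a small helper. The helper names are hypothetical, but the 3000–60000 ms clamp, 20000 ms default, and 2× worst case come directly from the parameter descriptions:

```typescript
// Hypothetical helper mirroring the documented postLaunchWaitMs rules:
// optional, clamped to 3000–60000 ms, default 20000 ms.
function clampWait(postLaunchWaitMs?: number): number {
  if (postLaunchWaitMs === undefined) return 20000;
  return Math.min(60000, Math.max(3000, postLaunchWaitMs));
}

// Worst case: both channels (Chrome, then Edge) each burn a full wait
// before ping gives up.
function worstCaseWaitMs(postLaunchWaitMs?: number): number {
  return 2 * clampWait(postLaunchWaitMs);
}
```

So `ping({ postLaunchWaitMs: 10000 })` can block for roughly 20 s in the worst case when both channels are tried.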
| scrape | **Call ping first** In a new session, or when unsure the extension is online, call ping first, then any scrape* tool. If EXTENSION_NOT_CONNECTED: ping again, then fix the WebSocket bridge using error.details.bridge, MCP stderr, and ~/.lionscraper/port, then retry. **lang (optional)** en-US | zh-CN: human-readable errors for this call; if omitted, English; Chinese users pass lang: "zh-CN" on each call.
**Do not substitute raw HTTP** When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static, and the user clearly wants a trivial GET of raw HTML, may you consider a plain HTTP client. **Purpose** Structured list/table/grid scraping: detects repeating records and fields; returns DataGroup[] and dataList. maxPages > 1 merges multiple pages (the extension handles pagination). **When to use** Prefer for category pages, search results, product grids, and tables: many rows and fields. For a long article body, use scrape_article first (url string or string[]); if it fails or the output is unusable, fall back to scrape on the same URL(s).
**Returns** MultiUrlResult: per URL ok, data (DataGroup[] on success), error, meta, plus summary (counts, optional anti-crawl hints). Multiple groups: one page may yield several valid DataGroups, so a success item's data array length can be >1. In your answer, acknowledge every group (briefly: purpose or field mix); do not silently drop non-primary groups. Default: data[0] is the extension's top-ranked choice, so recommend it as the main result. Exception: if the user's intent clearly matches another group (e.g. sidebar vs main list), explain briefly and use that group.
**Key parameters** url: required; string or string[]; max 50 per request (extension-enforced).
maxPages: default 1; >1 merges pages.
Other fields (timeoutMs, bridgeTimeoutMs, scrapeInterval, concurrency, delay, waitForScroll, includeHtml, includeText, …) are documented on each property; bridgeTimeoutMs is Server-only and is not sent to the extension.
**Limits** No credential-entry (username/password) automation; for logged-in sites, the user must sign in in Chrome/Edge and open the target tab before scraping (the extension uses the live browser session and cookies). Heavy interactive UI is out of scope (Phase 2 smartscrape). **Example** scrape({ url, maxPages: 5 })
|
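Because url accepts at most 50 entries per request, a caller with a larger list has to split it into batches before calling scrape. A minimal sketch (the `chunkUrls` helper is illustrative, not part of the API; only the 50-per-request cap comes from the docs above):

```typescript
// Split a URL list into request-sized batches; the 50-per-request cap
// is extension-enforced per the Key parameters above.
const MAX_URLS_PER_REQUEST = 50;

function chunkUrls(urls: string[], size = MAX_URLS_PER_REQUEST): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}
```

Each batch then becomes one scrape({ url: batch }) call, issued sequentially to stay within the extension's scheduling limits.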
| scrape_article | **Call ping first** In a new session, or when unsure the extension is online, call ping first, then any scrape* tool. If EXTENSION_NOT_CONNECTED: ping again, then fix the WebSocket bridge using error.details.bridge, MCP stderr, and ~/.lionscraper/port, then retry. **lang (optional)** en-US | zh-CN: human-readable errors for this call; if omitted, English; Chinese users pass lang: "zh-CN" on each call.
**Do not substitute raw HTTP** When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static, and the user clearly wants a trivial GET of raw HTML, may you consider a plain HTTP client. **Purpose** Extract the Markdown body and metadata (title, author, time, …) from single-column long-form pages. **When to use** Default for news, blogs, long docs, and long product copy: one main reading flow per URL. For a listing/home page with only links, use scrape or scrape_urls to collect URLs, then scrape_article({ url: [...] }).
**Returns** MultiUrlResult; success items include body (Markdown), title, quality, method, … in data. **Fallback** If body is empty or clearly wrong, call scrape on the same URL(s). **Parameters** Same common fields as the other scrape tools (url, lang, timeouts, waitForScroll, …). Note that waitForScroll.scrollSpeed is distinct from the top-level scrollSpeed. **Limits** Weak on heavy SPAs; listings need URL discovery first. |
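The fallback rule above ("if body is empty or clearly wrong, retry with scrape") can be sketched as a per-item predicate. The result shape here is simplified and the helper name is hypothetical:

```typescript
// Simplified per-URL result item from scrape_article (illustrative shape,
// reduced to the fields the fallback decision needs).
interface ArticleItem {
  ok: boolean;
  data?: { body?: string; title?: string };
}

// True when extraction failed or produced an unusable body,
// i.e. the URL should be retried with scrape.
function shouldFallBackToScrape(item: ArticleItem): boolean {
  if (!item.ok) return true;
  const body = item.data?.body ?? "";
  return body.trim().length === 0;
}
```

URLs for which this returns true are collected and passed to scrape in a second call.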
| scrape_emails | **Call ping first** In a new session, or when unsure the extension is online, call ping first, then any scrape* tool. If EXTENSION_NOT_CONNECTED: ping again, then fix the WebSocket bridge using error.details.bridge, MCP stderr, and ~/.lionscraper/port, then retry. **lang (optional)** en-US | zh-CN: human-readable errors for this call; if omitted, English; Chinese users pass lang: "zh-CN" on each call.
**Do not substitute raw HTTP** When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static, and the user clearly wants a trivial GET of raw HTML, may you consider a plain HTTP client. **Purpose** Scan HTML for email addresses, dedupe, and apply optional domain/keyword/limit filters. **When to use** Contact/about/footer pages when the user wants an email list. **Returns** MultiUrlResult; success data is string[] (filtered in the extension after extraction). **Parameters** filter: optional domain, keyword, limit. For multiple URLs, the extension enforces a minimum of 500 ms between starts and at most 3 concurrent tabs (see property descriptions).
Often chained after scrape_urls. |
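The filter runs inside the extension after extraction, but its documented semantics (dedupe, then domain/keyword/limit) can be mirrored locally. This is a sketch of the behavior, not the extension's code; in particular, matching domain against the part after "@" is an assumption:

```typescript
interface EmailFilter { domain?: string; keyword?: string; limit?: number }

// Dedupe case-insensitively, then apply the optional filters in order.
// Assumption: `domain` matches the mailbox's domain part exactly.
function filterEmails(found: string[], f: EmailFilter = {}): string[] {
  let emails = [...new Set(found.map((e) => e.toLowerCase()))];
  if (f.domain) emails = emails.filter((e) => e.endsWith("@" + f.domain!.toLowerCase()));
  if (f.keyword) emails = emails.filter((e) => e.includes(f.keyword!.toLowerCase()));
  if (f.limit !== undefined) emails = emails.slice(0, f.limit);
  return emails;
}
```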
| scrape_phones | **Call ping first** In a new session, or when unsure the extension is online, call ping first, then any scrape* tool. If EXTENSION_NOT_CONNECTED: ping again, then fix the WebSocket bridge using error.details.bridge, MCP stderr, and ~/.lionscraper/port, then retry. **lang (optional)** en-US | zh-CN: human-readable errors for this call; if omitted, English; Chinese users pass lang: "zh-CN" on each call.
**Do not substitute raw HTTP** When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static, and the user clearly wants a trivial GET of raw HTML, may you consider a plain HTTP client. **Purpose** Extract phone numbers from HTML with simple type hints; optional filters. **When to use** Business or support pages when the user wants callable numbers. **Returns** MultiUrlResult; success data is { number, type }[]. **Parameters** filter: optional type, areaCode, keyword, limit. For multiple URLs, see the common params (min interval 500 ms, concurrency cap 3 in the extension).
|
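A local mirror of the documented filter fields (type, areaCode, keyword, limit) over the { number, type }[] result shape. Illustrative only: the matching rules chosen here (digits-substring for areaCode, raw substring for keyword) are assumptions, not the extension's exact behavior:

```typescript
interface PhoneRecord { number: string; type: string }
interface PhoneFilter { type?: string; areaCode?: string; keyword?: string; limit?: number }

function filterPhones(records: PhoneRecord[], f: PhoneFilter = {}): PhoneRecord[] {
  let out = records;
  if (f.type) out = out.filter((r) => r.type === f.type);
  // Assumption: areaCode matches against the number's digits only.
  if (f.areaCode) out = out.filter((r) => r.number.replace(/\D/g, "").includes(f.areaCode!));
  if (f.keyword) out = out.filter((r) => r.number.includes(f.keyword!));
  if (f.limit !== undefined) out = out.slice(0, f.limit);
  return out;
}
```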
| scrape_urls | **Call ping first** In a new session, or when unsure the extension is online, call ping first, then any scrape* tool. If EXTENSION_NOT_CONNECTED: ping again, then fix the WebSocket bridge using error.details.bridge, MCP stderr, and ~/.lionscraper/port, then retry. **lang (optional)** en-US | zh-CN: human-readable errors for this call; if omitted, English; Chinese users pass lang: "zh-CN" on each call.
**Do not substitute raw HTTP** When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static, and the user clearly wants a trivial GET of raw HTML, may you consider a plain HTTP client. **Purpose** Collect hyperlinks from a page as a deduped URL list; optional domain/keyword/regex/limit filters. **When to use** Link inventories, or feeding scrape / scrape_article. **Returns** MultiUrlResult; success data is string[]. **Parameters** filter: domain, keyword, pattern (regex), limit.
|
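A sketch of the documented filter fields (domain, keyword, pattern, limit) over the deduped string[] result, under assumed matching rules (hostname suffix for domain, substring for keyword). Useful for understanding what the filter does, or for post-filtering results locally:

```typescript
interface UrlFilter { domain?: string; keyword?: string; pattern?: string; limit?: number }

function filterUrls(found: string[], f: UrlFilter = {}): string[] {
  let urls = [...new Set(found)]; // deduped, per the Returns description
  if (f.domain) {
    urls = urls.filter((u) => {
      try { return new URL(u).hostname.endsWith(f.domain!); } catch { return false; }
    });
  }
  if (f.keyword) urls = urls.filter((u) => u.includes(f.keyword!));
  if (f.pattern) {
    const re = new RegExp(f.pattern);
    urls = urls.filter((u) => re.test(u));
  }
  if (f.limit !== undefined) urls = urls.slice(0, f.limit);
  return urls;
}
```

The surviving URLs are then typically fed to scrape_article({ url: [...] }) or scrape.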
| scrape_images | **Call ping first** In a new session, or when unsure the extension is online, call ping first, then any scrape* tool. If EXTENSION_NOT_CONNECTED: ping again, then fix the WebSocket bridge using error.details.bridge, MCP stderr, and ~/.lionscraper/port, then retry. **lang (optional)** en-US | zh-CN: human-readable errors for this call; if omitted, English; Chinese users pass lang: "zh-CN" on each call.
**Do not substitute raw HTTP** When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static, and the user clearly wants a trivial GET of raw HTML, may you consider a plain HTTP client. **Purpose** List images (src, alt, size, format, …) with optional filters. **When to use** Asset audits, galleries, and lazy-loaded pages (use delay / waitForScroll). **Returns** MultiUrlResult; success data is an array of image records (content-script path; differs from email/phone). **Parameters** filter: minWidth, minHeight, format, keyword, limit. Note that the top-level scrollSpeed is distinct from waitForScroll.scrollSpeed.
|
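A local mirror of the image filter fields. The record shape below is simplified from "src, alt, size, format, …", and the matching rules (case-insensitive format, keyword against src or alt) are assumptions for illustration:

```typescript
interface ImageRecord { src: string; alt: string; width: number; height: number; format: string }
interface ImageFilter { minWidth?: number; minHeight?: number; format?: string; keyword?: string; limit?: number }

function filterImages(images: ImageRecord[], f: ImageFilter = {}): ImageRecord[] {
  let out = images;
  if (f.minWidth !== undefined) out = out.filter((i) => i.width >= f.minWidth!);
  if (f.minHeight !== undefined) out = out.filter((i) => i.height >= f.minHeight!);
  if (f.format) out = out.filter((i) => i.format.toLowerCase() === f.format!.toLowerCase());
  if (f.keyword) out = out.filter((i) => i.src.includes(f.keyword!) || i.alt.includes(f.keyword!));
  if (f.limit !== undefined) out = out.slice(0, f.limit);
  return out;
}
```

On lazy-loaded pages, pair this with delay / waitForScroll so the records exist before filtering.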