Glama

Server Configuration

Describes the environment variables used to configure the server (all are optional).

  • LANG (optional): Language for tool descriptions and stderr logs (e.g., zh-CN, en-US).

  • PORT (optional): HTTP + WebSocket listening port; must match the bridge port in the extension. Default: 13808.

  • TOKEN (optional): Bearer token for authentication with the daemon. Leave empty for no authentication.

  • DAEMON (optional): Set to '0' to prevent the thin MCP from auto-launching the daemon. Leave empty or set to any other value to allow auto-launch.

  • TIMEOUT (optional): Milliseconds to wait for the previous instance to exit when taking over the port. Default: 120000. Set to 0 to force takeover as soon as possible.

Capabilities

Features and capabilities supported by this server

  • tools: { "listChanged": true }

  • prompts: { "listChanged": true }

  • resources: { "listChanged": true }

Tools

Functions exposed to the LLM to take actions

ping

Purpose

Verifies that the LionScraper extension is connected over the local WebSocket bridge. If disconnected, the Server tries Chrome first, then Edge (each installed channel), detecting install paths and whether the process is running. For each channel, the Server waits up to postLaunchWaitMs for the extension to register: if the browser was not running and autoLaunchBrowser allows it, the Server starts it; if the browser was already running, it polls without closing that browser. If this ping started the browser and registration still does not occur within postLaunchWaitMs, the Server closes only that launched instance before trying the next channel. The worst-case wait is therefore up to about 2 × postLaunchWaitMs, when both channels each use a full wait (e.g. two launches, two already-running browsers, or a mix).
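The channel order and cleanup rules above can be sketched roughly as follows. This is an illustrative sketch only, not the Server's real implementation; every name in it (pingSketch, registers, Channel) is hypothetical:

```typescript
// Illustrative sketch of the ping fallback described above (hypothetical names).
type Channel = { name: string; installed: boolean; running: boolean };

function pingSketch(
  channels: Channel[],                  // Chrome channels first, then Edge
  autoLaunchBrowser: boolean,
  registers: (c: Channel) => boolean,   // stands in for the postLaunchWaitMs poll
): string | null {
  for (const ch of channels) {
    if (!ch.installed) continue;        // no channels at all → BROWSER_NOT_INSTALLED
    let launchedHere = false;
    if (!ch.running) {
      if (!autoLaunchBrowser) continue; // skip non-running candidates
      launchedHere = true;              // Server starts this browser
    }
    if (registers(ch)) return ch.name;  // extension registered in time
    if (launchedHere) {
      // Close only the instance this ping launched, then try the next channel.
    }
  }
  return null; // surfaces as EXTENSION_NOT_CONNECTED upstream
}
```

With two channels each using a full wait, the total polling time is the sum of the two per-channel waits.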

When to call

  1. In a new session, before any scrape* tool.

  2. Right after any tool returns EXTENSION_NOT_CONNECTED.

Returns

  • Success: ok, bridgeOk, browser, development (always node for this npm package), extensionVersion; optional diagnostics when the server assisted (launched true or false, waitedMs, selectedBrowser).

  • Failure: BROWSER_NOT_INSTALLED (no Chrome/Edge found on standard paths), or EXTENSION_NOT_CONNECTED with details.browserProbe / details.bridge / details.install.

Parameters

  • autoLaunchBrowser: optional boolean; default true (the Server may auto-launch eligible browsers). If false, never spawn a browser; non-running candidates are skipped and the Server continues to the next installed channel when possible.

  • postLaunchWaitMs: optional; maximum time (ms) to poll for WebSocket registration after spawning a browser, or while a browser is already running; clamped to 3000–60000, default 20000.
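The documented clamp and default for the wait parameter can be expressed as a one-line sketch (the helper name effectiveWaitMs is mine, not the Server's):

```typescript
// Clamp postLaunchWaitMs into the documented 3000–60000 ms range,
// falling back to the documented default of 20000 when omitted.
function effectiveWaitMs(postLaunchWaitMs?: number): number {
  if (postLaunchWaitMs === undefined) return 20000;
  return Math.min(60000, Math.max(3000, postLaunchWaitMs));
}
```

So a call like ping({ autoLaunchBrowser: false, postLaunchWaitMs: 90000 }) would be treated as a 60000 ms wait.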

Notes

Does not load pages or extract data. Portable browser installs may not be detected. The optional lang parameter follows the same convention as the scrape tools (en-US | zh-CN).

scrape

Call ping first

In a new session, or whenever you are unsure the extension is online, call ping first, then any scrape* tool. If a call returns EXTENSION_NOT_CONNECTED, ping again, then fix the WebSocket connection using error.details.bridge, the MCP stderr output, and ~/.lionscraper/port, and retry.

lang (optional)

Accepts en-US | zh-CN and controls human-readable errors for this call. If omitted, English is used; Chinese users should pass lang: "zh-CN" on each call.

Do not substitute raw HTTP

When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools instead of this server’s ping + scrape*. Only if the page is fully public and mostly static and the user clearly wants a trivial GET of raw HTML may you consider a plain HTTP client.

Purpose

Structured list/table/grid scraping: detects repeating records and their fields, returning DataGroup[] and dataList. With maxPages > 1, multiple pages are merged (the extension handles pagination).

When to use

  • Prefer for category pages, search results, product grids, tables—many rows and fields.

  • Long article body: use scrape_article first (url string or string[]); if it fails or output is unusable, fall back to scrape on the same URL(s).

Returns

MultiUrlResult: per URL ok, data (DataGroup[] on success), error, meta, plus summary (counts, optional anti-crawl hints).

  • Multiple groups: one page may yield several valid DataGroups; a success item's data array length can be >1. In your answer, acknowledge every group (briefly—purpose or field mix); do not silently drop non-primary groups.

  • Default: data[0] is the extension's top-ranked group; by default, recommend it as the main result for the user.

  • Exception: if the user's intent clearly matches another group (e.g. sidebar vs main list), explain briefly and use that group.
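The group-selection rules above can be sketched as follows. The shapes are simplified and the names (chooseGroup, userWants, the purpose field) are illustrative, not part of the actual DataGroup schema:

```typescript
// Pick data[0] as the default recommendation, but surface every other group.
type DataGroup = { purpose?: string; records: unknown[] };

function chooseGroup(data: DataGroup[], userWants?: (g: DataGroup) => boolean) {
  // Exception: the user's intent clearly matches another group.
  const matched = userWants ? data.find(userWants) : undefined;
  const main = matched ?? data[0];              // default: top-ranked group
  const others = data.filter((g) => g !== main); // acknowledge, don't drop
  return { main, others };
}
```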

Key parameters

  • url: required; string or string[]; max 50 per request (extension-enforced).

  • maxPages: default 1; >1 merges pages.

  • Other fields (timeoutMs, bridgeTimeoutMs, scrapeInterval, concurrency, delay, waitForScroll, includeHtml, includeText, …) are documented on each property; bridgeTimeoutMs is Server-only and is not sent to the extension.

Limits

No credential entry (username/password) automation; for logged-in sites, the user must sign in in Chrome/Edge and open the target tab before scraping (the extension uses the live browser session and cookies). Heavy interactive UI is out of scope (Phase 2 smartscrape).

Example

scrape({ url, maxPages: 5 })

scrape_article

Call ping first / lang (optional) / Do not substitute raw HTTP

Same guidance as for scrape above.

Purpose

Extract Markdown body and metadata (title, author, time, …) from single-column long-form pages.

When to use

  • Default for news, blogs, long docs, long product copy—one main reading flow per URL.

  • Listing/home with only links: use scrape or scrape_urls for URLs, then scrape_article({ url: [...] }).
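The listing-then-article chain above could be staged like this. The helper names are hypothetical; the parameter names (url, filter.domain, filter.limit) and the 50-URL cap are the ones documented here:

```typescript
// Schematic two-step chain: collect article links, then extract each body.
// buildScrapeUrlsCall / buildScrapeArticleCall are illustrative helpers,
// not part of the server's API.
function buildScrapeUrlsCall(listingUrl: string, domain: string) {
  return { tool: "scrape_urls", args: { url: listingUrl, filter: { domain, limit: 50 } } };
}

function buildScrapeArticleCall(urls: string[]) {
  // scrape_article accepts url as string or string[], max 50 per request.
  return { tool: "scrape_article", args: { url: urls.slice(0, 50) } };
}
```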

Returns

MultiUrlResult; success items include body (Markdown), title, quality, method, … in data.

Fallback

If body is empty or clearly wrong, call scrape on the same URL(s).

Parameters

Same common fields as the other scrape tools (url, lang, timeouts, waitForScroll, …). Note that waitForScroll.scrollSpeed is a distinct setting from the top-level scrollSpeed.

Limits

Weak on heavy SPAs; listings need URL discovery first.

scrape_emails

Call ping first / lang (optional) / Do not substitute raw HTTP

Same guidance as for scrape above.

Purpose

Scan HTML for email addresses, dedupe, optional domain/keyword/limit filters.

When to use

Contact/about/footer pages when the user wants an email list.

Returns

MultiUrlResult; success data is string[] (filtered in the extension after extraction).

Parameters

filter: optional domain, keyword, limit. For multiple URLs, extension enforces min 500ms between starts and max 3 concurrent tabs (see property descriptions).

Often chained after scrape_urls.
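Following the document's own call style, feeding scrape_urls output into scrape_emails might look like this. The filter field names (domain, keyword, limit) are the documented ones; the helper name and values are illustrative only:

```typescript
// Illustrative payload: pass URLs (e.g. from scrape_urls) to scrape_emails.
function buildScrapeEmailsCall(urls: string[]) {
  return {
    tool: "scrape_emails",
    args: {
      url: urls, // string[] input; extension paces starts and caps concurrency
      filter: { domain: "example.com", keyword: "support", limit: 100 },
    },
  };
}
```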

scrape_phones

Call ping first / lang (optional) / Do not substitute raw HTTP

Same guidance as for scrape above.

Purpose

Extract phone numbers from HTML with simple type hints; optional filters.

When to use

Business or support pages when the user wants callable numbers.

Returns

MultiUrlResult; success data is { number, type }[].

Parameters

filter: optional type, areaCode, keyword, limit. For multiple URLs, see common params (min interval 500ms, concurrency cap 3 in extension).
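A minimal example call shape, using the documented filter fields (type, areaCode, keyword, limit); the concrete values below are illustrative only:

```typescript
// Illustrative scrape_phones payload; values are examples, not real data.
const phonesCall = {
  tool: "scrape_phones",
  args: {
    url: "https://example.com/contact",
    filter: { areaCode: "212", limit: 20 },
  },
};
```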

scrape_urls

Call ping first / lang (optional) / Do not substitute raw HTTP

Same guidance as for scrape above.

Purpose

Collect hyperlinks from a page as a deduped URL list; optional domain/keyword/regex/limit filters.

When to use

Link inventories or feeding scrape / scrape_article.

Returns

MultiUrlResult; success data is string[].

Parameters

filter: domain, keyword, pattern (regex), limit.
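For example, a regex-filtered call using the documented fields (the pattern value and URL are illustrative only):

```typescript
// Illustrative scrape_urls payload: keep only dated blog links on one domain.
const urlsCall = {
  tool: "scrape_urls",
  args: {
    url: "https://example.com/blog",
    filter: { domain: "example.com", pattern: "/blog/\\d{4}/", limit: 50 },
  },
};
```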

scrape_images

Call ping first / lang (optional) / Do not substitute raw HTTP

Same guidance as for scrape above.

Purpose

List images (src, alt, size, format, …) with optional filters.

When to use

Asset audits, galleries, lazy-loaded pages (use delay / waitForScroll).

Returns

MultiUrlResult; success data is an array of image records (content script path; differs from email/phone).

Parameters

filter: minWidth, minHeight, format, keyword, limit. Top-level scrollSpeed ≠ waitForScroll.scrollSpeed.

Prompts

Interactive templates invoked by user choice

  • ping_then_scrape — Workflow: verify the extension with ping, then run scrape tools.

  • scrape_article — Use scrape_article for long-form pages; optional URL and lang hints.

  • multi_url_scrape — Batch URLs with url as a string array; respect extension limits.

  • troubleshoot_extension — Checklist for when the bridge or extension fails.

  • prefer_lionscraper_scraping — Do not fetch pages first with WebFetch/curl/wget; ping, then choose a scrape* tool by scenario.

Resources

Contextual data attached and managed by the client

  • guide_connection — LionScraper: bridge port and connection

  • guide_when_to_use_tools — LionScraper: when to use MCP tools (vs WebFetch / curl / wget)

  • guide_cli — LionScraper: terminal CLI (lionscraper)

  • reference_tools — LionScraper: MCP tools overview

  • reference_common_params — LionScraper: common tool parameters
