LionScraper
LionScraper is a bridge server connecting AI apps, CLI tools, and HTTP clients to a browser extension for web scraping via a local WebSocket connection. It leverages live browser sessions (Chrome/Edge) to handle JavaScript-rendered content, logged-in sessions, and cookies.
Tools:
- `ping` – Verify the browser extension is connected; can auto-detect and launch Chrome or Edge if needed.
- `scrape` – Extract structured lists, tables, and grids from web pages with pagination support.
- `scrape_article` – Extract the main article body as Markdown plus metadata (title, author, publish time) from long-form content.
- `scrape_emails` – Scan pages for email addresses with deduplication and filtering by domain, keyword, or result limit.
- `scrape_phones` – Extract phone numbers with optional filters by type (mobile/landline), area code, or keyword.
- `scrape_urls` – Collect and deduplicate hyperlinks with filtering by domain, keyword, or regex pattern.
- `scrape_images` – List images (src, alt, size, format) with filters for minimum dimensions, format, or keyword.
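As an illustration of the deduplication and filter options described for `scrape_emails`, here is a minimal Python sketch. The real extraction runs inside the browser extension; `extract_emails` and the regex below are hypothetical and only mirror the documented filter behavior.

```python
import re
from typing import Optional

# Simplified email pattern; the extension's actual matching may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text: str, domain: Optional[str] = None,
                   limit: Optional[int] = None) -> list[str]:
    """Collect emails with dedup, optional domain filter, and result limit."""
    seen: list[str] = []
    for match in EMAIL_RE.findall(text):
        addr = match.lower()
        if addr in seen:
            continue  # deduplication
        if domain and not addr.endswith("@" + domain):
            continue  # domain filter
        seen.append(addr)
        if limit and len(seen) >= limit:
            break  # result limit
    return seen
```

For example, `extract_emails("a@x.com b@y.org A@x.com", domain="x.com")` keeps only the first `a@x.com`: the uppercase duplicate is deduplicated and the `y.org` address fails the domain filter.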
Common features across all scrape tools:
- Batch processing of up to 50 URLs per request
- Scroll automation (`waitForScroll`) for lazy-loaded content
- Configurable timeouts, concurrency, and scrape intervals
- Language support for error messages (`en-US` or `zh-CN`)
LionScraper MCP + CLI + HTTP API bridge
Website: lionscraper.com
npm: `lionscraper`
PyPI: `lionscraper`
What is this?
LionScraper is a browser extension that can collect lists, articles, links, images, and more from web pages. This repository provides the companion bridge between your tools and that extension in three ways:
- MCP (`lionscraper-mcp`): connect an AI app (e.g. Cursor) so the model can call scraping tools over stdio.
- CLI (`lionscraper`): run daemon, scrape, ping, and more from a terminal on the same local HTTP/WebSocket port as the extension.
- HTTP API: when the daemon is running, call the same capabilities over loopback JSON HTTP (e.g. `/v1/...`) from scripts or any HTTP client; no MCP or CLI front-end required.
The real scraping logic runs in the extension; these packages connect and forward.
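The HTTP API path can be sketched in Python with the standard library. Note the concrete `/v1/...` endpoint paths and payload fields are not documented here, so `/v1/ping` and the body shape below are hypothetical placeholders; see the package READMEs for the real routes.

```python
import json
from urllib import request

def build_bridge_request(path: str, payload: dict,
                         port: int = 13808, token: str = "") -> request.Request:
    """Build a loopback JSON POST to the local daemon (not sent here)."""
    headers = {"Content-Type": "application/json"}
    if token:  # TOKEN env var: empty means no auth
        headers["Authorization"] = "Bearer " + token
    return request.Request(
        url=f"http://127.0.0.1:{port}{path}",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )

req = build_bridge_request("/v1/ping", {"lang": "en-US"}, token="secret")
# urllib.request.urlopen(req) would send it once `lionscraper daemon` is running.
```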
Before you start
- Browser: Chrome or Edge (follow what the extension supports).
- LionScraper extension: install and enable from the store.
  - Chrome: LionScraper on the Chrome Web Store
  - Microsoft Edge: LionScraper on Edge Add-ons
- Runtime (pick one or both implementations): Node.js for the npm package, Python for the pip package.
- For MCP: an AI app that supports MCP (e.g. Cursor, Trae).
- For the HTTP API: the same browser, extension, and daemon as the CLI; see the package READMEs for paths and examples.
HTTP fallback without Chrome/Edge:

- If neither browser is detected under standard paths and the extension is not connected, MCP still starts; `ping` succeeds in `http_fetch` mode and the `scrape*` tools use a minimal server-side HTTP GET (no JS execution).
- If a browser is installed but the extension is not connected, you still get the extension connection flow.
- The Node auto-spawn path fixes Unix installs where `lionscraper.js` was resolved without a leading `/` (e.g. Glama/Docker).
- The Python package uses aiohttp for outbound HTTP/WebSocket connections to the daemon.
Two implementations
| | Node.js (npm) | Python (pip) |
| --- | --- | --- |
| Registry | npm: `lionscraper` | PyPI: `lionscraper` |
| Docs (EN) | `packages/node/README.md` | `packages/python/README.md` |
| Docs (ZH) | | |
Install one or both; they are separate packages with the same CLI command names.
Install (npm)
Published as lionscraper on npm.
```
npm install -g lionscraper
```

Without a global install, MCP can use npx; see the npx JSON examples under "Add MCP in your AI app".
Install (pip)
Published as lionscraper on PyPI.
```
pip install -U lionscraper
```

A virtual environment is recommended, or `pip install -U --user lionscraper` if you prefer not to install into the system interpreter.
Commands (both packages)
| Command | Role |
| --- | --- |
| `lionscraper-mcp` | Thin MCP server (stdio) for AI apps |
| `lionscraper` | CLI: `daemon` / `ping` / `scrape` and more |
After `pip install -U lionscraper`, if `lionscraper-mcp` is not on your PATH, use `python -m lionscraper` with no extra arguments for MCP stdio (see `packages/python/README.md`).
`PORT` (default 13808) must match the extension bridge port in all modes.
CLI quick start
```
lionscraper daemon
lionscraper ping
lionscraper scrape -u https://www.example.com
```

Full flags, multiple URLs, pagination, and HTTP API details: `packages/node/README.md` / `packages/python/README.md`.
Add MCP in your AI app
Examples assume lionscraper-mcp is on your PATH (from npm or pip). In MCP JSON, every env value is a string.
Minimal config (PORT defaults to 13808; must match the extension bridge port):
```json
{
  "mcpServers": {
    "lionscraper": {
      "command": "lionscraper-mcp"
    }
  }
}
```

Full env example (omit keys you do not need):
```json
{
  "mcpServers": {
    "lionscraper": {
      "command": "lionscraper-mcp",
      "env": {
        "PORT": "13808",
        "TIMEOUT": "120000",
        "LANG": "en-US",
        "TOKEN": "",
        "DAEMON": ""
      }
    }
  }
}
```

npx (no global install): requires Node.js; the first run may download the package. The npm package name is `lionscraper`; the executable is `lionscraper-mcp`. Use `npx` as the command and pass `lionscraper` then `lionscraper-mcp` in `args` (after `-y`).
Minimal config (npx):
```json
{
  "mcpServers": {
    "lionscraper": {
      "command": "npx",
      "args": ["-y", "lionscraper", "lionscraper-mcp"]
    }
  }
}
```

Full env example (npx):
```json
{
  "mcpServers": {
    "lionscraper": {
      "command": "npx",
      "args": ["-y", "lionscraper", "lionscraper-mcp"],
      "env": {
        "PORT": "13808",
        "TIMEOUT": "120000",
        "LANG": "en-US",
        "TOKEN": "",
        "DAEMON": ""
      }
    }
  }
}
```

To pin a version, use e.g. `"lionscraper@1.0.1"` in place of `"lionscraper"` inside `args`.
- `PORT`: HTTP + WebSocket listen port; default 13808; must match the extension bridge port.
- `TIMEOUT`: ms to wait for a previous instance to release the port; default 120000; `0` forces takeover quickly.
- `LANG`: tool descriptions and stderr language (`en-US`, `zh-CN`, or POSIX forms).
- `TOKEN`: Bearer token shared with the daemon; empty means no auth.
- `DAEMON`: only `0` disables auto-starting `lionscraper daemon` from the thin MCP.
Restart MCP or the host app after changing config.
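The defaults above can be sketched as a small env-parsing function. This is an illustration of the documented defaults, assuming the variable names listed; it is not the packages' actual startup code.

```python
import os

def read_bridge_env(env=os.environ) -> dict:
    """Resolve the documented env vars with their documented defaults."""
    return {
        "port": int(env.get("PORT", "13808")),
        "timeout_ms": int(env.get("TIMEOUT", "120000")),
        "lang": env.get("LANG", "en-US"),
        "token": env.get("TOKEN", ""),           # empty string: no auth
        "daemon": env.get("DAEMON", "") != "0",  # only "0" disables auto-start
    }
```

Note that in MCP JSON every env value is a string, which is why the defaults above are parsed from strings.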
Python: MCP via python -m
```json
{
  "mcpServers": {
    "lionscraper": {
      "command": "python",
      "args": ["-m", "lionscraper"]
    }
  }
}
```

Use the same `python` you used to install the package (or `python3` on some systems).
Match the port in the browser extension
1. Open LionScraper settings / options.
2. Set the bridge port to the same value as `PORT` (e.g. 13808).
3. If needed, use Reconnect, reload the extension, or restart the browser.
Day-to-day use
- Keep the extension enabled and target pages open as required.
- Ask in natural language (e.g. check connection, scrape lists / article / emails / phones / links / images).
- If you see “not connected” or timeouts, retry a connection check and confirm `PORT` matches.
FAQ
Extension not connected or scrape fails?

- Is the extension enabled?
- Does `PORT` in the AI app match the extension bridge port exactly?
- One bridge per machine is usually enough; duplicate MCP configs can conflict.

Seeing MCP tools in the client means everything works?

Not necessarily. Tools only prove the AI → bridge link; the extension must also register on the same port.
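A quick way to check the bridge side of that link is to see whether anything is listening on the configured port. `bridge_port_open` is a helper name of mine, not part of the packages:

```python
import socket

def bridge_port_open(host: str = "127.0.0.1", port: int = 13808,
                     timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on the bridge port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# True only while `lionscraper daemon` (or the thin MCP) is listening on PORT.
if not bridge_port_open():
    print("nothing listening on 13808; start the daemon or check PORT")
```

A True result still only proves the daemon is up; the extension must separately register on the same port.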
MCP Registry and directories
Official MCP Registry entries (both use server.json):
| Path | Registry name | Package |
| --- | --- | --- |
| `packages/node/server.json` | io.github.dowant/lionscraper-node | npm: `lionscraper` |
| `packages/python/server.json` | io.github.dowant/lionscraper-python | PyPI: `lionscraper` |
Publish outline (install the official CLI, see Quickstart):
1. Publish the npm / PyPI packages at the version in each `server.json`.
2. In `packages/node`: `mcp-publisher login github`, then `mcp-publisher publish`.
3. In `packages/python`: `mcp-publisher publish` (login reused).
Third-party listings (e.g. Glama) have their own rules; Smithery targets public HTTPS/streaming setups rather than local stdio + npm/pip by default.
Third-party directory (Glama)
This project is listed on Glama (e.g. LionScraper on Glama). If the page shows "cannot be installed" or "license not found", typical fixes are:

- add a root LICENSE (this repo includes one);
- add glama.json with maintainer GitHub usernames for org-owned repos (edit maintainers if the claim fails);
- claim the server on Glama;
- optionally complete Glama's Docker / release flow if you need their install and security/quality checks.

The official install remains `npm install -g lionscraper` and `pip install -U lionscraper`. See also the score / checklist page.
License
MIT (same as the npm and PyPI packages).