
Server Quality Checklist

Profile completion: 83%

A complete profile improves this server's visibility in search results.
  • Disambiguation: 5/5

    Each tool targets a distinct extraction domain (articles, emails, images, phones, URLs) or connection status (ping). No overlap exists between the specialized scrapers, and the agent can easily select based on desired data type.

    Naming Consistency: 5/5

    All extraction tools follow the consistent snake_case pattern scrape_{noun} (scrape_article, scrape_emails, scrape_images, scrape_phones, scrape_urls). 'ping' appropriately deviates as a connectivity utility rather than extraction tool.

    Tool Count: 5/5

    Six tools is ideal for this focused scope: one connection manager plus five specific extraction targets. The count avoids bloat while covering the essential extraction workflows for a browser-based scraper.

    Completeness: 4/5

    Covers the core extraction lifecycle: connectivity check, URL discovery, and specific data harvesting (articles, emails, images, phones). Documentation references a generic 'scrape' fallback tool that isn't listed, indicating either a minor gap or outdated docs.

  • Average 4.5/5 across 6 of 6 tools scored.

    See the tool scores section below for per-tool breakdowns.

  • This repository includes a README.md file.

  • This repository includes a LICENSE file.

  • Latest release: v1.0.6

  • No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.

    Tip: use the "Try in Browser" feature on the server page to seed initial usage.

  • This repository includes a glama.json configuration file.

  • This server provides 7 tools.
  • No known security issues or vulnerabilities reported.

  • This server has been verified by its author.

  • Add related servers to improve discoverability.

Tool Scores

  • scrape_images

    Behavior: 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries substantial behavioral disclosure: it explains the WebSocket bridge infrastructure, server-side parameter stripping (bridgeTimeoutMs), extension-side constraints (max 50 URLs, min 500ms interval, max 3 concurrency), and content script execution differences from email/phone scrapers. It lacks only an explicit declaration of the read-only/non-destructive nature of the operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    While organized with clear markdown headers, the description front-loads operational warnings ('### Call ping first') before stating purpose, and the total length is substantial. Every section contains valuable information for a complex 12-parameter tool requiring browser extension infrastructure, though tighter integration could reduce the prerequisite verbosity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the absence of output schema and annotations for a complex tool with nested objects and infrastructure dependencies, the description appropriately covers return types (MultiUrlResult with image record arrays), error handling patterns, retry logic, and BCP-47 language support. It provides sufficient context for successful invocation despite missing explicit safety classifications.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    With 100% schema coverage establishing a baseline of 3, the description adds significant semantic value by clarifying the distinction between top-level scrollSpeed and waitForScroll.scrollSpeed, explaining server-only parameters removed before forwarding, and providing usage context like 'drops small icons' for minWidth and 'late-rendered DOM' for delay.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the tool 'List images (src, alt, size, format, …) with optional filters,' specifying both the action and extracted fields. It effectively distinguishes from siblings like scrape_emails and scrape_article by targeting image assets specifically. However, this purpose statement is buried beneath extensive operational prerequisites, diminishing immediate clarity.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Excellent explicit guidance provided: mandates calling ping first with detailed error handling for EXTENSION_NOT_CONNECTED, specifies exact use cases ('Asset audits, galleries, lazy-loaded pages'), and clearly defines when NOT to use the tool versus alternatives ('Do not substitute raw HTTP...Only if the page is fully public and mostly static').

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
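The extension-side limits quoted in this review (max 50 URLs per call, minimum 500 ms scrape interval, concurrency cap of 3) can be sketched as a pre-flight check. This is an illustrative sketch only; `normalize_scrape_request`, its name, and its clamping behavior are assumptions, not LionScraper code — the real extension enforces these limits on its side.

```python
MAX_URLS = 50          # extension rejects batches larger than this
MIN_INTERVAL_MS = 500  # minimum spacing between page loads
MAX_CONCURRENCY = 3    # concurrent tab cap

def normalize_scrape_request(urls, scrape_interval_ms=500, concurrency=1):
    """Clamp a request to the extension-enforced limits before sending.

    Hypothetical helper: mirrors the documented constraints so an agent
    fails fast locally instead of tripping the extension's own checks.
    """
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per call")
    return {
        "urls": list(urls),
        "scrapeInterval": max(scrape_interval_ms, MIN_INTERVAL_MS),
        "concurrency": min(max(concurrency, 1), MAX_CONCURRENCY),
    }
```

A caller passing an interval below 500 ms or a concurrency above 3 would simply get the clamped values back rather than an error, matching the "extension enforces" framing in the descriptions.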

  • scrape_urls

    Behavior: 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so description carries full burden. Discloses return type (MultiUrlResult with string[] data), extension-enforced limits (50 URLs, min 500ms interval), connection error handling (EXTENSION_NOT_CONNECTED), and WebSocket bridge behavior. Does not explicitly state read-only nature, but 'Collect' implies non-destructive.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Well-structured with markdown headers front-loading operational prerequisites (ping). Lengthy but every section serves agent decision-making: connection troubleshooting, localization, substitution warnings, and purpose. Could be slightly tighter but earns its length given complexity.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Comprehensive for a 12-parameter tool with nested objects and no output schema. Covers prerequisites, error handling (EXTENSION_NOT_CONNECTED), return format (MultiUrlResult), usage context, and consumer relationship to sibling tools. Fully compensates for missing annotations and output schema.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100% with detailed descriptions for all 12 parameters including nested objects. Description adds minimal beyond schema: specific guidance for 'lang' (Chinese users should pass 'zh-CN') and brief filter field recap. Baseline 3 appropriate since schema does heavy lifting.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Clear specific purpose: 'Collect hyperlinks from a page as a deduped URL list'. Explicitly distinguishes from siblings by contrasting with raw HTTP tools (curl/wget) and positioning as input for 'scrape' / 'scrape_article'. Verb+resource+filters are specific.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Excellent explicit guidance: 'When to use' section specifies link inventories or feeding scrape/scrape_article; 'Do not substitute' section explicitly lists alternatives (WebFetch, curl, wget) with conditions for when NOT to use them; 'Call ping first' provides prerequisite workflow.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • scrape_emails

    Behavior: 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full behavioral disclosure burden and succeeds well. It reveals extension-side rate limiting (500ms minimum, max 3 concurrent tabs), WebSocket bridge mechanics ('bridgeTimeoutMs', 'EXTENSION_NOT_CONNECTED' handling), processing pipeline (deduplication and filtering happens in-extension), and DOM requirements ('logged-in session cookies, JS-rendered content'). Minor gap: doesn't explicitly state read-only nature, though implied by 'scan'.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses clear markdown headers (### Purpose, ### When to use, etc.) creating scannable structure. Front-loaded with critical operational prerequisites (ping, lang). While lengthy due to 12 complex parameters and WebSocket architecture explanations, every section earns its place by providing necessary operational or behavioral context not found elsewhere.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given high complexity (12 params, nested objects, WebSocket extension) and lack of output schema, the description adequately covers return values ('MultiUrlResult; success data is string[]'), error handling patterns (ping/retry logic), and parameter interactions. Could marginally improve by enumerating specific error cases in the return structure, but sufficiently complete for agent operation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100%, establishing baseline 3. The description adds usage context beyond schema: specifically when to use 'zh-CN' ('Chinese users pass lang: zh-CN'), explains the relationship between concurrency/scrapeInterval and extension limits ('extension enforces...'), and clarifies that bridgeTimeoutMs is server-side only ('Stripped before forwarding'). This lifts it above baseline schema repetition.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description explicitly states 'Scan HTML for email addresses, dedupe, optional domain/keyword/limit filters,' providing a specific verb (scan), resource (HTML/email addresses), and scope (deduplication/filters). It clearly distinguishes from siblings like scrape_article or scrape_images by specifying the email extraction use case.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Contains explicit 'When to use' section ('Contact/about/footer pages when the user wants an email list') and strong negative guidance against alternatives ('Do not substitute raw HTTP... do not use WebFetch, curl, wget'). It also specifies prerequisite workflow ('Call ping first') and chaining patterns ('Often chained after scrape_urls'), giving complete decision-making context.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • scrape_phones

    Behavior: 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, description carries full burden and discloses: extension/WebSocket bridge architecture, rate limiting ('min interval 500ms', 'concurrency cap 3'), error handling ('EXTENSION_NOT_CONNECTED'), return structure ('MultiUrlResult... { number, type }[]'), and DOM requirements ('real browser DOM, logged-in session cookies, JS-rendered content'). Lacks explicit destructive/idempotency statement.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 3/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Well-structured with headers (###), but violates front-loading principles—opens with operational instructions (ping prerequisite) before stating purpose. Lengthy HTTP substitution warning, while valuable, consumes significant space before core function description. Density is justified by complexity but organization could be tighter.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given 12 parameters, nested objects, zero annotations, and no output schema, description achieves completeness: specifies return type structure (MultiUrlResult), explains extension-side enforcement (50 URL max, 500ms intervals), documents prerequisite error handling workflow, and distinguishes from raw HTTP alternatives.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100% (baseline 3). Description adds usage context beyond schema: instructs Chinese users to pass 'zh-CN' on each call, references filter sub-fields by name in prose, and explains infrastructure constraints ('extension-enforced', 'Server forwards as-is'). Elevates above baseline by clarifying expected values and operational semantics.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Description explicitly states specific verb 'Extract' with resource 'phone numbers' from source 'HTML' plus features 'simple type hints; optional filters'. Clearly distinguishes from siblings (scrape_article, scrape_emails, etc.) by specifying phone number extraction domain.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides explicit when-to-use ('Business or support pages when the user wants callable numbers'), explicit prerequisite workflow ('Call ping first... then any scrape*'), and clear anti-guidance contrasting with alternatives ('do not use WebFetch, curl, wget... Only if the page is fully public and mostly static... may you consider a plain HTTP client').

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • scrape_article

    Behavior: 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full burden and succeeds well: explains WebSocket/extension architecture, connection lifecycle (`ping` requirement), return structure (`MultiUrlResult` with fields), rate limits (max 50 URLs, min 500ms interval, max 3 concurrency), and limitations ('Weak on heavy SPAs'). Minor gap: does not explicitly state non-destructive nature of extraction.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Lengthy but appropriately dense and well-structured with markdown headers ('Purpose', 'When to use', 'Fallback', etc.). Front-loaded with critical connectivity warning ('Call `ping` first'). Every section serves a distinct purpose; no redundancy despite covering prerequisites, parameters, outputs, and limits.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Comprehensive for a complex 11-parameter tool with nested objects and no output schema. Description compensates by detailing return structure (`MultiUrlResult`), error handling patterns (`EXTENSION_NOT_CONNECTED`), preconditions (WebSocket bridge), sibling differentiations, and specific limitations (SPA weakness). Complete coverage of operational context.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema has 100% coverage. Description adds crucial implementation semantics: `bridgeTimeoutMs` is 'MCP Server only' and 'stripped before forwarding' (critical for understanding scope), clarifies distinction between nested `waitForScroll.scrollSpeed` and top-level `scrollSpeed`, and notes extension-enforced limits (max 50 URLs, concurrency caps) not visible in schema constraints.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The 'Purpose' section explicitly states 'Extract **Markdown body** and metadata... from single-column long-form pages' with specific verb and resource. It clearly distinguishes from siblings by specifying this is for 'news, blogs, long docs' while directing users to `scrape` or `scrape_urls` for 'listing/home with only links'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Extensive explicit guidance: 'Call `ping` first' prerequisite with error handling steps; 'Do not substitute raw HTTP' details when to avoid alternatives (WebFetch/curl) vs when to use this tool; 'When to use' differentiates from siblings (`scrape`, `scrape_urls`); 'Fallback' provides explicit recovery path via `scrape`.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • ping

    Behavior: 5/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, but description comprehensively covers: side effects (browser launching/closing), polling logic, timeout calculations (2× multiplier), cleanup behavior (closes only launched instances), success/failure return structures, and specific error codes (BROWSER_NOT_INSTALLED, EXTENSION_NOT_CONNECTED sub-fields).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness: 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Uses markdown headers (Purpose, When to call, Returns, Parameters, Notes) for scannability. Length is substantial but appropriate for complexity (browser orchestration without output schema). Key constraints (3000-60000 clamp) appear in both schema and description.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness: 5/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    No output schema exists, but description compensates with detailed Returns section documenting success fields (ok, bridgeOk, browser, diagnostics) and failure modes. Notes section covers limitations (portable browsers). Complete for a complex connectivity/orchestration tool.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters: 4/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema has 100% coverage (baseline 3). Description adds value by explaining parameter interactions: how autoLaunchBrowser affects channel skipping logic, and how postLaunchWaitMs creates worst-case 2× wait scenarios when polling already-running browsers.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose: 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Opens with specific verb+resource: 'Verify the LionScraper extension is connected over the local WebSocket bridge.' Explicitly distinguishes from scrape* siblings via 'When to call' section referencing them and Notes stating 'Does not load pages or extract data.'

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines: 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Contains explicit 'When to call' section listing two precise scenarios: (1) before any scrape* tool in new sessions, and (2) after EXTENSION_NOT_CONNECTED errors. Provides clear temporal ordering relative to sibling tools.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
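The ping timing described in this review (a 3000-60000 ms clamp on the wait parameter, with a worst-case 2x total when polling an already-running browser) can be sketched as follows. The function name, the default value, and the exact mechanics are assumptions for illustration, not the server's actual implementation.

```python
def ping_wait_budget_ms(post_launch_wait_ms=5000):
    """Return (single poll budget, worst-case total) in milliseconds.

    Hypothetical helper: clamps the requested wait to the documented
    3000-60000 ms range, then doubles it for the worst-case scenario
    the review mentions (polling a browser that is already running).
    """
    clamped = min(max(post_launch_wait_ms, 3000), 60000)
    return clamped, 2 * clamped
```

Under these assumptions, a request for a 1-second wait is raised to the 3-second floor, and an oversized request is capped at the 60-second ceiling before doubling.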

GitHub Badge

Glama performs regular codebase and documentation scans to:

  • Confirm that the MCP server is working as expected.
  • Confirm that there are no obvious security issues.
  • Evaluate tool definition quality.

Our badge communicates server capabilities, safety, and installation instructions.

Card Badge

lionscraper-mcp MCP server

Copy to your README.md:

Score Badge

lionscraper-mcp MCP server

Copy to your README.md:

How to claim the server?

If you are the author of the server, you simply need to authenticate using GitHub.

However, if the MCP server belongs to an organization, you need to first add glama.json to the root of your repository.

{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": [
    "your-github-username"
  ]
}

Then, authenticate using GitHub.

Browse examples.

How to make a release?

A "release" on Glama is not the same as a GitHub release. To create a Glama release:

  1. Claim the server if you haven't already.
  2. Go to the Dockerfile admin page, configure the build spec, and click Deploy.
  3. Once the build test succeeds, click Make Release, enter a version, and publish.

This process allows Glama to run security checks on your server and enables users to deploy it.

How to add a LICENSE?

Please follow the instructions in the GitHub documentation.

Once GitHub recognizes the license, the system will automatically detect it within a few hours.

If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.

How to sync the server with GitHub?

Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.

To manually sync the server, click the "Sync Server" button in the MCP server admin interface.

How is the quality score calculated?

The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).

Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool is scored 1–5 across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is calculated as 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.

Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).

Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
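The formula above can be sketched in a few lines. The weights and tier cutoffs are taken from this page; the function names are illustrative, not Glama's actual code.

```python
# Per-tool dimension weights from the Tool Definition Quality description.
TDQS_WEIGHTS = {
    "purpose": 0.25, "usage": 0.20, "behavior": 0.20,
    "parameters": 0.15, "conciseness": 0.10, "completeness": 0.10,
}

def tool_tdqs(scores):
    """Weighted 1-5 score for one tool across the six dimensions."""
    return sum(TDQS_WEIGHTS[d] * scores[d] for d in TDQS_WEIGHTS)

def server_quality(tool_scores, coherence_dims):
    """Overall score: 70% definition quality + 30% coherence.

    Definition quality blends 60% mean TDQS with 40% minimum TDQS,
    so one poorly described tool drags the whole server down.
    Coherence is the plain mean of the four server-level dimensions.
    """
    tdqs = [tool_tdqs(s) for s in tool_scores]
    definition_quality = 0.6 * (sum(tdqs) / len(tdqs)) + 0.4 * min(tdqs)
    coherence = sum(coherence_dims) / len(coherence_dims)
    return 0.7 * definition_quality + 0.3 * coherence

def tier(score):
    """Map an overall score to its letter tier."""
    for cutoff, label in [(3.5, "A"), (3.0, "B"), (2.0, "C"), (1.0, "D")]:
        if score >= cutoff:
            return label
    return "F"
```

For example, a single tool scoring 4 on every dimension combined with the coherence scores on this page (5, 5, 5, 4) yields 0.7 × 4.0 + 0.3 × 4.75 = 4.225, comfortably inside tier A.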

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dowant/lionscraper-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.