
scrape_article

Instructions

Call ping first

On a new session, or when unsure the extension is online: call ping first, then any scrape* tool. On EXTENSION_NOT_CONNECTED: ping again, then repair the WebSocket connection using error.details.bridge, the MCP stderr log, and ~/.lionscraper/port, then retry.
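
The ping-first protocol above can be sketched as follows. `call_tool` is a hypothetical stand-in for however your MCP client invokes a tool; the error shape mirrors the EXTENSION_NOT_CONNECTED code described here, but the exact result structure is an assumption.

```python
# Illustrative sketch of the "ping first" protocol; `call_tool` is a
# hypothetical helper, not part of this server's API.

def scrape_with_ping(call_tool, params, max_retries=1):
    """Ping before scraping; on EXTENSION_NOT_CONNECTED, ping and retry."""
    call_tool("ping", {})
    for _ in range(max_retries + 1):
        result = call_tool("scrape_article", params)
        if result.get("error", {}).get("code") != "EXTENSION_NOT_CONNECTED":
            return result
        # Extension offline: ping again; out of band, check
        # error.details.bridge, MCP stderr, and ~/.lionscraper/port.
        call_tool("ping", {})
    return result
```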

lang (optional)

en-US | zh-CN; sets the language of human-readable errors for this call. Omitted → English. Chinese-speaking users should pass lang: "zh-CN" on each call.

Do not substitute raw HTTP

When you need a real browser DOM, logged-in session cookies, JS-rendered content (SPAs), extension-side pagination or multi-URL scheduling, or structured field extraction, do not use WebFetch, curl, wget, or cookie-less IDE fetch tools in place of this server’s ping + scrape*. Consider a plain HTTP client only when the page is fully public, mostly static, and the user clearly wants a trivial GET of raw HTML.

Purpose

Extract Markdown body and metadata (title, author, time, …) from single-column long-form pages.

When to use

  • Default for news, blogs, long docs, and long product copy: one main reading flow per URL.

  • For a listing or home page that contains only links: use scrape or scrape_urls to collect article URLs, then call scrape_article({ url: [...] }).
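
As a minimal illustration of step two of that listing flow, a batch payload might look like this (the URLs are placeholders that would come from a prior scrape or scrape_urls call; only the url and lang field names come from this page):

```python
# Hypothetical batch payload; URLs are placeholders.
batch_request = {
    "url": [
        "https://example.com/post-1",
        "https://example.com/post-2",
    ],
    "lang": "zh-CN",  # optional: Chinese error messages for this call
}

# The extension enforces a 50-URL cap per request.
assert len(batch_request["url"]) <= 50
```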

Returns

MultiUrlResult; success items include body (Markdown), title, quality, method, … in data.

Fallback

If body is empty or clearly wrong, call scrape on the same URL(s).
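
That fallback can be sketched like this, assuming a hypothetical call_tool helper and assuming each MultiUrlResult item exposes url and data.body (the exact result shape is an assumption, not documented here):

```python
# Illustrative fallback: re-run plain `scrape` on URLs whose article
# body came back empty. `call_tool` and the result shape are assumptions.

def scrape_with_fallback(call_tool, urls):
    result = call_tool("scrape_article", {"url": urls})
    empties = [item["url"] for item in result.get("results", [])
               if not item.get("data", {}).get("body")]
    if empties:
        result["fallback"] = call_tool("scrape", {"url": empties})
    return result
```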

Parameters

Same common fields as the other scrape tools (url, lang, timeouts, waitForScroll, …). Note that waitForScroll.scrollSpeed is a separate setting from the top-level scrollSpeed.
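
To make that scrollSpeed distinction concrete, here is an illustrative parameter object; the values are arbitrary examples, and only the field names come from the schema on this page:

```python
# Illustrative only: the two scrollSpeed settings live at different levels.
params = {
    "url": "https://example.com/long-article",
    "delay": 500,                           # wait 500 ms after load
    "waitForScroll": {"scrollSpeed": 200},  # scroll behavior before extraction
    "scrollSpeed": 300,                     # top-level batch tuning; a separate field
}

assert params["waitForScroll"]["scrollSpeed"] != params["scrollSpeed"]
```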

Limits

Weak on heavy SPAs; listings need URL discovery first.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | One http(s) URL or string[] (batch). Max 50 URLs per request (extension-enforced). Server forwards as-is. | — |
| lang | No | BCP 47 tag for human-readable errors on this call: en-US \| zh-CN. Pass zh-CN when the user works in Chinese; the Server cannot infer chat language. | English |
| delay | No | Milliseconds to wait after load before extraction. Use for late-rendered DOM. | 0 |
| waitForScroll | No | Scroll the page (or a container) before extraction to trigger lazy-loaded content. Distinct from top-level scrollSpeed. | — |
| timeoutMs | No | Per-URL task timeout for the extension. Not the MCP WebSocket wait; see bridgeTimeoutMs. | 60000 |
| bridgeTimeoutMs | No | MCP Server only: max ms to wait for one tool call on the WebSocket bridge (capped). Stripped before forwarding to the extension. | Derived from URL count, maxPages, timeoutMs, scrapeInterval |
| includeHtml | No | If true, include document.documentElement.outerHTML in result meta for that URL. | — |
| includeText | No | If true, include document.body.innerText in result meta for that URL. | — |
| scrapeInterval | No | Ms between starting tasks in a multi-URL run; extension enforces a 500 ms minimum. | Extension default |
| concurrency | No | Parallel tabs for multi-URL runs; extension enforces a max of 3. | Extension default |
| scrollSpeed | No | Optional global scroll speed (px) for batch tuning; not the same as waitForScroll.scrollSpeed. | — |
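
Pulling the batch-related fields together, a hedged tuning example (the values are illustrative, not recommendations; the limits in the comments come from the parameter descriptions above):

```python
# Illustrative batch tuning for a multi-URL run.
batch = {
    "url": ["https://example.com/a", "https://example.com/b"],
    "timeoutMs": 90000,         # per-URL task timeout inside the extension
    "bridgeTimeoutMs": 120000,  # MCP-side wait; stripped before forwarding
    "scrapeInterval": 1000,     # extension enforces a 500 ms minimum
    "concurrency": 2,           # extension caps parallel tabs at 3
}

assert batch["scrapeInterval"] >= 500 and batch["concurrency"] <= 3
```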
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and succeeds well: explains WebSocket/extension architecture, connection lifecycle (`ping` requirement), return structure (`MultiUrlResult` with fields), rate limits (max 50 URLs, min 500ms interval, max 3 concurrency), and limitations ('Weak on heavy SPAs'). Minor gap: does not explicitly state non-destructive nature of extraction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Lengthy but appropriately dense and well-structured with markdown headers ('Purpose', 'When to use', 'Fallback', etc.). Front-loaded with critical connectivity warning ('Call `ping` first'). Every section serves a distinct purpose; no redundancy despite covering prerequisites, parameters, outputs, and limits.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive for a complex 11-parameter tool with nested objects and no output schema. Description compensates by detailing return structure (`MultiUrlResult`), error handling patterns (`EXTENSION_NOT_CONNECTED`), preconditions (WebSocket bridge), differentiation from sibling tools, and specific limitations (SPA weakness). Complete coverage of operational context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage. Description adds crucial implementation semantics: `bridgeTimeoutMs` is 'MCP Server only' and 'stripped before forwarding' (critical for understanding scope), clarifies distinction between nested `waitForScroll.scrollSpeed` and top-level `scrollSpeed`, and notes extension-enforced limits (max 50 URLs, concurrency caps) not visible in schema constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The 'Purpose' section explicitly states 'Extract **Markdown body** and metadata... from single-column long-form pages' with specific verb and resource. It clearly distinguishes from siblings by specifying this is for 'news, blogs, long docs' while directing users to `scrape` or `scrape_urls` for 'listing/home with only links'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Extensive explicit guidance: 'Call `ping` first' prerequisite with error handling steps; 'Do not substitute raw HTTP' details when to avoid alternatives (WebFetch/curl) vs when to use this tool; 'When to use' differentiates from siblings (`scrape`, `scrape_urls`); 'Fallback' provides explicit recovery path via `scrape`.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

