rendex_extract
Extracts clean, readable content from webpages as Markdown, JSON, or HTML. Handles JavaScript-rendered single-page apps (SPAs) and strips ads and navigation, returning the article body, title, byline, and excerpt for use with LLMs.
Instructions
Extract clean reader-mode content from any webpage as Markdown, JSON, or HTML. The tool runs the same Chromium render pass used for screenshots, so it captures content after JavaScript has run and handles SPAs that fetch-only readers miss. It strips navigation, ads, and boilerplate, returning the article body plus title, byline, and excerpt. Well suited to feeding page content to an LLM, summarization, or RAG ingestion.
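As a minimal sketch, assuming the tool is reached through an MCP server (the Model Context Protocol's JSON-RPC `tools/call` method), a call that asks for structured JSON output could be serialized as below. The helper name `build_extract_call` is hypothetical, and connecting to the server and reading the response are left out.

```python
import json
from typing import Any


def build_extract_call(url: str, request_id: int = 1, **options: Any) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for rendex_extract.

    Only `url` is required; optional schema fields (extractFormat,
    waitUntil, timeout, ...) are passed through unchanged.
    """
    arguments = {"url": url, **options}
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "rendex_extract", "arguments": arguments},
    }
    return json.dumps(request)


# Ask for structured fields instead of the default Markdown.
payload = build_extract_call("https://example.com/article", extractFormat="json")
print(payload)
```

Leaving the optional fields out of `arguments` entirely, rather than sending nulls, lets the server apply its own defaults (Markdown output, `networkidle2`).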
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The webpage URL to extract readable content from. | |
| extractFormat | No | Output shape: markdown (default; LLM-friendly prose), json (structured fields: title, byline, excerpt, siteName, length), or html (cleaned reader-mode HTML). | markdown |
| waitUntil | No | Page readiness event. networkidle2 (default) is best for most sites. Use domcontentloaded for speed, networkidle0 for completeness. | networkidle2 |
| timeout | No | Maximum time to wait for the page to load, in seconds (5 to 60). The 60-second ceiling is a hard cap imposed by Cloudflare. | |
| device | No | Device preset that sets viewport, scale factor, and user agent in one shot. E.g. 'iphone_15' to extract the mobile version of a page. | |
| blockAds | No | Block ads and trackers before extraction. | |
| blockCookieBanners | No | Hide common cookie and consent walls (GDPR/CCPA banners) before extraction. Uses a curated selector list, a simpler alternative to writing custom hideSelectors. | |
| hideSelectors | No | CSS selectors to hide (display:none) before extraction, e.g. ['.modal', '#newsletter-popup'] to remove overlays. Max 50 selectors. | |
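The constraints in the schema can be mirrored client-side to fail fast before a page render is spent. This is a minimal sketch, assuming arguments are held in a plain dict; `validate_extract_args` is a hypothetical helper, treating `blockAds` and `blockCookieBanners` as booleans is an assumption (the schema does not state types), and `waitUntil` is limited to the three events the schema names, although the tool may accept others.

```python
from typing import Any

# Allowed values as listed in the input schema.
EXTRACT_FORMATS = {"markdown", "json", "html"}
# Only the three readiness events the schema names (assumption: others may exist).
WAIT_EVENTS = {"networkidle2", "domcontentloaded", "networkidle0"}
MIN_TIMEOUT, MAX_TIMEOUT = 5, 60
MAX_HIDE_SELECTORS = 50


def validate_extract_args(args: dict[str, Any]) -> list[str]:
    """Return a list of problems with `args`; an empty list means they look valid."""
    errors: list[str] = []
    url = args.get("url")
    if not isinstance(url, str) or not url:
        errors.append("url is required")
    if args.get("extractFormat", "markdown") not in EXTRACT_FORMATS:
        errors.append(f"extractFormat must be one of {sorted(EXTRACT_FORMATS)}")
    if args.get("waitUntil", "networkidle2") not in WAIT_EVENTS:
        errors.append(f"waitUntil must be one of {sorted(WAIT_EVENTS)}")
    timeout = args.get("timeout")
    if timeout is not None and (
        not isinstance(timeout, (int, float)) or not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT
    ):
        errors.append(f"timeout must be between {MIN_TIMEOUT} and {MAX_TIMEOUT} seconds")
    selectors = args.get("hideSelectors", [])
    if not isinstance(selectors, list) or len(selectors) > MAX_HIDE_SELECTORS:
        errors.append(f"hideSelectors must be a list of at most {MAX_HIDE_SELECTORS} selectors")
    # Assumption: the two blocking options are booleans.
    for flag in ("blockAds", "blockCookieBanners"):
        if flag in args and not isinstance(args[flag], bool):
            errors.append(f"{flag} should be true or false")
    return errors


print(validate_extract_args({"url": "https://example.com", "timeout": 90}))
```

Checking locally catches mistakes such as a `timeout` of 90 before the request is sent; the server remains the authority on what it actually accepts.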