# extract
Scrape URLs into Markdown, text, or structured data for LLM processing. Extracts article content via Mozilla Readability, along with OG metadata, links, and images, while blocking ads and stripping HTML noise.
## Instructions
Extract clean, structured content from a URL. Returns Markdown, plain text, article data (via Mozilla Readability), OG metadata, links, images, or custom structured fields. Optimized for feeding web content to LLMs without HTML noise.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to extract content from. | |
| type | No | Extraction mode: 'markdown' and 'text' return the page as Markdown or plain text. 'article' uses Mozilla Readability for article body extraction. 'structured' returns title, author, word count, and cleaned content. 'metadata' returns OG tags and meta fields. 'links' and 'images' return lists of URLs. | markdown |
| selector | No | CSS selector to scope extraction to a specific element. | |
| waitFor | No | CSS selector to wait for before extracting. | |
| maxLength | No | Maximum character length of the returned content. | |
| cleanOutput | No | Remove excess whitespace and empty links. | true |
| darkMode | No | Render the page with dark color scheme. | |
| blockAds | No | Block ad networks. | |
| blockCookieBanners | No | Block cookie consent popups. | |
| fields | No | Custom field extraction map: keys are field names, values describe what to extract. Example: {"price": "product price as a number", "rating": "star rating out of 5"}. | |
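The parameters above compose into a single JSON input object. A minimal sketch of two hypothetical invocation payloads (the exact wire format depends on how the tool is hosted; the URL and field descriptions below are illustrative, not from the source):

```json
{
  "url": "https://example.com/blog/some-post",
  "type": "article",
  "maxLength": 20000,
  "blockAds": true,
  "blockCookieBanners": true
}
```

For custom structured extraction, `fields` replaces the fixed modes with a field-name-to-description map:

```json
{
  "url": "https://example.com/products/widget",
  "selector": "#product-detail",
  "waitFor": "#product-detail .price",
  "fields": {
    "price": "product price as a number",
    "rating": "star rating out of 5"
  }
}
```

Note that `selector` scopes what gets extracted, while `waitFor` only delays extraction until the matching element appears, which matters for pages that render content client-side.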