extract
Extract the clean main content of any web page by rendering with real Chromium to handle JavaScript, then stripping ads and boilerplate. Returns Markdown, text, or HTML for use in LLMs and RAG.
Instructions
Extract the clean main content of a web page for LLMs / RAG.
Renders the page with real Chromium (so JavaScript-built pages work), then strips ads, navigation, and boilerplate. Returns clean Markdown, text, or HTML.
Args:
- url: Page URL to extract (http/https).
- format: Output format, one of "markdown", "text", or "html".
- include_tables: Keep tables in the extracted content.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
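| url | Yes | Page URL to extract (http/https). | |
| format | No | Output format: "markdown", "text", or "html". | markdown |
| include_tables | No | Keep tables in the extracted content. | |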
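
As a rough illustration of the input schema, the sketch below checks arguments on the client side before they are sent to the tool. The helper name `build_extract_args` and the `ALLOWED_FORMATS` constant are hypothetical and not part of the tool. The default for `include_tables` is not documented here, so the sketch leaves that key out unless the caller sets it.

```python
from urllib.parse import urlparse

# Formats listed in the tool's Args description.
ALLOWED_FORMATS = {"markdown", "text", "html"}


def build_extract_args(url, format="markdown", include_tables=None):
    """Hypothetical helper: validate and assemble arguments for `extract`."""
    parsed = urlparse(url)
    # The tool accepts only http/https URLs.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an http or https URL, got {url!r}")
    if format not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")

    args = {"url": url, "format": format}
    # Only send include_tables when the caller chose a value, since the
    # server-side default is not stated in this document.
    if include_tables is not None:
        if not isinstance(include_tables, bool):
            raise TypeError("include_tables must be a bool")
        args["include_tables"] = include_tables
    return args


if __name__ == "__main__":
    print(build_extract_args("https://example.com/post", format="text"))
```

Rejecting a bad URL scheme or an unknown format locally avoids a round trip to the tool for requests the schema would not accept anyway.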
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |