clean_content
Remove noise from raw HTML to extract readable plain text. Retain only main article content, headings, paragraphs, tables, and lists.
Instructions
Clean raw HTML by removing scripts, styles, navigation bars, footers, cookie banners, ads, and other noise. Returns readable plain text keeping only main article content, headings, paragraphs, tables, and lists.
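The cleaning step described above can be approximated with Python's standard-library `html.parser`. This is only a rough sketch under stated assumptions, not the tool's actual implementation: the `NOISE_TAGS` and `BLOCK_TAGS` sets and the name `clean_html_sketch` are hypothetical, and cookie banners and ads (which usually need class- or id-based heuristics) are not handled here.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is dropped (assumed list, not the tool's real one).
NOISE_TAGS = {"script", "style", "noscript", "nav", "footer", "aside"}
# Tags that start or end a line in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "h5", "h6",
              "li", "ul", "ol", "table", "tr"}
# Table cells are separated by a space so adjacent cells do not run together.
CELL_TAGS = {"td", "th"}


class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside a noise subtree
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and not self.skip_depth:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif self.skip_depth:
            return
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
        elif tag in CELL_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        # Text inside noise subtrees (including script/style bodies) is ignored.
        if not self.skip_depth:
            self.parts.append(data)


def clean_html_sketch(html: str) -> str:
    """Return plain text with noise subtrees removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

A production cleaner would likely also score candidate containers to locate the main article; this sketch simply keeps everything that is not inside a noise tag.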
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | No | Optional source URL (helps with relative link resolution). | |
| html | Yes | Raw HTML string to clean. | |
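As an illustration of the schema above, the arguments for a call might be assembled as follows. The helper name `build_clean_content_args` is hypothetical, and how the arguments are actually sent depends on the client being used, which is not specified here.

```python
from typing import Optional


def build_clean_content_args(html: str, url: Optional[str] = None) -> dict:
    """Build an argument object matching the input schema (hypothetical helper)."""
    if not isinstance(html, str):
        raise TypeError("html is required and must be a string")
    args = {"html": html}
    # url is optional; include it only when the caller supplies one.
    if url is not None:
        args["url"] = url
    return args


example_args = build_clean_content_args(
    "<html><body><p>Hello</p></body></html>",
    url="https://example.com/article",
)
```

Omitting `url` leaves only the required `html` key in the argument object.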