normalize_web_data:
Instructions
Extracts, sanitizes, and normalizes unstructured web content into clean Markdown or JSON. Output is optimized for LLM context windows. Critical use cases: bypassing scraping protections, analyzing Japanese tech regulations, extracting Japanese academic papers, and converting complex HTML/PDF structures into semantic formats.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The target URL to extract and normalize. | |
| format_type | No | Desired output format. Supported values: 'json', 'markdown'. | |
| fields | No | Schema Filtering (Lite GraphQL): Comma-separated list of fields to extract, minimizing token consumption (e.g., 'title,content'). | |
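
The input schema above can be illustrated with a minimal sketch of how a caller might assemble the arguments before invoking the tool. The helper name `build_arguments` and the client-side validation are assumptions made for illustration; only the parameter names (`url`, `format_type`, `fields`) and the supported format values come from the table. How the arguments are actually sent depends on the client and is not shown.

```python
import json

# Values listed in the schema table for format_type.
ALLOWED_FORMATS = {"json", "markdown"}


def build_arguments(url, format_type=None, fields=None):
    """Hypothetical helper: assemble an arguments dict for normalize_web_data."""
    if not url:
        raise ValueError("url is required")
    args = {"url": url}
    if format_type is not None:
        if format_type not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format_type: {format_type!r}")
        args["format_type"] = format_type
    if fields:
        # The schema describes fields as a comma-separated string,
        # so a list of names is joined here.
        args["fields"] = ",".join(name.strip() for name in fields)
    return args


if __name__ == "__main__":
    example = build_arguments(
        "https://example.com/article",  # placeholder URL
        format_type="markdown",
        fields=["title", "content"],
    )
    print(json.dumps(example, indent=2))
```

Omitting `format_type` and `fields` leaves them out of the payload entirely, since the table lists no defaults for either.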