extract
Fetch one or more URLs (1-20 per call) and return LLM-ready content. Pass an optional query to get the most relevant highlights instead of the full text. Every result includes a category and a page structure classification.
Instructions
Fetch one or more URLs and return LLM-ready content from Octen. Unique to Octen: pass a query to get the most relevant highlights per page instead of the full body. Every result includes a `category` (topical) and a `page_structure` (typology) classification. Bare hosts like 'octen.ai' are auto-normalized to https. Cached content is returned while it is still fresh, as bounded by `max_age_seconds`.
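The two result modes described above can be handled with a small helper. This is a minimal sketch, not Octen's client code: it assumes each result arrives as a dictionary, that `highlights` is a ranked list of strings, and that `full_content` is a single string. The field names come from this page, but the exact response shape and the helper name `summarize_result` are assumptions.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> dict[str, Any]:
    """Pick the text payload from one extract result (assumed shape)."""
    highlights = result.get("highlights")
    if highlights is not None:
        # A query was set: assume a ranked list of snippet strings.
        mode = "highlights"
        text = "\n".join(highlights)
    else:
        # No query: assume the full page body is a single string.
        mode = "full_content"
        text = result.get("full_content", "")
    return {
        "mode": mode,
        "text": text,
        # Classification fields are documented as present on every result.
        "category": result.get("category"),
        "page_structure": result.get("page_structure"),
    }
```

Checking for `highlights` first mirrors the documented behavior that a query replaces the full body with snippets.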
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | URLs to extract. 1-20 per call. Bare hosts ok. | |
| query | No | Optional intent-focused keywords. When set, each result returns `highlights` (most relevant snippets, ranked) instead of `full_content`. | |
| max_age_seconds | No | Maximum age of cached content, in seconds. Lower this for time-sensitive pages (news, prices). | 86400 (24h) |
| format | No | Output format. | markdown |
| timeout | No | Per-URL timeout in seconds (1-60). | |
| include_images | No | Return image URLs found on each page. | |
| include_videos | No | Return video URLs found on each page. | |
| include_audio | No | Return audio URLs found on each page. | |
| include_favicon | No | Return each page's favicon URL. | |
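The limits in the table can be checked before a call is made. The following is a minimal sketch that builds an arguments dictionary for `extract` and enforces the documented bounds (1-20 URLs, timeout of 1-60 seconds). The helper name `build_extract_args` is hypothetical, and how the arguments are actually sent to the tool depends on your client, which is not shown here.

```python
from typing import Any, Optional


def build_extract_args(
    urls: list[str],
    query: Optional[str] = None,
    max_age_seconds: Optional[int] = None,
    timeout: Optional[int] = None,
) -> dict[str, Any]:
    """Build an arguments dict for `extract` (hypothetical helper)."""
    # The schema allows 1-20 URLs per call.
    if not 1 <= len(urls) <= 20:
        raise ValueError("urls must contain between 1 and 20 entries")
    # The per-URL timeout is documented as 1-60 seconds.
    if timeout is not None and not 1 <= timeout <= 60:
        raise ValueError("timeout must be between 1 and 60 seconds")

    # Bare hosts are passed through unchanged; the schema says they are accepted.
    args: dict[str, Any] = {"urls": list(urls)}
    # Optional fields are only included when set, so defaults stay in effect.
    if query is not None:
        args["query"] = query
    if max_age_seconds is not None:
        args["max_age_seconds"] = max_age_seconds
    if timeout is not None:
        args["timeout"] = timeout
    return args
```

For example, `build_extract_args(["octen.ai"], query="pricing", max_age_seconds=3600)` asks for highlights about pricing from content cached within the last hour.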