extract
Fetch URLs and get clean, LLM-ready markdown with topic and structure classification. Optionally return relevance-ranked excerpts instead of the full page body.
Instructions
Fetch one or more URLs and return LLM-ready content from Octen. By default (no query), each result contains the page's full content, which is what you want in almost all cases. Only pass query when the user explicitly asks for relevance-ranked snippets on a specific topic; doing so returns highlights INSTEAD of the full body, so the content will be partial. Every result also includes a category (topical) and page_structure (typology) classification, unique to Octen. Bare hosts like 'octen.ai' are auto-normalized to https. Cached content is served when it is still fresh (see max_age_seconds).
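The difference between the two modes matters most when reading results. The following is a minimal sketch of how a caller might handle both cases. It assumes each result is a dict carrying either `full_content` (a string) or `highlights` (assumed here to be a list of strings); the exact response shape is not specified above, and `page_text` is a hypothetical helper name.

```python
from typing import Any


def page_text(result: dict[str, Any]) -> tuple[str, bool]:
    """Return (text, is_complete) for one extract result.

    Assumption: `highlights` is a list of excerpt strings. Only the
    field names `full_content` and `highlights` come from the tool
    description; everything else here is illustrative.
    """
    full = result.get("full_content")
    if full is not None:
        # No query was passed: the whole page body is available.
        return full, True
    # A query was passed: only ranked excerpts are returned, so the
    # text is partial and should not be treated as the full page.
    highlights = result.get("highlights") or []
    return "\n\n".join(highlights), False
```

Returning the `is_complete` flag alongside the text lets downstream code avoid summarizing or quoting a partial page as if it were the whole document.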
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | URLs to extract. 1-20 per call. Bare hosts ok. | |
| query | No | Optional — leave UNSET in the normal case. When unset, each result returns the page's `full_content` (the complete text). Only set this when the user explicitly wants relevance-ranked snippets for a specific query/topic: setting it makes each result return `highlights` (ranked excerpts) and OMIT `full_content`, so the page body will be incomplete. Do not pass it just to focus a normal fetch. | |
| format | No | Output format. Default markdown. | markdown |
| timeout | No | Per-URL timeout in seconds (1-60). | |
| include_audio | No | Return audio URLs found on each page. | |
| include_images | No | Return image URLs found on each page. | |
| include_videos | No | Return video URLs found on each page. | |
| include_favicon | No | Return each page's favicon URL. | |
| max_age_seconds | No | Maximum age of cached content in seconds. Default 24h. Lower this for time-sensitive pages (news / prices). | |
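The constraints in the table can be checked on the client before a call is made. The sketch below builds an arguments dict using the parameter names from the schema; `build_extract_args` is a hypothetical helper, and how the arguments are actually sent to the tool depends on the client and is not shown.

```python
from typing import Any, Optional, Sequence


def build_extract_args(
    urls: Sequence[str],
    query: Optional[str] = None,
    timeout: Optional[int] = None,
    max_age_seconds: Optional[int] = None,
    include_images: bool = False,
) -> dict[str, Any]:
    """Build an arguments dict for the extract tool (illustrative only)."""
    if not 1 <= len(urls) <= 20:
        raise ValueError("urls must contain between 1 and 20 entries")
    if timeout is not None and not 1 <= timeout <= 60:
        raise ValueError("timeout must be between 1 and 60 seconds")

    args: dict[str, Any] = {"urls": list(urls)}
    # Leave `query` out entirely in the normal case so that each
    # result carries full_content rather than highlights.
    if query is not None:
        args["query"] = query
    if timeout is not None:
        args["timeout"] = timeout
    if max_age_seconds is not None:
        args["max_age_seconds"] = max_age_seconds
    if include_images:
        args["include_images"] = True
    return args


# A time-sensitive page: accept cached content at most 5 minutes old.
fresh_args = build_extract_args(["octen.ai"], max_age_seconds=300)
```

Omitting unset keys, rather than sending them as null, keeps the server-side defaults in effect (markdown format, 24h cache age).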