Extract URL to Markdown
averra_extract_url
Extract clean Markdown content from webpages for LLM processing by fetching pages, removing clutter, and converting to structured text with metadata.
Instructions
Convert any webpage URL into clean, LLM-ready Markdown using Averra Extract.
This tool fetches the page (executing JavaScript via a headless browser), strips navigation/ads/UI clutter via Mozilla Readability, converts the main content to Markdown, and returns it along with metadata (title, word count, links, language). Results are cached for 7 days and shared across users.
Use this when you need the actual content of a webpage for an LLM — e.g. reading a blog post, docs page, article, or product page to answer a question or synthesize information.
Args:
url (string, required): The webpage URL. Accepts https://example.com or bare example.com (https:// is auto-added). Max 2048 chars.
response_format ('markdown' | 'json', optional): Output format. Default 'markdown'.
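The url rules above can be mirrored on the client before a call is made. The following is a minimal sketch, not Averra's actual validation: the `normalize_url` name is hypothetical, and it assumes that "bare" means the string has no explicit scheme and that the 2048-character limit applies after the prefix is added.

```python
MAX_URL_LENGTH = 2048  # limit stated in the Args description


def normalize_url(url: str) -> str:
    """Hypothetical client-side helper mirroring the documented url rules."""
    url = url.strip()
    if not url:
        raise ValueError("url is required")
    # Assumption: a URL is "bare" when it carries no explicit scheme.
    if "://" not in url:
        url = "https://" + url
    # Assumption: the length limit is checked after prefixing.
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(f"url exceeds {MAX_URL_LENGTH} characters")
    return url
```

Checking locally avoids spending a request on input that would be rejected anyway, though the server remains the authority on what it accepts.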
Returns: For JSON format, structured data:

    {
      "markdown": string,          // Clean markdown content of the page
      "metadata": {
        "title": string,           // Page title (from first H1 or URL fallback)
        "word_count": number,      // Word count of extracted text
        "links": string[],         // Unique URLs found in the content
        "language": string,        // "en" or "unknown"
        "timestamp": string        // ISO 8601 extraction time
      },
      "warning": string (optional) // Present if content is thin (<200 words)
    }

For Markdown format: a formatted document with title, metadata summary, and the extracted markdown.
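As an illustration of consuming the JSON format, the sketch below reads the documented fields and treats `warning` as optional. The `describe_result` helper and the sample payload are made up for this example; only the field names come from the description above.

```python
import json


def describe_result(raw: str) -> str:
    """Summarize a JSON-format result in one line (hypothetical helper)."""
    data = json.loads(raw)
    meta = data["metadata"]
    line = (
        f'{meta["title"]} ({meta["word_count"]} words, '
        f'{len(meta["links"])} links, lang={meta["language"]})'
    )
    # "warning" is optional, so use .get() instead of indexing.
    warning = data.get("warning")
    if warning:
        line += f" [warning: {warning}]"
    return line


# Made-up sample payload shaped like the documented JSON result.
sample = json.dumps({
    "markdown": "# Hello\n\nShort page.",
    "metadata": {
        "title": "Hello",
        "word_count": 3,
        "links": [],
        "language": "en",
        "timestamp": "2024-01-01T00:00:00Z",
    },
    "warning": "Content is thin",
})
print(describe_result(sample))
```

Surfacing the warning alongside the summary lets a caller decide whether a thin page is worth passing to an LLM at all.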
Each call counts against your monthly Extract quota, including requests served from cache. Use averra_check_usage to see your remaining quota.
Examples:
Use when: "Summarize this blog post: https://example.com/post" → extract then summarize
Use when: "What does this docs page say about auth?" → extract then answer
Don't use when: You just need a link preview or metadata (this returns full content)
Don't use when: You need JSON extraction with a schema (not supported yet)
Error Handling:
400: URL rejected by safety checks (malformed, private IP, unreachable host) — check the URL resolves publicly.
401: Invalid API key — check AVERRA_EXTRACT_API_KEY env var
404: Page not found at URL
429: Monthly limit exceeded — upgrade plan or wait
502/503/504: Scraping service temporarily unavailable — retry
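The error list above suggests retrying only the 502/503/504 group. One way to sketch that policy is shown here, assuming a caller-supplied `request` function that returns a `(status, body)` pair. The function name, the attempt count, and the exponential backoff are illustrative choices, not documented Averra behavior.

```python
import time
from typing import Callable, Tuple

RETRYABLE = {502, 503, 504}  # listed above as temporary failures


def call_with_retry(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `request` until the status is not retryable or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = request()
        # 400, 401, 404 and 429 are returned at once: retrying cannot fix them.
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # backoff: 1s, 2s, 4s, ...
    return status, body
```

Because cached requests also count against the quota, keeping `max_attempts` small limits how much quota a flaky page can consume, if failed attempts are billed at all (the description does not say).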
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The webpage URL to extract content from. Accepts full URLs (`https://example.com`) or bare hostnames (`example.com` — auto-prefixed with `https://`). Max 2048 chars. Private/internal IPs are blocked by the API. | |
| response_format | No | Output format: 'markdown' for human-readable output (default), 'json' for machine-readable structured data | markdown |
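To show how the two schema fields fit into a request, here is a hedged sketch that builds a JSON-RPC `tools/call` message. It assumes the tool is served over the Model Context Protocol, which this page does not state. The `build_tool_call` helper is hypothetical.

```python
import json

ALLOWED_FORMATS = ("markdown", "json")  # values listed in the input schema


def build_tool_call(url: str, response_format: str = "markdown", request_id: int = 1) -> str:
    """Build a tools/call request body (assumes an MCP-style JSON-RPC transport)."""
    if not url:
        raise ValueError("url is required")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"response_format must be one of {ALLOWED_FORMATS}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "averra_extract_url",
            "arguments": {"url": url, "response_format": response_format},
        },
    })
```

Most clients build this envelope automatically, so in practice only the `arguments` object needs to be supplied.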