Glama

Extract URL to Markdown

averra_extract_url
Read-only · Idempotent

Extract clean Markdown content from webpages for LLM processing by fetching pages, removing clutter, and converting to structured text with metadata.

Instructions

Convert any webpage URL into clean, LLM-ready Markdown using Averra Extract.

This tool fetches the page (executing JavaScript via a headless browser), strips navigation/ads/UI clutter via Mozilla Readability, converts the main content to Markdown, and returns it along with metadata (title, word count, links, language). Results are cached for 7 days and shared across users.

Use this when you need the actual content of a webpage for an LLM — e.g. reading a blog post, docs page, article, or product page to answer a question or synthesize information.

Args:

  • url (string, required): The webpage URL. Accepts https://example.com or bare example.com (https:// is auto-added). Max 2048 chars.

  • response_format ('markdown' | 'json', optional): Output format. Default 'markdown'.
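The documented URL handling (bare hostnames auto-prefixed with `https://`, a 2048-character cap) can be mirrored client-side before calling the tool. This is an illustrative sketch of that behavior, not the API's own validation code; the `normalize_url` helper is hypothetical:

```python
def normalize_url(url: str) -> str:
    """Pre-check a URL the way the tool description says it is handled:
    reject over-long URLs, auto-prefix bare hostnames with https://."""
    if len(url) > 2048:
        raise ValueError("URL exceeds the 2048-character limit")
    if not url.startswith(("http://", "https://")):
        url = "https://" + url
    return url

print(normalize_url("example.com"))        # https://example.com
print(normalize_url("https://a.io/post"))  # unchanged
```

Note the actual service performs its own checks (including blocking private/internal IPs), so this is only a convenience for catching obvious problems early.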

Returns: For JSON format, structured data with the following fields:

  • markdown (string): Clean Markdown content of the page

  • metadata (object): title (string, from the first H1 or the URL as fallback), word_count (number, word count of the extracted text), links (string[], unique URLs found in the content), language (string, "en" or "unknown"), timestamp (string, ISO 8601 extraction time)

  • warning (string, optional): Present if content is thin (<200 words)

For Markdown format: a formatted document with title, metadata summary, and the extracted markdown.
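A small sketch of consuming the JSON shape described above. The payload here is a hypothetical example constructed to match the documented fields, not a real API response:

```python
import json

# Hypothetical response matching the documented JSON shape.
raw = json.dumps({
    "markdown": "# Example Post\n\nBody text here.",
    "metadata": {
        "title": "Example Post",
        "word_count": 3,
        "links": ["https://example.com/next"],
        "language": "en",
        "timestamp": "2024-01-01T00:00:00Z",
    },
    "warning": "Content is thin (<200 words)",
})

data = json.loads(raw)
title = data["metadata"]["title"]
if data.get("warning"):  # only present for thin extractions
    print(f"warning: {data['warning']}")
print(f"{title}: {data['metadata']['word_count']} words")
```

Checking `warning` with `.get()` matters because the field is optional and absent on normal extractions.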

Counts against your monthly Extract quota (including cached requests). Use averra_check_usage to see remaining quota.

Examples:

  • Use when: "Summarize this blog post: https://example.com/post" → extract then summarize

  • Use when: "What does this docs page say about auth?" → extract then answer

  • Don't use when: You just need a link preview or metadata (this returns full content)

  • Don't use when: You need JSON extraction with a schema (not supported yet)

Error Handling:

  • 400: URL rejected by safety checks (malformed, private IP, unreachable host); confirm the URL resolves publicly

  • 401: Invalid API key; check the AVERRA_EXTRACT_API_KEY env var

  • 404: Page not found at the URL

  • 429: Monthly quota exceeded; upgrade your plan or wait for the reset

  • 502/503/504: Scraping service temporarily unavailable; retry
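The status codes above suggest a simple retry policy: retry only the transient 502/503/504 errors and fail fast on everything else. A minimal sketch, where `request_fn` is a placeholder for whatever client issues the call (not part of the API):

```python
import time

RETRYABLE = {502, 503, 504}  # documented as "temporarily unavailable"

def call_with_retry(request_fn, max_attempts=3, backoff=1.0):
    """Retry transient upstream errors with exponential backoff.

    `request_fn` is a stand-in callable returning (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = request_fn()
        if status == 200:
            return body
        if status in RETRYABLE and attempt < max_attempts - 1:
            time.sleep(backoff * 2 ** attempt)  # 1s, 2s, 4s, ...
            continue
        raise RuntimeError(f"extract failed with HTTP {status}")
```

A 401 or 429 is raised immediately on the first attempt, since retrying an invalid key or an exhausted quota only burns time.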

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| `url` | Yes | The webpage URL to extract content from. Accepts full URLs (`https://example.com`) or bare hostnames (`example.com`, auto-prefixed with `https://`). Max 2048 chars. Private/internal IPs are blocked by the API. | |
| `response_format` | No | Output format: `'markdown'` for human-readable output (default), `'json'` for machine-readable structured data. | `markdown` |
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond what annotations provide: it explains the 7-day caching with cross-user sharing, mentions the monthly quota system, details the JavaScript execution via headless browser, and provides comprehensive error handling information (specific HTTP status codes with explanations). While annotations cover safety (readOnlyHint=true, destructiveHint=false, idempotentHint=true), the description enriches understanding of operational constraints and implementation details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (purpose, usage guidance, args, returns, examples, error handling) and most sentences earn their place. However, the error handling section is quite detailed with multiple specific status codes, making it somewhat lengthy. The information is valuable but could be more concise in presentation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with rich annotations and no output schema, the description provides excellent completeness: it explains the return format in detail for both JSON and Markdown outputs, covers quota implications, references sibling tools, provides concrete usage examples, and documents error scenarios. This gives the agent comprehensive understanding despite the lack of structured output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema already documents both parameters thoroughly. The description adds minimal value beyond the schema: it mentions the URL format acceptance (bare hostnames get https:// auto-added) and clarifies the default response_format, but doesn't provide additional semantic context. This meets the baseline expectation when schema coverage is complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Convert any webpage URL into clean, LLM-ready Markdown') and distinguishes it from alternatives by emphasizing it's for extracting actual content rather than just metadata or link previews. It explicitly names the technology (Averra Extract) and processing steps (fetching with JavaScript execution, Mozilla Readability cleanup, Markdown conversion).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use ('when you need the actual *content* of a webpage for an LLM') with concrete examples (blog posts, docs pages, articles, product pages) and when not to use ('just need a link preview or metadata' or 'need JSON extraction with a schema'). It also references the sibling tool averra_check_usage for quota management.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

