Skip to main content
Glama

novada_extract

Read-onlyIdempotent

Extract clean content from URLs, including anti-bot pages, with automatic rendering escalation. Supports batch mode for up to 10 pages in parallel. Output as markdown, text, HTML, or JSON.

Instructions

Extract clean content from any URL. Handles Cloudflare, DataDome, Kasada automatically via auto-escalation (static → JS render → Browser CDP). Batch mode: pass url as array for up to 10 pages in parallel.

Use for: Reading pages, batch-extracting search results, pulling structured fields (price, author, date). Works on anti-bot pages automatically. Not for: URL discovery (novada_map), multi-page crawl (novada_crawl), platform data like Amazon/LinkedIn (novada_scrape is richer). Key rule: Leave render="auto" (default). Only set render="render" for known JS-heavy SPAs. Auto mode is 15-100x faster on static sites.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesURL or array of URLs (max 10) to extract. Batch mode processes in parallel. For multiple URLs, use the urls array param instead.
urlsNoArray of URLs to extract in parallel (max 10). Alias for url when passing multiple URLs. Use for batch research workflows extracting from several pages in one call. Returns a structured markdown document with one labeled section per URL (### [1/N] url). Single url param still returns a single markdown document.
formatYesOutput format. 'markdown' (default): structured readable output. 'text': plain text. 'html': raw HTML (truncated at 10K). 'json': structured JSON object with typed fields — best for programmatic agent consumption.markdown
queryNoOptional query for relevance context. Helps the calling agent focus on relevant sections.
renderYesRendering mode. 'auto' (default): tries static first, escalates if JS-heavy. 'static': static HTML only. 'js' (or 'render'): force JS rendering via Web Unblocker. 'browser': force Browser API CDP (requires NOVADA_BROWSER_WS).auto
fieldsNoSpecific fields to extract (e.g. ['price', 'author', 'availability', 'rating']). Returns a structured ## Requested Fields block. JSON-LD structured data is checked first; falls back to pattern matching.
max_charsNoMaximum characters to return (default: 25000, max: 100000). When content exceeds this limit, it is truncated and a notice is appended. Common mistake: do not set max_chars=100000 by default — use 25000 for most pages.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it as read-only, idempotent, and non-destructive. Description adds context: auto-escalation mechanism (static -> JS -> Browser CDP), batch parallelism for up to 10 URLs, and performance notes (auto is 15-100x faster). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with headings, bullet points, and key rules. Every sentence adds value. No redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, description explains all return formats (markdown, text, html, json) and the structured markdown document for batch. All 7 parameters are adequately described, including edge cases like max_chars truncation. For a complex extraction tool, this is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds significant value: explains the alias relationship between url and urls, describes the structured batch output format, warns about max_chars default, and clarifies render modes beyond enum values. For fields, it explains the extraction order (JSON-LD first, then pattern matching).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description starts with a clear verb+resource: 'Extract clean content from any URL.' It explicitly distinguishes from sibling tools (novada_map, novada_crawl, novada_scrape) by contrasting their purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Has explicit 'Use for:' and 'Not for:' sections listing specific use cases and alternatives. Provides a key rule about leaving render='auto' and when to override, which guides correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/NovadaLabs/novada-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server