Glama

extract_url

Extract structured data from web pages by providing a URL and describing what you need. Returns clean JSON with specified fields, handling JavaScript-rendered and Cloudflare-protected sites automatically.

Instructions

Extract structured data from any web page by providing a URL and describing what you want. Returns clean JSON with exactly the fields you asked for — no HTML parsing needed. Handles JavaScript-rendered pages and Cloudflare-protected sites automatically. This is the general-purpose extraction tool. Use extract_article for full article content or extract_metadata for page meta tags — they are optimised shortcuts. Read-only — makes no changes to any external system. Requires HAUNT_API_KEY environment variable. Free tier: 100 requests/month. Returns an error if rate limit or API key is invalid.
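Because the tool requires the HAUNT_API_KEY environment variable, a caller may want to check for it before invoking the tool. The following is a minimal sketch of such a pre-flight check; the helper name `require_api_key` is hypothetical and is not part of the server.

```python
import os


def require_api_key(env=None):
    """Return the HAUNT_API_KEY value, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("HAUNT_API_KEY", "").strip()
    if not key:
        raise RuntimeError("HAUNT_API_KEY is not set")
    return key
```

This only confirms that a key is present. Whether the key is valid is decided by the service, which returns an error otherwise.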

Input Schema

Name | Required | Description | Default
url | Yes | The full URL of the page to extract data from. Must be a valid HTTP or HTTPS URL. Supports any public web page including JavaScript-heavy SPAs and Cloudflare-protected sites. | none
prompt | Yes | A plain-English description of what data to extract. Be specific about which fields you want. Examples: "product name, price, and availability", "all email addresses and phone numbers", "the main heading and first paragraph". | none
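To make the two parameters concrete, here is a sketch that validates them and wraps them in an MCP `tools/call` JSON-RPC request. The function name `build_extract_url_call` and the example URL are illustrative assumptions. The validation mirrors only what the schema states (HTTP or HTTPS URL, non-empty prompt), and the server may apply further checks.

```python
import json
from urllib.parse import urlparse


def build_extract_url_call(url, prompt, request_id=1):
    """Build an MCP tools/call request for extract_url (hypothetical helper)."""
    parsed = urlparse(url)
    # The schema requires a full HTTP or HTTPS URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be a full HTTP or HTTPS URL")
    # The schema requires a plain-English description of the wanted fields.
    if not prompt.strip():
        raise ValueError("prompt must describe the fields to extract")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "extract_url",
            "arguments": {"url": url, "prompt": prompt},
        },
    }


request = build_extract_url_call(
    "https://example.com/product/123",  # placeholder URL
    "product name, price, and availability",
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally build and send this message; the sketch only shows the shape of the arguments.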
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure and does so comprehensively. It explicitly states 'Read-only — makes no changes to any external system,' discloses authentication requirements ('Requires HAUNT_API_KEY environment variable'), and provides rate limit information ('Free tier: 100 requests/month. Returns an error if rate limit or API key is invalid'). It also describes technical capabilities ('Handles JavaScript-rendered pages and Cloudflare-protected sites automatically') and output format ('Returns clean JSON with exactly the fields you asked for — no HTML parsing needed').

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
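Given the stated free-tier limit of 100 requests/month, a caller could track usage locally to avoid hitting the rate-limit error. The sketch below is one possible approach. It assumes the quota resets at the start of each calendar month, which the description does not confirm, and the class name `MonthlyBudget` is hypothetical.

```python
from datetime import date


class MonthlyBudget:
    """Client-side request counter (assumes a calendar-month reset)."""

    def __init__(self, limit=100):
        self.limit = limit
        self.month = None  # (year, month) of the current counting window
        self.used = 0

    def try_spend(self, today=None):
        """Record one request; return False if the local budget is exhausted."""
        today = today or date.today()
        month = (today.year, today.month)
        if month != self.month:
            # A new month started: reset the counter.
            self.month, self.used = month, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

The counter is local only. The service remains the authority on how many requests are left.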

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured with every sentence adding value: purpose statement, output format, technical capabilities, sibling tool differentiation, behavioral disclosures, and authentication/rate limit information. It's front-loaded with the core functionality and appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations and no output schema, the description provides comprehensive context. It covers purpose, usage guidelines, behavioral traits (read-only nature, authentication requirements, rate limits, technical capabilities), and output format. The combination of 100% schema coverage and rich description text makes this complete for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters. The description adds some context about the prompt parameter ('describing what you want') and implies the URL parameter supports complex sites, but doesn't provide additional semantic meaning beyond what's in the schema descriptions. This meets the baseline expectation when schema coverage is complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('extract structured data'), resource ('from any web page'), and mechanism ('by providing a URL and describing what you want'). It explicitly distinguishes from sibling tools by naming extract_article and extract_metadata as 'optimised shortcuts' for specific use cases, establishing this as the general-purpose extraction tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool vs alternatives: 'This is the general-purpose extraction tool. Use extract_article for full article content or extract_metadata for page meta tags — they are optimised shortcuts.' This gives clear context for tool selection and explicitly names the alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
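The selection guidance quoted above can be sketched as a small routing function. The tool names come from the description. The `need` labels ("article", "metadata") and the function name `choose_tool` are hypothetical and exist only for illustration.

```python
def choose_tool(need):
    """Pick a tool name for a coarse description of what is needed.

    The specialised tools cover full article content and page meta tags;
    everything else falls back to the general-purpose extract_url.
    """
    shortcuts = {
        "article": "extract_article",
        "metadata": "extract_metadata",
    }
    return shortcuts.get(need, "extract_url")
```

For example, `choose_tool("article")` selects `extract_article`, while an unrecognised need such as `"prices"` falls back to `extract_url`.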

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Darko893/haunt-mcp-server'
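The same request can be prepared from Python with the standard library. This is a sketch only: the path pattern (owner, then server name) is inferred from the single URL shown above, and the helper name `build_server_request` is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner, server):
    """Prepare a GET request for one server's directory entry."""
    # Percent-encode each path segment so unusual names cannot alter the path.
    url = f"{API_BASE}/{quote(owner, safe='')}/{quote(server, safe='')}"
    return Request(url, method="GET")


req = build_server_request("Darko893", "haunt-mcp-server")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request. The response format is not described on this page, so inspect it before relying on it.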

If you have feedback or need assistance with the MCP directory API, please join our Discord server.