scrape_links
Extract structured content from up to 50 web URLs using AI-powered extraction that filters noise and extracts specific information you define.
Instructions
🔥 WEB SCRAPING - 1-50 URLs, RECOMMENDED 3-5. ALWAYS use_llm=true
This tool has TWO modes:
Basic scraping (use_llm=false) - Gets raw HTML/text - messy, requires manual parsing
AI-powered extraction (use_llm=true) - Intelligently extracts what you need ⭐ USE THIS!
⚡ ALWAYS SET use_llm=true FOR INTELLIGENT EXTRACTION ⚡
Why use AI extraction (use_llm=true):
Filters out navigation, ads, footers automatically
Extracts ONLY what you specify in what_to_extract
Handles complex page structures intelligently
Returns clean, structured content ready to use
Saves hours of manual HTML parsing
Cost: pennies (~$0.01 per 10 pages)
Token Budget: 32,000 tokens distributed across URLs.
3 URLs: ~10,666 tokens each (deep extraction)
5 URLs: ~6,400 tokens each (RECOMMENDED: balanced)
10 URLs: ~3,200 tokens each (detailed)
50 URLs: ~640 tokens each (quick scan)
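The per-URL figures above are consistent with an even split of the 32,000-token budget. A minimal sketch of that arithmetic follows, assuming the budget is divided equally and rounded down; the function name `tokens_per_url` is hypothetical, and the tool's real allocation logic may differ.

```python
TOTAL_TOKEN_BUDGET = 32_000  # total budget stated in the tool description
MAX_URLS = 50                # upper bound on URLs per call


def tokens_per_url(url_count: int) -> int:
    """Approximate tokens per URL, assuming an even split (an assumption)."""
    if not 1 <= url_count <= MAX_URLS:
        raise ValueError(f"url_count must be between 1 and {MAX_URLS}")
    return TOTAL_TOKEN_BUDGET // url_count


for n in (3, 5, 10, 50):
    print(f"{n} URLs: ~{tokens_per_url(n):,} tokens each")
```

Checking the number before a call makes the depth/breadth trade-off explicit: every extra URL shrinks the share available to each page.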
Extraction Prompt Formula:
Extract [target1] | [target2] | [target3] | [target4] | [target5]
with focus on [aspect1], [aspect2], [aspect3]
Extraction Rules:
Use pipe | to separate extraction targets
Minimum 3 targets required
Be SPECIFIC about what you want ("pricing tiers" not "pricing")
Include "with focus on" to prioritize certain aspects
More targets = more comprehensive extraction
Aim for 5-10 extraction targets
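The formula and rules above can be sketched as a small helper that assembles the what_to_extract string. This is an illustration only: `build_extraction_prompt` is a hypothetical name, not part of the tool, and the three-target minimum is enforced here on the client side as an assumption about how you might guard your own calls.

```python
from typing import Iterable, Sequence


def build_extraction_prompt(targets: Sequence[str],
                            focus_areas: Iterable[str] = ()) -> str:
    """Join targets with pipes and append an optional 'with focus on' clause."""
    cleaned = [t.strip() for t in targets if t.strip()]
    if len(cleaned) < 3:
        raise ValueError("at least 3 extraction targets are required")

    prompt = "Extract " + " | ".join(cleaned)

    focus = [f.strip() for f in focus_areas if f.strip()]
    if focus:
        prompt += " with focus on " + ", ".join(focus)
    return prompt


prompt = build_extraction_prompt(
    ["pricing tiers", "plan features", "API rate limits",
     "enterprise options", "integration capabilities"],
    focus_areas=["enterprise features", "API limitations"],
)
print(prompt)
```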
Extraction Templates by Domain:
Product Research:
Extract pricing details | feature comparisons | user reviews | technical specifications |
integration options | support channels | deployment models | security features
with focus on enterprise capabilities, pricing transparency, and integration complexity
Technical Documentation:
Extract API endpoints | authentication methods | rate limits | error codes |
request examples | response schemas | SDK availability | webhook support
with focus on authentication flow, rate limiting policies, and error handling patterns
Competitive Analysis:
Extract product features | pricing models | target customers | unique selling points |
technology stack | customer testimonials | case studies | market positioning
with focus on differentiators, pricing strategy, and customer satisfaction
Example:
❌ BAD: {"urls": ["url"], "use_llm": false, "what_to_extract": "get pricing"} → raw HTML, vague prompt, 1 target, no focus areas
✅ GOOD: {"urls": [5 URLs], "use_llm": true, "what_to_extract": "Extract pricing tiers | plan features | API rate limits | enterprise options | integration capabilities | user testimonials with focus on enterprise features, API limitations, and real-world performance data"} → clean structured extraction
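For readers assembling the arguments programmatically, the sketch below builds a payload shaped like the GOOD example, using the four fields from the input schema. The helper name `build_scrape_args` and the example.com URLs are placeholders, and how the payload is sent to the tool depends on your client, so that step is not shown.

```python
import json
from typing import Optional, Sequence


def build_scrape_args(urls: Sequence[str],
                      what_to_extract: str,
                      use_llm: bool = True,
                      timeout: Optional[int] = None) -> dict:
    """Assemble scrape_links arguments; validation here is client-side only."""
    if not 1 <= len(urls) <= 50:
        raise ValueError("urls must contain between 1 and 50 entries")

    args = {
        "urls": list(urls),
        "use_llm": use_llm,
        "what_to_extract": what_to_extract,
    }
    if timeout is not None:
        args["timeout"] = timeout  # seconds per URL, per the input schema
    return args


args = build_scrape_args(
    urls=[f"https://example.com/page-{i}" for i in range(1, 6)],  # placeholders
    what_to_extract=(
        "Extract pricing tiers | plan features | API rate limits | "
        "enterprise options | integration capabilities | user testimonials "
        "with focus on enterprise features, API limitations, "
        "and real-world performance data"
    ),
)
print(json.dumps(args, indent=2))
```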
Pro Tips:
ALWAYS use use_llm=true - The AI extraction is the tool's superpower
Use 3-10 URLs - Balance between depth and breadth
Specify 5-10 extraction targets - More targets = more comprehensive
Use pipe | separators - Clearly separate each target
Add focus areas - "with focus on X, Y, Z" for prioritization
Be specific - "pricing tiers" not "pricing", "API rate limits" not "API info"
Cover multiple aspects - Features, pricing, technical, social proof
Automatic Fallback: Basic → JavaScript rendering → JavaScript + US geo-targeting
Batching: Max 30 concurrent requests (50 URLs = [30] then [20] batches)
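The batching rule can be illustrated with a short chunking sketch. It only shows how 50 URLs would split into groups of at most 30; `batch_urls` is a hypothetical helper, and the tool's actual scheduler is not documented here.

```python
from typing import List, Sequence

MAX_CONCURRENT = 30  # concurrency cap stated in the tool description


def batch_urls(urls: Sequence[str],
               batch_size: int = MAX_CONCURRENT) -> List[List[str]]:
    """Split urls into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(urls[i:i + batch_size])
            for i in range(0, len(urls), batch_size)]


urls = [f"https://example.com/page-{i}" for i in range(50)]  # placeholder URLs
batches = batch_urls(urls)
print([len(batch) for batch in batches])  # [30, 20]
```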
REMEMBER: AI extraction costs pennies but saves hours of manual parsing!
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| urls | Yes | URLs to scrape (1-50). Recommend 3-5 URLs for balanced depth/breadth. More URLs = broader coverage but fewer tokens per URL. 3 URLs: ~10K tokens each (deep); 5 URLs: ~6K tokens each (balanced); 10 URLs: ~3K tokens each (detailed); 50 URLs: ~640 tokens each (scan). | |
| timeout | No | Timeout in seconds for each URL | |
| use_llm | No | Enable AI processing for content extraction (requires OPENROUTER_API_KEY) | |
| what_to_extract | No | Specific content extraction instructions for AI. Will be enhanced with conciseness suffix automatically. | |