scrape_links

🔥 WEB SCRAPING - 1-50 URLs, RECOMMENDED 3-5. ALWAYS use_llm=true

This tool has TWO modes:

Basic scraping (use_llm=false) - Gets raw HTML/text - messy, requires manual parsing
AI-powered extraction (use_llm=true) - Intelligently extracts what you need ⭐ USE THIS!

⚡ ALWAYS SET use_llm=true FOR INTELLIGENT EXTRACTION ⚡

Why use AI extraction (use_llm=true):

Filters out navigation, ads, footers automatically
Extracts ONLY what you specify in what_to_extract
Handles complex page structures intelligently
Returns clean, structured content ready to use
Saves hours of manual HTML parsing
Cost: pennies (~$0.01 per 10 pages)

Token Budget: 32,000 tokens distributed across URLs.

3 URLs: ~10,666 tokens each (deep extraction)
5 URLs: ~6,400 tokens each (RECOMMENDED: balanced)
10 URLs: ~3,200 tokens each (detailed)
50 URLs: ~640 tokens each (quick scan)
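
The per-URL figures above follow from dividing the budget by the number of URLs. A minimal sketch of that arithmetic, assuming the split is even and rounded down (the tool's actual allocation logic is not shown in this description, and `tokens_per_url` is a hypothetical helper name):

```python
TOTAL_TOKEN_BUDGET = 32_000  # total budget quoted in the description


def tokens_per_url(url_count: int, budget: int = TOTAL_TOKEN_BUDGET) -> int:
    """Approximate per-URL share of the budget, assuming an even split."""
    if not 1 <= url_count <= 50:
        raise ValueError("url_count must be between 1 and 50")
    return budget // url_count  # integer division rounds down


for n in (3, 5, 10, 50):
    print(f"{n} URLs: ~{tokens_per_url(n):,} tokens each")
```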

Extraction Prompt Formula:

Extract [target1] | [target2] | [target3] | [target4] | [target5]
with focus on [aspect1], [aspect2], [aspect3]

Extraction Rules:

Use pipe | to separate extraction targets
Minimum 3 targets required
Be SPECIFIC about what you want ("pricing tiers" not "pricing")
Include "with focus on" to prioritize certain aspects
More targets = more comprehensive extraction
Aim for 5-10 extraction targets
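
The formula and rules above can be sketched as a small prompt builder. This is an illustration only: `build_extraction_prompt` is a hypothetical helper, not part of the tool, and it enforces just the pipe separator, the 3-target minimum, and the optional "with focus on" clause.

```python
from __future__ import annotations


def build_extraction_prompt(targets: list[str], focus: list[str] | None = None) -> str:
    """Join targets with ' | ' and append an optional 'with focus on' clause."""
    cleaned = [t.strip() for t in targets if t.strip()]
    if len(cleaned) < 3:
        raise ValueError("at least 3 extraction targets are required")
    prompt = "Extract " + " | ".join(cleaned)
    focus_cleaned = [f.strip() for f in (focus or []) if f.strip()]
    if focus_cleaned:
        prompt += " with focus on " + ", ".join(focus_cleaned)
    return prompt


prompt = build_extraction_prompt(
    ["pricing tiers", "plan features", "API rate limits"],
    focus=["enterprise features", "API limitations"],
)
print(prompt)
```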

Extraction Templates by Domain:

Product Research:

Extract pricing details | feature comparisons | user reviews | technical specifications | 
integration options | support channels | deployment models | security features 
with focus on enterprise capabilities, pricing transparency, and integration complexity

Technical Documentation:

Extract API endpoints | authentication methods | rate limits | error codes | 
request examples | response schemas | SDK availability | webhook support 
with focus on authentication flow, rate limiting policies, and error handling patterns

Competitive Analysis:

Extract product features | pricing models | target customers | unique selling points | 
technology stack | customer testimonials | case studies | market positioning 
with focus on differentiators, pricing strategy, and customer satisfaction

Example:

❌ BAD: {"urls": ["url"], "use_llm": false, "what_to_extract": "get pricing"} → raw HTML, vague prompt, 1 target, no focus areas

✅ GOOD: {"urls": [5 URLs], "use_llm": true, "what_to_extract": "Extract pricing tiers | plan features | API rate limits | enterprise options | integration capabilities | user testimonials with focus on enterprise features, API limitations, and real-world performance data"} → clean structured extraction
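
As a rough sketch, the arguments for a call like the GOOD example could be assembled as follows. The parameter names (`urls`, `use_llm`, `what_to_extract`, `timeout`) come from this page; the helper `build_scrape_args` and the `example.com` URLs are placeholders introduced for illustration, and how the arguments are sent depends on your MCP client.

```python
import json


def build_scrape_args(urls, what_to_extract, timeout=None):
    """Assemble a scrape_links argument object with AI extraction enabled."""
    if not 1 <= len(urls) <= 50:
        raise ValueError("urls must contain between 1 and 50 entries")
    args = {
        "urls": list(urls),
        "use_llm": True,  # the description recommends always enabling this
        "what_to_extract": what_to_extract,
    }
    if timeout is not None:
        args["timeout"] = timeout  # seconds per URL, per the parameter table
    return args


args = build_scrape_args(
    [f"https://example.com/pricing/{i}" for i in range(1, 6)],  # placeholder URLs
    "Extract pricing tiers | plan features | API rate limits "
    "with focus on enterprise features",
)
print(json.dumps(args, indent=2))
```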

Pro Tips:

ALWAYS use use_llm=true - The AI extraction is the tool's superpower
Use 3-10 URLs (3-5 recommended) - Balance between depth and breadth
Specify 5-10 extraction targets - More targets = more comprehensive
Use pipe | separators - Clearly separate each target
Add focus areas - "with focus on X, Y, Z" for prioritization
Be specific - "pricing tiers" not "pricing", "API rate limits" not "API info"
Cover multiple aspects - Features, pricing, technical, social proof

Automatic Fallback: Basic → JavaScript rendering → JavaScript + US geo-targeting

Batching: Max 30 concurrent requests (50 URLs = [30] then [20] batches)
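
The batching and fallback behaviour described above might look roughly like this. It is a sketch under stated assumptions, not the tool's implementation: the mode names in `FALLBACK_MODES`, the `fetch` callable, and the choice to fall back on any exception are all hypothetical, since the description only names the three stages and the limit of 30.

```python
from __future__ import annotations

from typing import Callable, Sequence

MAX_CONCURRENT = 30  # batch size quoted in the description
# Hypothetical labels for the three stages named in the description.
FALLBACK_MODES = ("basic", "javascript", "javascript_us_geo")


def batch_urls(urls: Sequence[str], size: int = MAX_CONCURRENT) -> list[list[str]]:
    """Split URLs into consecutive batches of at most `size` items."""
    return [list(urls[i:i + size]) for i in range(0, len(urls), size)]


def fetch_with_fallback(url: str, fetch: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each mode in order; return (mode, content) for the first success."""
    last_error = None
    for mode in FALLBACK_MODES:
        try:
            return mode, fetch(url, mode)
        except Exception as exc:  # assumption: any failure triggers the next mode
            last_error = exc
    raise RuntimeError(f"all fallback modes failed for {url}") from last_error


urls = [f"https://example.com/page/{i}" for i in range(50)]  # placeholder URLs
print([len(batch) for batch in batch_urls(urls)])
```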

REMEMBER: AI extraction costs pennies but saves hours of manual parsing!

Name	Required	Description
`urls`	Yes	URLs to scrape (1-50). Recommend 3-5 URLs for balanced depth/breadth. More URLs = broader coverage but fewer tokens per URL. 3 URLs: ~10K tokens each (deep); 5 URLs: ~6.4K tokens each (balanced); 10 URLs: ~3K tokens each (detailed); 50 URLs: ~640 tokens each (scan).
`timeout`	No	Timeout in seconds for each URL
`use_llm`	No	Enable AI processing for content extraction (requires OPENROUTER_API_KEY)
`what_to_extract`	No	Specific content extraction instructions for the AI. A conciseness suffix is appended automatically.

Research Powerpack MCP
