Research Powerpack MCP

by yigitkonur

scrape_links

Extract structured content from up to 50 web URLs using AI-powered extraction that filters out noise and returns only the information you specify.

Instructions

🔥 WEB SCRAPING - 1-50 URLs, RECOMMENDED 3-5. ALWAYS use_llm=true

This tool has TWO modes:

  1. Basic scraping (use_llm=false) - Gets raw HTML/text - messy, requires manual parsing

  2. AI-powered extraction (use_llm=true) - Intelligently extracts what you need ⭐ USE THIS!

⚡ ALWAYS SET use_llm=true FOR INTELLIGENT EXTRACTION ⚡

Why use AI extraction (use_llm=true):

  • Filters out navigation, ads, footers automatically

  • Extracts ONLY what you specify in what_to_extract

  • Handles complex page structures intelligently

  • Returns clean, structured content ready to use

  • Saves hours of manual HTML parsing

  • Cost: pennies (~$0.01 per 10 pages)

Token Budget: 32,000 tokens distributed across URLs.

  • 3 URLs: ~10,666 tokens each (deep extraction)

  • 5 URLs: ~6,400 tokens each (RECOMMENDED: balanced)

  • 10 URLs: ~3,200 tokens each (detailed)

  • 50 URLs: ~640 tokens each (quick scan)
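The per-URL figures above are consistent with an even split of the 32,000-token budget, rounded down. A minimal sketch of that arithmetic, assuming the split really is even (the function name `tokens_per_url` is hypothetical and not part of the tool):

```python
TOTAL_TOKEN_BUDGET = 32_000  # total budget quoted in the description
MAX_URLS = 50                # upper bound on URLs per call, per the description


def tokens_per_url(url_count: int) -> int:
    """Approximate per-URL token share, assuming an even split rounded down."""
    if not 1 <= url_count <= MAX_URLS:
        raise ValueError(f"url_count must be between 1 and {MAX_URLS}")
    return TOTAL_TOKEN_BUDGET // url_count


for n in (3, 5, 10, 50):
    print(f"{n} URLs: ~{tokens_per_url(n):,} tokens each")
```

Running it reproduces the four rows listed above (10,666 / 6,400 / 3,200 / 640).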

Extraction Prompt Formula:

Extract [target1] | [target2] | [target3] | [target4] | [target5]
with focus on [aspect1], [aspect2], [aspect3]

Extraction Rules:

  • Use pipe | to separate extraction targets

  • Minimum 3 targets required

  • Be SPECIFIC about what you want ("pricing tiers" not "pricing")

  • Include "with focus on" to prioritize certain aspects

  • More targets = more comprehensive extraction

  • Aim for 5-10 extraction targets
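The formula and rules above can be turned into a small helper that assembles a `what_to_extract` string. This is only a sketch: `build_extraction_prompt` is a hypothetical name, and the checks simply mirror the stated rules (pipe-separated targets, at least 3 of them, an optional "with focus on" clause).

```python
from typing import Iterable


def build_extraction_prompt(targets: Iterable[str], focus_areas: Iterable[str] = ()) -> str:
    """Assemble 'Extract a | b | c with focus on x, y' from targets and focus areas."""
    cleaned = [t.strip() for t in targets if t.strip()]
    if len(cleaned) < 3:
        raise ValueError("at least 3 extraction targets are required")
    if any("|" in t for t in cleaned):
        # The pipe is the separator, so it must not appear inside a target.
        raise ValueError("targets must not contain the '|' separator")

    prompt = "Extract " + " | ".join(cleaned)
    focus = [f.strip() for f in focus_areas if f.strip()]
    if focus:
        prompt += " with focus on " + ", ".join(focus)
    return prompt


print(build_extraction_prompt(
    ["pricing tiers", "plan features", "API rate limits", "enterprise options"],
    ["enterprise features", "API limitations"],
))
```

Keeping targets as a list makes it easy to check the 3-target minimum and the suggested 5-10 range before the prompt is sent.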

Extraction Templates by Domain:

Product Research:

Extract pricing details | feature comparisons | user reviews | technical specifications | 
integration options | support channels | deployment models | security features 
with focus on enterprise capabilities, pricing transparency, and integration complexity

Technical Documentation:

Extract API endpoints | authentication methods | rate limits | error codes | 
request examples | response schemas | SDK availability | webhook support 
with focus on authentication flow, rate limiting policies, and error handling patterns

Competitive Analysis:

Extract product features | pricing models | target customers | unique selling points | 
technology stack | customer testimonials | case studies | market positioning 
with focus on differentiators, pricing strategy, and customer satisfaction

Example:

❌ BAD: {"urls": ["url"], "use_llm": false, "what_to_extract": "get pricing"} → raw HTML, vague prompt, 1 target, no focus areas

✅ GOOD: {"urls": [5 URLs], "use_llm": true, "what_to_extract": "Extract pricing tiers | plan features | API rate limits | enterprise options | integration capabilities | user testimonials with focus on enterprise features, API limitations, and real-world performance data"} → clean structured extraction
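As an illustration of the GOOD shape, the arguments can be built and checked in code before being handed to an MCP client. The helper name `build_scrape_links_args` and the `example.com` URLs are hypothetical; only the four parameter names come from the input schema, and how the arguments are actually sent depends on the client, which this page does not show.

```python
import json
from typing import Optional, Sequence


def build_scrape_links_args(
    urls: Sequence[str],
    what_to_extract: Optional[str] = None,
    use_llm: bool = True,
    timeout: Optional[int] = None,
) -> dict:
    """Build an arguments dict for scrape_links, enforcing the 1-50 URL limit."""
    if not 1 <= len(urls) <= 50:
        raise ValueError("urls must contain between 1 and 50 entries")
    args = {"urls": list(urls), "use_llm": use_llm}
    if what_to_extract is not None:
        args["what_to_extract"] = what_to_extract
    if timeout is not None:
        args["timeout"] = timeout  # seconds per URL, per the schema
    return args


# Placeholder URLs for illustration only.
args = build_scrape_links_args(
    urls=[f"https://example.com/pricing/{i}" for i in range(1, 6)],
    what_to_extract=(
        "Extract pricing tiers | plan features | API rate limits | enterprise options "
        "with focus on enterprise features, API limitations"
    ),
)
print(json.dumps(args, indent=2))
```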

Pro Tips:

  1. ALWAYS use use_llm=true - The AI extraction is the tool's superpower

  2. Use 3-10 URLs - Balance between depth and breadth

  3. Specify 5-10 extraction targets - More targets = more comprehensive

  4. Use pipe | separators - Clearly separate each target

  5. Add focus areas - "with focus on X, Y, Z" for prioritization

  6. Be specific - "pricing tiers" not "pricing", "API rate limits" not "API info"

  7. Cover multiple aspects - Features, pricing, technical, social proof

Automatic Fallback: Basic → JavaScript rendering → JavaScript + US geo-targeting

Batching: Max 30 concurrent requests (50 URLs = [30] then [20] batches)
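The batching rule (at most 30 concurrent requests) and the fallback order can be sketched as follows. This is not the server's implementation: the mode names, the `fetchers` mapping, and the assumption that any exception triggers the next mode are all hypothetical and used only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

MAX_CONCURRENT = 30  # concurrency cap quoted in the description

# Hypothetical labels for the three stages of the fallback chain.
FALLBACK_ORDER = ("basic", "javascript", "javascript_us_geo")


def split_into_batches(urls: Sequence[str], batch_size: int = MAX_CONCURRENT) -> List[List[str]]:
    """Split URLs into consecutive batches of at most batch_size items."""
    return [list(urls[i:i + batch_size]) for i in range(0, len(urls), batch_size)]


def fetch_with_fallback(url: str, fetchers: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Try each mode in order; return (mode, content) from the first that succeeds."""
    errors = []
    for mode in FALLBACK_ORDER:
        try:
            return mode, fetchers[mode](url)
        except Exception as exc:  # assumption: any failure moves on to the next mode
            errors.append(f"{mode}: {exc}")
    raise RuntimeError(f"all modes failed for {url}: " + "; ".join(errors))


urls = [f"https://example.com/page/{i}" for i in range(50)]  # placeholder URLs
print([len(batch) for batch in split_into_batches(urls)])
```

With 50 URLs the split yields one batch of 30 followed by one of 20, matching the example in the description.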

REMEMBER: AI extraction costs pennies but saves hours of manual parsing!

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| urls | Yes | URLs to scrape (1-50). Recommend 3-5 URLs for balanced depth/breadth. More URLs = broader coverage but fewer tokens per URL. 3 URLs: ~10K tokens each (deep); 10 URLs: ~3K tokens each (balanced); 50 URLs: ~640 tokens each (scan). | |
| timeout | No | Timeout in seconds for each URL | |
| use_llm | No | Enable AI processing for content extraction (requires OPENROUTER_API_KEY) | |
| what_to_extract | No | Specific content extraction instructions for AI. Will be enhanced with conciseness suffix automatically. | |

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses two operational modes, token budget, cost estimation, automatic fallback process, and batching limit. However, it does not mention error handling or rate limiting behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured with headers, lists, and bold keywords. Front-loaded with essential info. Some repetition of 'ALWAYS use_llm=true' could be trimmed. Overall, every section earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (two modes, token budget, extraction prompts), the description covers all critical aspects: parameters, usage scenarios, best practices, and fallback. Without an output schema, it could detail return structure more, but the provided information is sufficient for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds substantial value by explaining the use_llm mode, recommended URL counts, extraction prompt structure, and templates. This goes beyond the schema's description fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: scraping 1-50 URLs with two modes (basic and AI-powered). It distinguishes itself from sibling tools like web_search and deep_research by focusing on extracting content from specific URLs rather than searching or deep research.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Extensive usage guidance is provided: recommendation of 3-5 URLs, token budget breakdown, extraction templates, and examples of good vs bad usage. However, it lacks explicit instructions on when not to use this tool compared to siblings (e.g., when web_search is more appropriate).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/yigitkonur/research-powerpack-mcp'
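The same request can be made from Python with only the standard library. This is a sketch under one assumption not stated on this page: that the endpoint returns a JSON body. The helper names are hypothetical.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/yigitkonur/research-powerpack-mcp"


def build_request(url: str = API_URL) -> urllib.request.Request:
    """Build the GET request equivalent to the curl command above."""
    return urllib.request.Request(url, method="GET", headers={"Accept": "application/json"})


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network call, so it is left commented out here):
# info = fetch_server_info()
# print(sorted(info))
```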

If you have feedback or need assistance with the MCP directory API, please join our Discord server.