# Research Powerpack MCP - Tool Configuration
# Optimized for LLM consumption - <1K tokens total
version: "1.0"
metadata:
name: "research-powerpack-mcp"
description: "Parallel research tools with AI extraction"
# CORE PRINCIPLES (ALL TOOLS)
shared:
philosophy: "MAXIMIZE inputs for parallel processing - NO time penalty for more items! More diverse inputs = better coverage = higher quality. Each input MUST target DIFFERENT angle - NO overlap/duplicates. ALWAYS use sequentialthinking between tool calls."
workflow: |
MANDATORY: THINK first (plan strategy) → EXECUTE with MAX diversity (use recommended counts) → THINK after (evaluate: what's missing? new angles discovered? gaps in coverage?) → ITERATE if needed (refine based on findings) → SYNTHESIZE final output.
KEY INSIGHT: First research pass usually incomplete - reveals what you SHOULD have asked! Results are feedback - use them to ask better questions next iteration.
iteration_triggers: "Iterate when: Results mention concepts you didn't research | Answers raise NEW questions | Initial scope too narrow | Related topics discovered you should explore"
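The THINK → EXECUTE → THINK → ITERATE → SYNTHESIZE workflow above can be sketched as plain control flow. This is a minimal illustration only: `plan`, `execute`, and `evaluate` are hypothetical callables the agent supplies, not part of the MCP API.

```python
def research_loop(plan, execute, evaluate, max_passes=3):
    """THINK -> EXECUTE -> THINK -> ITERATE -> SYNTHESIZE, as control flow.

    plan(gaps) returns a query list, execute(queries) returns results,
    evaluate(results) returns remaining gaps (empty = coverage sufficient).
    All three callables are hypothetical placeholders the agent supplies.
    """
    findings, gaps = [], []
    for _ in range(max_passes):
        queries = plan(gaps)        # THINK: target unexplored angles
        results = execute(queries)  # EXECUTE: parallel, maximally diverse
        findings.extend(results)
        gaps = evaluate(results)    # THINK: what's missing? new questions?
        if not gaps:                # nothing left to chase -> SYNTHESIZE
            break
    return findings
```

The loop makes the key insight concrete: the first pass's gaps become the second pass's queries.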
tools:
- name: search_reddit
category: reddit
capability: search
limits: {min_queries: 10, max_queries: 50, recommended: 20}
description: |
MIN 10 queries, REC 20+ (10q=100 results, 20q=200 results). Each query MUST target DIFFERENT angle - NO overlap! 10 categories: 1) direct topic 2) recommendations (best/top) 3) specific tools/repos/names 4) comparisons (vs) 5) alternatives 6) subreddit targeting (r/macapps, r/opensource) 7) problems/issues/crashes 8) year-specific (2024/2025) 9) features 10) dev/GitHub/electron. Operators: intitle:, "exact", OR, -exclude. Using 1-3 queries wastes parallel power. Auto adds site:reddit.com.
parameters:
queries:
type: array
required: true
items: {type: string}
validation: {minItems: 10, maxItems: 50}
description: "10-50 diverse queries. Each targets different angle: direct|best|tools|vs|alternative|r/sub|issues|2024|features|GitHub. NO overlap."
date_after:
type: string
required: false
description: "Filter after date (YYYY-MM-DD)"
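A search_reddit call satisfying the diversity rule might look like the payload below. The topic and every query string are illustrative only, one per category from the description above; how the payload is actually sent depends on your MCP client.

```python
# Hypothetical search_reddit arguments: one query per category,
# same topic, ten non-overlapping angles (site:reddit.com is auto-added).
payload = {
    "queries": [
        "markdown note taking app",            # 1) direct topic
        "best markdown editor",                # 2) recommendations
        "Obsidian plugins",                    # 3) specific tools/names
        "Obsidian vs Logseq",                  # 4) comparisons
        "Notion alternative open source",      # 5) alternatives
        "r/macapps markdown editor",           # 6) subreddit targeting
        "Obsidian sync issues",                # 7) problems/issues
        "note taking app 2025",                # 8) year-specific
        "markdown editor backlinks",           # 9) features
        "markdown editor electron github",     # 10) dev/GitHub/electron
    ],
    "date_after": "2024-01-01",
}
assert 10 <= len(payload["queries"]) <= 50                       # minItems/maxItems
assert len(set(payload["queries"])) == len(payload["queries"])   # no overlap
```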
- name: get_reddit_post
category: reddit
capability: reddit
limits: {min_urls: 2, max_urls: 50, recommended: 20, default_max_comments: 1000}
extraction_suffix: "Extract key insights: consensus, recommendations with reasoning, contrasting views, real experiences, technical details. Be comprehensive + concise, prioritize actionable info."
description: |
MIN 2 URLs, REC 10-20+. Auto comment budget: 1000 total (2 posts=500 each deep, 10=100 balanced, 20=50 broad coverage). Comments have BEST insights - always fetch_comments=true unless only need titles. use_llm=true for AI extraction (pennies, needs OPENROUTER_API_KEY) - synthesizes consensus, filters noise. Mix subreddits for diverse perspectives. Using 2-5 posts = narrow, missing consensus. 20-30 posts = comprehensive.
parameters:
urls:
type: array
required: true
items: {type: string}
validation: {minItems: 2, maxItems: 50}
description: "2-50 Reddit URLs. More = broader consensus. Get from search_reddit."
fetch_comments:
type: boolean
required: false
default: true
description: "Fetch comments (true recommended - best insights in comments)"
max_comments:
type: number
required: false
description: "Per-post comment cap. Leave unset for smart allocation of the 1000-comment total budget; set only to override."
use_llm:
type: boolean
required: false
default: false
description: "AI extraction (recommended). Extracts insights, consensus, filters noise. Needs OPENROUTER_API_KEY."
what_to_extract:
type: string
required: false
description: "Extraction instructions for AI. Be specific: 'Extract recommendations for X with pros/cons'. More specific = better quality."
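The automatic comment budget described above splits 1000 comments evenly across the fetched posts. A sketch of that allocation, assuming simple integer division (the real server's rounding and minimums may differ):

```python
def comments_per_post(n_urls, total_budget=1000):
    """Evenly split the 1000-comment budget across posts.

    Reproduces the worked examples above (2 posts -> 500 each deep,
    10 -> 100 balanced, 20 -> 50 broad). Exact server behaviour
    (rounding, per-post minimums) is an assumption here.
    """
    if not 2 <= n_urls <= 50:
        raise ValueError("get_reddit_post accepts 2-50 URLs")
    return total_budget // n_urls
```

Setting `max_comments` explicitly bypasses this allocation entirely.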
- name: deep_research
category: research
capability: deepResearch
useZodSchema: true
zodSchemaRef: "deepResearchParamsSchema"
limits: {min_questions: 2, max_questions: 10, recommended: 5, min_length: 200, min_specific: 2}
description: |
MIN 2, REC 5-10 questions parallel. 32K tokens distributed (2q=16K/each deep, 5q=6.4K balanced, 10q=3.2K comprehensive). MANDATORY template per question:
"🎯 WHAT I NEED: [clear goal]
🤔 WHY: [decision/problem]
📚 WHAT I KNOW: [current understanding - so research fills gaps not basics]
🔧 HOW I'LL USE: [implementation/debugging/architecture]
❓ SPECIFIC QUESTIONS: 1) [q1] 2) [q2] 3) [q3]
🌐 PRIORITY SOURCES: [optional - docs/sites]
⚡ FOCUS: [optional - performance/security]"
ATTACH FILES for code Qs (bugs/perf/refactor/review/architecture) - MANDATORY or research is generic/unhelpful. Using 1-2 Qs wastes parallel capacity. First pass incomplete - iterate based on findings.
schemaDescriptions:
questions: "2-10 structured questions following template above. Each MUST cover different angle of topic. Attach files for ANY code-related Q (bugs/errors/perf/refactoring/review/architecture) - without code context, research is generic/useless."
file_attachments: "MANDATORY for code Qs: bugs→failing code, perf→slow code, refactor→current implementation, review→code to review, architecture→relevant modules. Format: {path: '/absolute/path', description: 'What file is, why relevant, focus areas, known issues', start_line?, end_line?}. Thorough description critical."
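The mandatory per-question template above can be assembled mechanically. This helper is illustrative only: the tool just receives the resulting string in `questions`, and the function name and argument names are made up for the example.

```python
def build_question(need, why, known, usage, questions,
                   sources=None, focus=None):
    """Assemble one deep_research question following the mandatory template."""
    lines = [
        f"🎯 WHAT I NEED: {need}",
        f"🤔 WHY: {why}",
        f"📚 WHAT I KNOW: {known}",
        f"🔧 HOW I'LL USE: {usage}",
        "❓ SPECIFIC QUESTIONS: " + " ".join(
            f"{i}) {q}" for i, q in enumerate(questions, 1)),
    ]
    if sources:                 # optional section
        lines.append(f"🌐 PRIORITY SOURCES: {sources}")
    if focus:                   # optional section
        lines.append(f"⚡ FOCUS: {focus}")
    return "\n".join(lines)
```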
- name: scrape_links
category: scrape
capability: scraping
useZodSchema: true
zodSchemaRef: "scrapeLinksParamsSchema"
limits: {min_urls: 1, max_urls: 50, recommended: 5, min_extract_len: 50, min_targets: 3}
extraction_suffix: "MUST: Comprehensive extraction, high info density, no verbosity, satisfy scope, NEVER hallucinate, only extract grounded info from document. No confirmation messages."
description: |
REC 3-5 URLs. ALWAYS use_llm=true - AI extraction is this tool's superpower! use_llm=false gives raw HTML (messy, hours manual parsing). use_llm=true: auto-filters nav/ads/footers, extracts ONLY what you specify, handles complex structures, returns clean ready-to-use content, costs pennies (~$0.01/10 pages) vs hours work. 32K tokens (3=10.7K each, 5=6.4K, 10=3.2K, 50=640). Formula: "Extract [t1] | [t2] | [t3] | [t4] | [t5] with focus on [a1], [a2], [a3]". Min 3 targets with | separator, be specific (pricing tiers not pricing), aim 5-10 targets. Templates: Product (pricing|features|reviews|specs|integrations|support), Tech Docs (endpoints|auth|limits|errors|examples|schemas), Competitive (features|pricing|customers|USPs|stack|testimonials).
schemaDescriptions:
urls: "1-50 URLs (3-5 recommended). More URLs = broader coverage, fewer tokens/URL."
timeout: "Timeout per URL (5-120s, default 30)"
use_llm: "ALWAYS SET TRUE (default false but override it). AI extraction is tool's superpower - auto-filters nav/ads/footers, extracts ONLY your specified targets, returns clean structured content ready to use. Raw HTML (use_llm=false) is messy noise requiring hours parsing. Cost: ~$0.001/page (pennies). Needs OPENROUTER_API_KEY."
what_to_extract: "REQUIRED when use_llm=true. Formula: 'Extract [target1] | [target2] | [target3] | [target4] | [target5] with focus on [aspect1], [aspect2], [aspect3]'. Min 50 chars, min 3 targets with | separator. Be super specific (pricing tiers not pricing, API rate limits not API info). Aim 5-10 targets. Write it as a detailed LLM instruction. More specific targets = higher extraction quality. Include 'with focus on' to prioritize aspects."
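The `what_to_extract` formula (minimum 50 characters, at least 3 `|`-separated targets) can be pre-checked before calling. A hypothetical client-side validator; the server performs its own validation:

```python
def validate_extraction_prompt(prompt, min_len=50, min_targets=3):
    """Pre-flight check of a what_to_extract string against the formula.

    Limits mirror the config above (min 50 chars, >=3 '|' targets);
    this check is illustrative only, not the server's implementation.
    """
    targets = [t.strip() for t in prompt.split("|") if t.strip()]
    return len(prompt) >= min_len and len(targets) >= min_targets

# Product-page template instance built from the description's formula:
example = ("Extract pricing tiers | feature matrix | API rate limits | "
           "integration list | support SLA with focus on enterprise plans")
```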
- name: web_search
category: search
capability: search
useZodSchema: true
zodSchemaRef: "webSearchParamsSchema"
limits: {min_keywords: 3, max_keywords: 100, recommended: 7}
description: |
MIN 3, REC 5-7 keywords parallel Google search. 10 results/keyword (3=30 results, 7=70, 100=1000). Each keyword = separate search - MUST be diverse angles! 7 perspectives: 1) broad [topic] 2) specific/technical [topic]+[term] 3) problems [topic] issues/debugging 4) best practices [topic] 2024/2025 5) comparison [A] vs [B] OR 6) tutorial/guide 7) advanced patterns/architecture. Operators: site:domain.com (target GitHub/StackOverflow/docs), "exact phrase", -exclude, filetype:pdf, OR. Examples: "PostgreSQL connection pooling" site:github.com, "Docker OOM" site:stackoverflow.com. CRITICAL: Search only gives URLs - MUST follow with scrape_links to get actual content! Workflow: search → think → scrape_links (MANDATORY) → think → iterate → synthesize. Using 1-2 keywords wastes parallel power.
schemaDescriptions:
keywords: "3-100 diverse keywords (5-7 rec). Each separate Google search parallel. 7 angles: broad | technical | problems | best practices+year | vs/OR | tutorial | patterns. Operators: site:, \"exact\", -exclude, filetype:, OR."
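The seven-angle pattern for web_search keywords might be generated like this. The function and its arguments are illustrative; real keywords should be hand-tuned per topic, and results still require a follow-up scrape_links call.

```python
def seven_angle_keywords(topic, year="2025", rival=None):
    """One keyword per perspective from the web_search description.

    Purely illustrative -- a starting point, not a substitute for
    hand-crafted queries with site:/filetype:/- operators.
    """
    return [
        topic,                                                    # 1) broad
        f"{topic} internals",                                     # 2) technical
        f"{topic} issues debugging",                              # 3) problems
        f"{topic} best practices {year}",                         # 4) year-specific
        f"{topic} vs {rival}" if rival else f"{topic} OR alternatives",  # 5) comparison
        f"{topic} tutorial guide",                                # 6) tutorial
        f"{topic} advanced patterns architecture",                # 7) advanced
    ]
```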