Search and Scrape
search_and_scrape
Search Google and retrieve content from top results in one call. Returns combined, deduplicated content with source attribution.
Instructions
Search Google AND retrieve content from top results in one call. Returns combined, deduplicated content with source attribution.
When to use:
- As the primary tool for answering questions that need web research
- When you need content from multiple sources combined
- When one call is more efficient than calling google_search and scrape_page separately
When to use other tools instead:
- google_search: when you only need URLs without content
- scrape_page: when you already have a specific URL
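The guidance above can be read as a small decision rule. The sketch below is only an illustration of that rule: `choose_tool` and its two boolean parameters are hypothetical names, not part of any of the three tools.

```python
def choose_tool(have_url: bool, need_content: bool) -> str:
    """Pick a tool name following the guidance above (hypothetical helper)."""
    if have_url:
        # A specific URL is already known, so no search step is needed.
        return "scrape_page"
    if not need_content:
        # Only URLs are wanted, so skip scraping.
        return "google_search"
    # Default case: research that needs page content from search results.
    return "search_and_scrape"
```

For example, `choose_tool(have_url=False, need_content=True)` selects `search_and_scrape`.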
Content size control:
- max_length_per_source: limit content per source (default: 50KB)
- total_max_length: limit total combined content (default: 300KB)
- filter_by_query: only include paragraphs containing query keywords
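How the three size controls interact is not spelled out here, so the following is a minimal sketch under stated assumptions rather than the tool's actual behavior. It assumes "KB" means 1024 characters, that paragraphs are separated by blank lines, that keyword matching is case-insensitive on whole words, and that the per-source limit is applied before the total budget. `filter_paragraphs` and `apply_limits` are hypothetical names.

```python
import re

KB = 1024  # assumption: one "KB" is treated as 1024 characters


def filter_paragraphs(text, query):
    """Keep only paragraphs that share at least one word with the query."""
    keywords = {w.lower() for w in re.findall(r"\w+", query)}
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    kept = [
        p for p in paragraphs
        if keywords & {w.lower() for w in re.findall(r"\w+", p)}
    ]
    return "\n\n".join(kept)


def apply_limits(sources, query, max_length_per_source=50 * KB,
                 total_max_length=300 * KB, filter_by_query=False):
    """Trim each source, then stop once the combined budget is spent."""
    parts = []
    remaining = total_max_length
    for text in sources:
        if filter_by_query:
            text = filter_paragraphs(text, query)
        # Per-source cap first, then whatever is left of the total budget.
        text = text[:max_length_per_source][:remaining]
        if not text:
            continue
        parts.append(text)
        remaining -= len(text)
        if remaining <= 0:
            break
    return parts
```

In this sketch, later sources are the ones that get cut when the total budget runs out, which is one reason to keep `num_results` small when each page is long.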
Caching: Search results cached for 30 minutes, scraped pages for 1 hour.
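The two cache lifetimes can be pictured as two time-to-live caches with different expiry windows. This is a sketch of the general technique only; the `TTLCache` class is hypothetical and says nothing about how the tool stores its cache internally.

```python
import time

SEARCH_TTL_SECONDS = 30 * 60  # search results: 30 minutes
PAGE_TTL_SECONDS = 60 * 60    # scraped pages: 1 hour


class TTLCache:
    """In-memory cache whose entries expire after a fixed lifetime."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            # Entry is too old: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock(), value)


search_cache = TTLCache(SEARCH_TTL_SECONDS)
page_cache = TTLCache(PAGE_TTL_SECONDS)
```

The practical consequence for callers is that repeating the same query within the cache window may return the same results, even if the underlying pages have changed.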
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Your research question or topic. Be specific for better results, for example 'Python async best practices 2024' rather than just 'Python'. | |
| num_results | No | Number of sources to fetch (1-10). The default suits most queries; use 5-8 for comprehensive research and 1-2 for quick factual lookups. | 3 |
| include_sources | No | Include source URLs at the end for citation. Recommended for transparency. | true |
| deduplicate | No | Remove duplicate content across sources. Recommended to reduce noise when sources quote each other. | true |
| max_length_per_source | No | Maximum content length per source, in characters. | 50KB |
| total_max_length | No | Maximum total combined content length. | 300KB |
| filter_by_query | No | Only include paragraphs containing query keywords. Reduces noise but may exclude relevant context. | |
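As an illustration of the input schema, the sketch below assembles an arguments dictionary and checks the one documented range (1-10 for `num_results`). `build_arguments` is a hypothetical helper; how the arguments are sent to the tool depends on the client and is not shown. Size limits and `filter_by_query` are left out unless set, so the tool's own defaults apply.

```python
def build_arguments(query, num_results=3, include_sources=True,
                    deduplicate=True, max_length_per_source=None,
                    total_max_length=None, filter_by_query=None):
    """Build a search_and_scrape arguments dict (hypothetical helper)."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= num_results <= 10:
        raise ValueError("num_results must be between 1 and 10")

    args = {
        "query": query.strip(),
        "num_results": num_results,
        "include_sources": include_sources,
        "deduplicate": deduplicate,
    }
    # Optional parameters are only sent when the caller sets them.
    optional = {
        "max_length_per_source": max_length_per_source,
        "total_max_length": total_max_length,
        "filter_by_query": filter_by_query,
    }
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


# A quick factual lookup: few sources, keyword filtering on.
lookup_args = build_arguments(
    "Python async best practices 2024", num_results=2, filter_by_query=True
)
```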
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | The search query that was executed | |
| sources | Yes | List of sources that were processed | |
| combinedContent | Yes | Combined and optionally deduplicated content from all sources | |
| summary | Yes | Summary statistics for the operation | |
| sizeMetadata | Yes | Size information for the combined content | |
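Since all five output fields are marked as required, a caller may want to confirm they are present before reading `combinedContent`. The sketch below assumes the result has already been parsed into a dictionary; the inner shapes of `sources`, `summary`, and `sizeMetadata` are not documented here, so the sample uses empty placeholders rather than real output. `missing_fields` and `extract_content` are hypothetical names.

```python
REQUIRED_FIELDS = ("query", "sources", "combinedContent", "summary", "sizeMetadata")


def missing_fields(result):
    """Return the required top-level fields absent from a result."""
    return [name for name in REQUIRED_FIELDS if name not in result]


def extract_content(result):
    """Return combinedContent, or raise if the result is incomplete."""
    missing = missing_fields(result)
    if missing:
        raise KeyError("result is missing: " + ", ".join(missing))
    return result["combinedContent"]


# Hypothetical result; the inner values are placeholders, not real output.
sample = {
    "query": "Python async best practices 2024",
    "sources": [],
    "combinedContent": "",
    "summary": {},
    "sizeMetadata": {},
}
```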