deep_research
Aggregate search results from multiple backends, score by relevance, and return top content without duplicates for comprehensive research.
Instructions
Perform deep research across multiple search terms using specified search backends. This tool aggregates results from multiple searches across chosen engines, scores them by relevance, and returns the most relevant content with duplicates removed. Perfect for comprehensive research on a topic.
Available backends: bing, brave, duckduckgo, google, grokipedia, mojeek, yandex, yahoo, wikipedia
USAGE GUIDANCE FOR LLM:
1. Ask the user which backend(s) they prefer, OR
2. Choose appropriate backend(s) based on context:
   - ["duckduckgo"] - Privacy-focused, general search
   - ["google"] - Comprehensive results, best for technical queries
   - ["duckduckgo", "google"] - Maximum coverage (default)
   - ["wikipedia"] - Factual/encyclopedia content
   - ["bing", "google"] - Balanced commercial engines
   - Multiple backends for broader research coverage
3. For specific use cases, consider:
   - deep_research_google() - shortcut for Google-only
   - deep_research_ddgs() - shortcut for DuckDuckGo-only
Args:
- search_terms (List[str]): List of search terms to research. Provide multiple related search queries for comprehensive coverage. Example: ["machine learning fundamentals", "neural networks", "deep learning best practices"]
- backends (List[str] | None): List of search backends to use. Defaults to ["duckduckgo", "google"]. Can include: bing, brave, duckduckgo, google, grokipedia, mojeek, yandex, yahoo, wikipedia. If None, uses default.
- num_results_per_term (int): Number of results to fetch per search term per backend.
- top_k_per_term (int): Number of top scored results to keep per search term per backend.
- include_urls (bool): Whether to include URLs in the results.
Returns: Dict containing aggregated research results from all search terms and specified backends, with duplicates removed.
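As a rough illustration of how a client might invoke this tool, the sketch below builds a JSON-RPC `tools/call` request carrying the arguments described above. The helper name `build_deep_research_call` and the request id are assumptions made for the example; the argument names and the backend list come from the tool description, and the numeric defaults mirror the handler signature shown under Implementation Reference.

```python
import json

# Backend names listed in the tool description.
AVAILABLE_BACKENDS = {
    "bing", "brave", "duckduckgo", "google", "grokipedia",
    "mojeek", "yandex", "yahoo", "wikipedia",
}


def build_deep_research_call(search_terms, backends=None, num_results_per_term=10,
                             top_k_per_term=3, include_urls=True, request_id=1):
    """Hypothetical helper: build an MCP `tools/call` request for deep_research."""
    if not search_terms:
        raise ValueError("search_terms must contain at least one query")
    if backends is not None:
        unknown = sorted(set(backends) - AVAILABLE_BACKENDS)
        if unknown:
            raise ValueError(f"unknown backends: {unknown}")
    arguments = {
        "search_terms": list(search_terms),
        "num_results_per_term": num_results_per_term,
        "top_k_per_term": top_k_per_term,
        "include_urls": include_urls,
    }
    # Omitting `backends` lets the tool fall back to ["duckduckgo", "google"].
    if backends is not None:
        arguments["backends"] = list(backends)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "deep_research", "arguments": arguments},
    }


request = build_deep_research_call(
    ["machine learning fundamentals", "neural networks"],
    backends=["wikipedia"],
)
print(json.dumps(request, indent=2))
```

Validating backend names on the client side is optional; it simply surfaces typos before a request is sent.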
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
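| search_terms | Yes | List of search terms to research. | |
| backends | No | List of search backends to use. | None (falls back to ["duckduckgo", "google"]) |
| num_results_per_term | No | Number of results to fetch per search term per backend. | 10 |
| top_k_per_term | No | Number of top scored results to keep per search term per backend. | 3 |
| include_urls | No | Whether to include URLs in the results. | True |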
Output Schema
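No output schema is declared. The tool returns a Dict; per the helper in the Implementation Reference, its keys are search_terms, backends, search_summary, total_unique_results, and content.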
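To illustrate consuming the returned Dict, the sketch below totals the per-backend counts in `search_summary`. The sample result is made up and `hits_per_backend` is a hypothetical helper; the key names are taken from the return statement of `_deep_research_internal` in the Implementation Reference.

```python
from collections import Counter


def hits_per_backend(result):
    """Hypothetical helper: total kept results per backend across all terms."""
    totals = Counter()
    for per_backend in result.get("search_summary", {}).values():
        totals.update(per_backend)  # adds the counts of each backend
    return dict(totals)


# Made-up result shaped like the Dict returned by the helper function.
sample = {
    "search_terms": ["neural networks", "deep learning"],
    "backends": ["duckduckgo", "google"],
    "search_summary": {
        "neural networks": {"duckduckgo": 3, "google": 2},
        "deep learning": {"duckduckgo": 1, "google": 3},
    },
    "total_unique_results": 7,
    "content": "...",
}

print(hits_per_backend(sample))
```

The counts in `search_summary` are recorded before de-duplication, so their sum can exceed `total_unique_results`.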
Implementation Reference
- src/mcp_local_rag/main.py:179-235 (handler) The `deep_research` MCP tool handler function. Decorated with @mcp.tool(), it accepts search_terms (List[str]), backends (List[str]|None defaulting to ['duckduckgo','google']), num_results_per_term, top_k_per_term, and include_urls. Delegates to _deep_research_internal.

      @mcp.tool()
      def deep_research(
          search_terms: List[str],
          backends: List[str] | None = None,
          num_results_per_term: int = 10,
          top_k_per_term: int = 3,
          include_urls: bool = True
      ) -> Dict:
          """
          Perform deep research across multiple search terms using specified search backends.

          This tool aggregates results from multiple searches across chosen engines,
          scores them by relevance, and returns the most relevant content with duplicates
          removed. Perfect for comprehensive research on a topic.

          Available backends: bing, brave, duckduckgo, google, grokipedia, mojeek,
          yandex, yahoo, wikipedia

          USAGE GUIDANCE FOR LLM:
          1. Ask the user which backend(s) they prefer, OR
          2. Choose appropriate backend(s) based on context:
             - ["duckduckgo"] - Privacy-focused, general search
             - ["google"] - Comprehensive results, best for technical queries
             - ["duckduckgo", "google"] - Maximum coverage (default)
             - ["wikipedia"] - Factual/encyclopedia content
             - ["bing", "google"] - Balanced commercial engines
             - Multiple backends for broader research coverage
          3. For specific use cases, consider:
             - deep_research_google() - shortcut for Google-only
             - deep_research_ddgs() - shortcut for DuckDuckGo-only

          Args:
              search_terms (List[str]): List of search terms to research. Provide multiple
                  related search queries for comprehensive coverage.
                  Example: ["machine learning fundamentals", "neural networks",
                  "deep learning best practices"]
              backends (List[str] | None): List of search backends to use. Defaults to
                  ["duckduckgo", "google"]. Can include: bing, brave, duckduckgo, google,
                  grokipedia, mojeek, yandex, yahoo, wikipedia. If None, uses default.
              num_results_per_term (int): Number of results to fetch per search term per backend.
              top_k_per_term (int): Number of top scored results to keep per search term per backend.
              include_urls (bool): Whether to include URLs in the results.

          Returns:
              Dict containing aggregated research results from all search terms and
              specified backends, with duplicates removed.
          """
          # safe default if none is given
          if backends is None:
              backends = ["duckduckgo", "google"]

          return _deep_research_internal(
              search_terms=search_terms,
              backends=backends,
              num_results_per_term=num_results_per_term,
              top_k_per_term=top_k_per_term,
              include_urls=include_urls
          )

- src/mcp_local_rag/main.py:25-25 (registration) Registration of the tool via @mcp.tool() decorator on the deep_research function at line 179. The FastMCP instance 'mcp' is created at line 5.

      @mcp.tool()

- src/mcp_local_rag/main.py:112-177 (helper) The internal helper function `_deep_research_internal` that contains the actual tool logic: iterates over search terms and backends, fetches results via DDGS, scores them, removes duplicates, fetches full page content, and returns aggregated results.

      def _deep_research_internal(
          search_terms: List[str],
          backends: List[str],
          num_results_per_term: int = 5,
          top_k_per_term: int = 3,
          include_urls: bool = True
      ) -> Dict:
          """
          Internal function to perform deep research across multiple search term
          with the given backend engine in ddgs.

          Args:
              search_terms (List[str]): List of search terms to perform deep research on.
              backends (List[str]): List of search backends to use.
              num_results (int): Num of results to fetch per search term per engine.
              top_k (int): Number of top score to keep per search term per engine.
              include_urls (bool): whether to include urls in the results.

          Returns:
              Dict containing aggregated research results from all search terms and engines.
          """
          # lazy load
          from ddgs import DDGS
          from .utils.fetch import fetch_all_content
          from .utils.tools import sort_by_score

          ddgs = DDGS()
          all_results = []
          search_summary = {}

          # search each term on all specified backends
          for term in search_terms:
              search_summary[term] = {backend: 0 for backend in backends}
              for backend in backends:
                  try:
                      if backend == "duckduckgo":
                          results = ddgs.text(term, max_results=num_results_per_term)
                      else:
                          results = ddgs.text(term, max_results=num_results_per_term, backend=backend)
                      if results:
                          scored_results = sort_by_score(add_score_to_dict(term, results))
                          top_results = scored_results[0:top_k_per_term]
                          all_results.extend(top_results)
                          search_summary[term][backend] = len(top_results)
                  except Exception as e:
                      print(f"Error searching {backend} for '{term}': {e}")

          # remove duplicates and keep high scores
          seen_urls = {}
          unique_results = []
          for result in all_results:
              url = result.get('href', '')
              if url:
                  # Keep the result with the highest score for duplicate URLs
                  if url not in seen_urls or result.get('score', 0) > seen_urls[url].get('score', 0):
                      if url in seen_urls:
                          # Replace lower scored duplicate
                          unique_results.remove(seen_urls[url])
                      seen_urls[url] = result
                      unique_results.append(result)

          # fetch content from final list of results
          md_content = fetch_all_content(unique_results, include_urls)

          return {
              "search_terms": search_terms,
              "backends": backends,
              "search_summary": search_summary,
              "total_unique_results": len(unique_results),
              "content": md_content
          }

- src/mcp_local_rag/main.py:238-262 (handler) Shortcut handler `deep_research_google` - wraps _deep_research_internal with backends=['google'].

      @mcp.tool()
      def deep_research_google(
          search_terms: List[str],
          num_results_per_term: int = 10,
          top_k_per_term: int = 3,
          include_urls: bool = True
      ) -> Dict:
          """
          Perform deep research across multiple search terms using ONLY Google.

          Aggregates results from multiple Google searches, scores them by relevance,
          and returns the most relevant content with duplicates removed.

          Args:
              search_terms (List[str]): List of search terms to research. The LLM should
                  provide multiple related search queries for comprehensive coverage.
              num_results_per_term (int): Number of results to fetch per search term.
              top_k_per_term (int): Number of top scored results to keep per search term.
              include_urls (bool): Whether to include URLs in the results.

          Returns:
              Dict containing aggregated research results from all search terms
              (Google only), with duplicates removed.
          """
          return _deep_research_internal(
              search_terms=search_terms,
              backends=["google"],
              num_results_per_term=num_results_per_term,
              top_k_per_term=top_k_per_term,
              include_urls=include_urls
          )

- src/mcp_local_rag/main.py:265-289 (handler) Shortcut handler `deep_research_ddgs` - wraps _deep_research_internal with backends=['duckduckgo'].

      @mcp.tool()
      def deep_research_ddgs(
          search_terms: List[str],
          num_results_per_term: int = 10,
          top_k_per_term: int = 3,
          include_urls: bool = True
      ) -> Dict:
          """
          Perform deep research across multiple search terms using ONLY DuckDuckGo.

          Aggregates results from multiple DuckDuckGo searches, scores them by relevance,
          and returns the most relevant content with duplicates removed.

          Args:
              search_terms (List[str]): List of search terms to research. The LLM should
                  provide multiple related search queries for comprehensive coverage.
              num_results_per_term (int): Number of results to fetch per search term.
              top_k_per_term (int): Number of top scored results to keep per search term.
              include_urls (bool): Whether to include URLs in the results.

          Returns:
              Dict containing aggregated research results from all search terms
              (DuckDuckGo only), with duplicates removed.
          """
          return _deep_research_internal(
              search_terms=search_terms,
              backends=["duckduckgo"],
              num_results_per_term=num_results_per_term,
              top_k_per_term=top_k_per_term,
              include_urls=include_urls
          )
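
The de-duplication step in `_deep_research_internal` keeps, for each URL, the highest-scored result. A condensed sketch of that idea follows. It is not the repository's code: the function name `dedupe_by_url` and the sample data are invented, and it uses a single dict instead of the original's list-plus-dict bookkeeping, so the output order can differ when a duplicate is replaced. As in the original, results without an `href` are dropped and ties keep the first result seen.

```python
def dedupe_by_url(results):
    """Keep one result per URL, preferring the highest score."""
    best = {}  # url -> best result so far; dicts preserve insertion order
    for result in results:
        url = result.get("href", "")
        if not url:
            continue  # results without a URL are skipped
        current = best.get(url)
        if current is None or result.get("score", 0) > current.get("score", 0):
            best[url] = result
    return list(best.values())


# Invented sample: two hits share a URL, one hit has no URL at all.
results = [
    {"href": "https://example.com/a", "score": 0.4},
    {"href": "https://example.com/b", "score": 0.9},
    {"href": "https://example.com/a", "score": 0.7},
    {"title": "no url", "score": 1.0},
]
unique = dedupe_by_url(results)
print(unique)
```

Because the shortcut tools and the multi-backend tool share this helper, the same URL returned by two backends or two search terms appears only once in the final content.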