mcp-local-rag


deep_research

Aggregate search results from multiple backends, score by relevance, and return top content without duplicates for comprehensive research.

Instructions

Perform deep research across multiple search terms using specified search backends. This tool aggregates results from multiple searches across chosen engines, scores them by relevance, and returns the most relevant content with duplicates removed. Perfect for comprehensive research on a topic.

Available backends: bing, brave, duckduckgo, google, grokipedia, mojeek, yandex, yahoo, wikipedia

USAGE GUIDANCE FOR LLM:

1. Ask the user which backend(s) they prefer, OR
2. Choose appropriate backend(s) based on context:
   - ["duckduckgo"] - Privacy-focused, general search
   - ["google"] - Comprehensive results, best for technical queries
   - ["duckduckgo", "google"] - Maximum coverage (default)
   - ["wikipedia"] - Factual/encyclopedia content
   - ["bing", "google"] - Balanced commercial engines
   - Multiple backends for broader research coverage
3. For specific use cases, consider:
   - deep_research_google() - shortcut for Google-only
   - deep_research_ddgs() - shortcut for DuckDuckGo-only

Args:
- search_terms (List[str]): List of search terms to research. Provide multiple related search queries for comprehensive coverage. Example: ["machine learning fundamentals", "neural networks", "deep learning best practices"]
- backends (List[str] | None): List of search backends to use. Defaults to ["duckduckgo", "google"]. Can include: bing, brave, duckduckgo, google, grokipedia, mojeek, yandex, yahoo, wikipedia. If None, uses default.
- num_results_per_term (int): Number of results to fetch per search term per backend.
- top_k_per_term (int): Number of top scored results to keep per search term per backend.
- include_urls (bool): Whether to include URLs in the results.

Returns: Dict containing aggregated research results from all search terms and specified backends, with duplicates removed.
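
As a rough illustration of the arguments described above, the sketch below assembles a call payload and rejects backend names that are not in the documented list. `build_deep_research_args` is a hypothetical helper, not part of mcp-local-rag, and the validation is an assumption added for illustration; the server itself may not check backend names. The defaults mirror the handler signature.

```python
from typing import Any, Dict, List, Optional

# Backend names copied from the tool description.
AVAILABLE_BACKENDS = {
    "bing", "brave", "duckduckgo", "google", "grokipedia",
    "mojeek", "yandex", "yahoo", "wikipedia",
}


def build_deep_research_args(
    search_terms: List[str],
    backends: Optional[List[str]] = None,
    num_results_per_term: int = 10,
    top_k_per_term: int = 3,
    include_urls: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and return the tool arguments."""
    if not search_terms:
        raise ValueError("search_terms must contain at least one query")
    unknown = sorted(set(backends or []) - AVAILABLE_BACKENDS)
    if unknown:
        raise ValueError(f"unknown backends: {unknown}")
    args: Dict[str, Any] = {
        "search_terms": list(search_terms),
        "num_results_per_term": num_results_per_term,
        "top_k_per_term": top_k_per_term,
        "include_urls": include_urls,
    }
    # Omitting "backends" lets the server apply its own default.
    if backends is not None:
        args["backends"] = list(backends)
    return args


args = build_deep_research_args(
    ["machine learning fundamentals", "neural networks"],
    backends=["wikipedia"],
)
```

The resulting dictionary is what a client would pass as the tool's arguments; leaving `backends` out defers to the server-side default of ["duckduckgo", "google"].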

Input Schema


Name	Required	Description	Default
`search_terms`	Yes
`backends`	No
`num_results_per_term`	No
`top_k_per_term`	No
`include_urls`	No
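
The numeric parameters interact: each (search term, backend) pair fetches up to `num_results_per_term` results and keeps at most `top_k_per_term` of them. A small worked sketch of the resulting upper bound follows. `max_results_before_dedup` is a hypothetical name, and the bound assumes each backend returns no more results than requested.

```python
def max_results_before_dedup(
    num_terms: int,
    num_backends: int,
    num_results_per_term: int = 10,
    top_k_per_term: int = 3,
) -> int:
    """Upper bound on results kept before URL deduplication."""
    # Slicing to top_k cannot yield more than was fetched.
    kept_per_search = min(num_results_per_term, top_k_per_term)
    return num_terms * num_backends * kept_per_search


# 3 terms x 2 backends x 3 kept per search = 18
print(max_results_before_dedup(3, 2))
```

Deduplication by URL can only lower this figure, so it is a ceiling on how many pages are fetched for full content.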

Output Schema


Name	Required	Description	Default
No arguments

Implementation Reference

src/mcp_local_rag/main.py:179-235 (handler)

The `deep_research` MCP tool handler function. Decorated with @mcp.tool(), it accepts search_terms (List[str]), backends (List[str] | None; None is replaced with ['duckduckgo', 'google']), num_results_per_term (default 10), top_k_per_term (default 3), and include_urls (default True). It delegates to _deep_research_internal.

@mcp.tool()
def deep_research(
    search_terms: List[str], 
    backends: List[str] | None = None,
    num_results_per_term: int = 10, 
    top_k_per_term: int = 3, 
    include_urls: bool = True
) -> Dict:
    """
    Perform deep research across multiple search terms using specified search backends.
    This tool aggregates results from multiple searches across chosen engines, scores them 
    by relevance, and returns the most relevant content with duplicates removed.
    Perfect for comprehensive research on a topic.
    
    Available backends: bing, brave, duckduckgo, google, grokipedia, mojeek, yandex, yahoo, wikipedia
    
    USAGE GUIDANCE FOR LLM:
    1. Ask the user which backend(s) they prefer, OR
    2. Choose appropriate backend(s) based on context:
       - ["duckduckgo"] - Privacy-focused, general search
       - ["google"] - Comprehensive results, best for technical queries
       - ["duckduckgo", "google"] - Maximum coverage (default)
       - ["wikipedia"] - Factual/encyclopedia content
       - ["bing", "google"] - Balanced commercial engines
       - Multiple backends for broader research coverage
    
    3. For specific use cases, consider:
       - deep_research_google() - shortcut for Google-only
       - deep_research_ddgs() - shortcut for DuckDuckGo-only
    
    Args:
        search_terms (List[str]): List of search terms to research. Provide multiple 
                                  related search queries for comprehensive coverage.
                                  Example: ["machine learning fundamentals", "neural networks", "deep learning best practices"]
        backends (List[str] | None): List of search backends to use. Defaults to ["duckduckgo", "google"].
                             Can include: bing, brave, duckduckgo, google, grokipedia, 
                             mojeek, yandex, yahoo, wikipedia. If None, uses default.
        num_results_per_term (int): Number of results to fetch per search term per backend.
        top_k_per_term (int): Number of top scored results to keep per search term per backend.
        include_urls (bool): Whether to include URLs in the results.
    
    Returns:
        Dict containing aggregated research results from all search terms and specified backends,
        with duplicates removed.
    """

    # safe default if none is given
    if backends is None:
        backends = ["duckduckgo", "google"]

    return _deep_research_internal(
        search_terms=search_terms,
        backends=backends,
        num_results_per_term=num_results_per_term,
        top_k_per_term=top_k_per_term,
        include_urls=include_urls
    )
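
The `backends=None` default followed by an in-body assignment is the usual Python idiom for avoiding a shared mutable default argument. A minimal sketch of what it guards against, using two hypothetical functions that are not part of the server:

```python
from typing import List, Optional


def risky(backends: List[str] = ["duckduckgo", "google"]) -> List[str]:
    # The default list is created once and shared across calls.
    backends.append("wikipedia")
    return backends


def safe(backends: Optional[List[str]] = None) -> List[str]:
    # A fresh list is built on every call that omits the argument.
    if backends is None:
        backends = ["duckduckgo", "google"]
    backends.append("wikipedia")
    return backends


# Copy each return value so later calls cannot change what was observed.
first, second = list(risky()), list(risky())
safe_first, safe_second = safe(), safe()
```

With the shared default, the second call sees the mutation left by the first; with the `None` sentinel, every call starts from the same two backends.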

src/mcp_local_rag/main.py:25-25 (registration)
Registration of the tool via @mcp.tool() decorator on the deep_research function at line 179. The FastMCP instance 'mcp' is created at line 5.
```
@mcp.tool()
```

src/mcp_local_rag/main.py:112-177 (helper)

The internal helper function `_deep_research_internal` that contains the actual tool logic: iterates over search terms and backends, fetches results via DDGS, scores them, removes duplicates, fetches full page content, and returns aggregated results.

def _deep_research_internal(search_terms:List[str], backends:List[str], num_results_per_term:int=5,top_k_per_term:int=3, include_urls:bool=True)->Dict:
    """
    Internal function to perform deep research across multiple search terms using the given backends via ddgs.

    Args:
        search_terms (List[str]): List of search terms to perform deep research on.
        backends (List[str]): List of search backends to use.
        num_results_per_term (int): Number of results to fetch per search term per backend.
        top_k_per_term (int): Number of top-scored results to keep per search term per backend.
        include_urls (bool): Whether to include URLs in the results.

    Returns:
        Dict containing aggregated research results from all search terms and engines.
    """

    # lazy load
    from ddgs import DDGS
    from .utils.fetch import fetch_all_content
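    from .utils.tools import sort_by_score
    # add_score_to_dict is not imported in this excerpt; presumably it comes from a module-level import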

    ddgs = DDGS()
    all_results = []
    search_summary = {}
    
    # search each term on all specified backends
    for term in search_terms:
        search_summary[term] = {backend: 0 for backend in backends}
        
        for backend in backends:
            try:
                if backend == "duckduckgo":
                    results = ddgs.text(term, max_results=num_results_per_term)
                else:
                    results = ddgs.text(term, max_results=num_results_per_term, backend=backend)
                if results:
                    scored_results = sort_by_score(add_score_to_dict(term, results))
                    top_results = scored_results[0:top_k_per_term]
                    all_results.extend(top_results)
                    search_summary[term][backend] = len(top_results)
            except Exception as e:
                print(f"Error searching {backend} for '{term}': {e}")
    
    # remove duplicates and keep high scores
    seen_urls = {}
    unique_results = []
    for result in all_results:
        url = result.get('href', '')
        if url:
            # Keep the result with the highest score for duplicate URLs
            if url not in seen_urls or result.get('score', 0) > seen_urls[url].get('score', 0):
                if url in seen_urls:
                    # Replace lower scored duplicate
                    unique_results.remove(seen_urls[url])
                seen_urls[url] = result
                unique_results.append(result)
    
    # fetch content from final list of results
    md_content = fetch_all_content(unique_results, include_urls)
    
    return {
        "search_terms": search_terms,
        "backends": backends,
        "search_summary": search_summary,
        "total_unique_results": len(unique_results),
        "content": md_content
    }
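
The deduplication step can also be written with a single dictionary keyed by URL, which avoids the `list.remove` call. This is an alternative sketch, not the repository's code: `dedupe_by_url` is a hypothetical name, and the `href` and `score` keys follow the helper above. One behavioral difference is that a replaced entry keeps the position of the first occurrence instead of moving to the end of the list.

```python
from typing import Any, Dict, List


def dedupe_by_url(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep one result per URL, preferring the highest score."""
    best: Dict[str, Dict[str, Any]] = {}
    for result in results:
        url = result.get("href", "")
        if not url:
            continue  # results without a URL are dropped, as in the helper
        current = best.get(url)
        if current is None or result.get("score", 0) > current.get("score", 0):
            best[url] = result
    return list(best.values())


sample = [
    {"href": "https://example.com/a", "score": 0.4},
    {"href": "https://example.com/b", "score": 0.7},
    {"href": "https://example.com/a", "score": 0.9},
    {"title": "no url", "score": 1.0},
]
unique = dedupe_by_url(sample)
```

Here `unique` holds two entries: the higher-scored copy of `/a` and the single `/b` result. The entry without an `href` is discarded.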

src/mcp_local_rag/main.py:238-262 (handler)

Shortcut handler `deep_research_google` - wraps _deep_research_internal with backends=['google'].

@mcp.tool()
def deep_research_google(search_terms: List[str], num_results_per_term:int=10, top_k_per_term:int=3, include_urls:bool=True) -> Dict:
    """
    Perform deep research across multiple search terms using ONLY Google.
    Aggregates results from multiple Google searches, scores them by relevance,
    and returns the most relevant content with duplicates removed.
    
    Args:
        search_terms (List[str]): List of search terms to research. The LLM should provide 
                                  multiple related search queries for comprehensive coverage.
        num_results_per_term (int): Number of results to fetch per search term.
        top_k_per_term (int): Number of top scored results to keep per search term.
        include_urls (bool): Whether to include URLs in the results.
    
    Returns:
        Dict containing aggregated research results from all search terms (Google only),
        with duplicates removed.
    """
    return _deep_research_internal(
        search_terms=search_terms,
        backends=["google"],
        num_results_per_term=num_results_per_term,
        top_k_per_term=top_k_per_term,
        include_urls=include_urls
    )

src/mcp_local_rag/main.py:265-289 (handler)

Shortcut handler `deep_research_ddgs` - wraps _deep_research_internal with backends=['duckduckgo'].

@mcp.tool()
def deep_research_ddgs(search_terms: List[str], num_results_per_term:int=10, top_k_per_term:int=3, include_urls:bool=True) -> Dict:
    """
    Perform deep research across multiple search terms using ONLY DuckDuckGo.
    Aggregates results from multiple DuckDuckGo searches, scores them by relevance,
    and returns the most relevant content with duplicates removed.
    
    Args:
        search_terms (List[str]): List of search terms to research. The LLM should provide 
                                  multiple related search queries for comprehensive coverage.
        num_results_per_term (int): Number of results to fetch per search term.
        top_k_per_term (int): Number of top scored results to keep per search term.
        include_urls (bool): Whether to include URLs in the results.
    
    Returns:
        Dict containing aggregated research results from all search terms (DuckDuckGo only),
        with duplicates removed.
    """
    return _deep_research_internal(
        search_terms=search_terms,
        backends=["duckduckgo"],
        num_results_per_term=num_results_per_term,
        top_k_per_term=top_k_per_term,
        include_urls=include_urls
    )

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It describes the tool as performing searches, aggregating results, and returning content, which implies a read-only operation. It does not explicitly state that the tool is non-destructive or whether authentication is required, but the process is clear enough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is quite long and includes a 'USAGE GUIDANCE' section that partially repeats parameter info. Although it is structured into sections, it could be more concise. Every sentence adds value, but the guidance could be folded into the parameter descriptions.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no annotations, and an output schema (though not detailed in description), the description covers use cases, backend selection, and parameter defaults. It mentions sibling shortcuts. The return value is only vaguely described, but the output schema likely fills gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must explain the parameters. It details all 5: search_terms with an example, backends with the allowed values and default, num_results_per_term, top_k_per_term, and include_urls. It adds meaning beyond the schema by providing context and usage examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Perform deep research across multiple search terms using specified search backends,' clearly stating the verb and resource. It explains aggregation, scoring, and deduplication, distinguishing it from sibling shortcuts like deep_research_google.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a dedicated 'USAGE GUIDANCE FOR LLM' section that tells when to ask the user or choose backends based on context. It lists alternatives and when to use shortcuts, providing explicit when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/nkapila6/mcp-local-rag'
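
The same request can be made from Python with the standard library. This is a minimal sketch: `server_url` and `fetch_server` are hypothetical names, and the code assumes the endpoint returns a JSON object without assuming any particular fields.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server."""
    return f"{API_BASE}/{quote(owner, safe='')}/{quote(name, safe='')}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server record and decode it as JSON (response fields not assumed)."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as resp:
        return json.load(resp)


url = server_url("nkapila6", "mcp-local-rag")
# fetch_server("nkapila6", "mcp-local-rag") performs the network request.
```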

If you have feedback or need assistance with the MCP directory API, please join our Discord server.