rag_search_google
Search Google, rank results by RAG similarity, and return context for LLMs in markdown format with optional URLs.
Instructions
Search on Google for a given query using ddgs. Give back context to the LLM with a RAG-like similarity sort.
Args:
- query (str): The query to search for.
- num_results (int): Number of results to return.
- top_k (int): Use top "k" results for content.
- include_urls (bool): Whether to include URLs in the results. If True, the results will be a list of dictionaries with the following keys:
  - type: "text"
  - text: The content of the result
  - url: The URL of the result
Returns: Dict of strings containing best search based on input query. Formatted in markdown.
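To make the return shape concrete, here is a minimal sketch that builds the same kind of dictionary from page text that has already been fetched. `shape_response` and the sample pages are hypothetical names used only for illustration; the `type`, `text`, and `url` keys and the top-level `content` key are taken from the handler shown in the Implementation Reference.

```python
from typing import Dict, List, Tuple


def shape_response(pages: List[Tuple[str, str]], include_urls: bool = True) -> Dict:
    """Mimic the tool's return shape for (url, markdown_text) pairs.

    Hypothetical helper: the real tool searches and fetches the text itself.
    """
    content = []
    for url, text in pages:
        if not text:
            # Pages that produced no content are skipped.
            continue
        entry = {"type": "text", "text": text}
        if include_urls:
            entry["url"] = url
        content.append(entry)
    return {"content": content}


sample_pages = [
    ("https://example.com/a", "# Page A\nSome markdown."),
    ("https://example.com/b", ""),  # empty fetch, dropped from the output
]

with_urls = shape_response(sample_pages, include_urls=True)
without_urls = shape_response(sample_pages, include_urls=False)
```

With `include_urls=False`, each entry carries only `type` and `text`, which may be preferable when the caller wants to keep the LLM context short.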
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
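| query | Yes | The query to search for. | |
| num_results | No | Number of results to return. | 10 |
| top_k | No | Use top "k" results for content. | 5 |
| include_urls | No | Whether to include URLs in the results. | True |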
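As a rough illustration of how a client might call this tool, the sketch below assembles an MCP `tools/call` request body. The `build_call` helper and the `DEFAULTS` mapping are hypothetical; the default values mirror the handler signature (`num_results=10`, `top_k=5`, `include_urls=True`), and sending the request over a transport is left out.

```python
import json
from typing import Any, Dict

# Defaults taken from the handler signature; only `query` is required.
DEFAULTS: Dict[str, Any] = {"num_results": 10, "top_k": 5, "include_urls": True}


def build_call(query: str, request_id: int = 1, **overrides: Any) -> Dict[str, Any]:
    """Build a JSON-RPC `tools/call` request for rag_search_google (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    arguments = {"query": query, **DEFAULTS, **overrides}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "rag_search_google", "arguments": arguments},
    }


request = build_call("what is retrieval augmented generation", top_k=3)
print(json.dumps(request, indent=2))
```

Spelling out the optional arguments in the request is a design choice for readability; a server that applies its own defaults would accept a request containing only `query`.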
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |
Implementation Reference
- src/mcp_local_rag/main.py:75-110 (handler)
  Handler function for the rag_search_google tool. Uses the ddgs library with the Google backend to search, scores results by semantic similarity via text embeddings, fetches content from the top URLs, and returns markdown-formatted content.

      @mcp.tool()
      def rag_search_google(query: str, num_results:int=10, top_k:int=5, include_urls:bool=True) -> Dict:
          """
          Search on Google for a given query using ddgs. Give back context to the LLM
          with a RAG-like similarity sort.

          Args:
              query (str): The query to search for.
              num_results (int): Number of results to return.
              top_k (int): Use top "k" results for content.
              include_urls (bool): Whether to include URLs in the results. If True, the results will be a list of dictionaries with the following keys:
                  - type: "text"
                  - text: The content of the result
                  - url: The URL of the result

          Returns:
              Dict of strings containing best search based on input query. Formatted in markdown.
          """
          # Import heavy dependencies only when tool is invoked
          from ddgs import DDGS
          from .utils.fetch import fetch_all_content
          from .utils.tools import sort_by_score

          ddgs = DDGS()
          results = ddgs.text(query, max_results=num_results, backend="google")
          scored_results = sort_by_score(add_score_to_dict(query, results))
          top_results = scored_results[0:top_k]

          # fetch content using thread pool
          md_content = fetch_all_content(top_results, include_urls)

          # formatted as dict
          return {
              "content": md_content
          }

- src/mcp_local_rag/main.py:75-75 (registration)
  Tool registration via the @mcp.tool() decorator on the rag_search_google function, using FastMCP.

      @mcp.tool()

- src/mcp_local_rag/main.py:7-23 (helper)
  Helper function that embeds the query and each result, computing cosine similarity scores for RAG relevance ranking.

      def add_score_to_dict(query: str, results: List[Dict]) -> List[Dict]:
          """Add similarity scores to search results."""
          # Import heavy dependencies only when needed (slow import!)
          from importlib.resources import files
          from mediapipe.tasks.python import text
          from .utils.fetch import fetch_embedder, get_path_str

          path = get_path_str(files('mcp_local_rag.embedder').joinpath('embedder.tflite'))
          embedder = fetch_embedder(path)
          query_embedding = embedder.embed(query)

          for i in results:
              i['score'] = text.TextEmbedder.cosine_similarity(
                  embedder.embed(i['body']).embeddings[0],
                  query_embedding.embeddings[0])

          return results

- Utility function that fetches content from result URLs in parallel using a thread pool, returning text content with optional URLs.

      def fetch_all_content(results: List[Dict], include_urls:bool=True) -> List[str]:
          """Fetch content from all URLs using a thread pool."""
          urls = [site['href'] for site in results if site.get('href')]

          # parallelize requests
          with ThreadPoolExecutor(max_workers=5) as executor:
              # submit fetch tasks to executor
              future_to_url = {executor.submit(fetch_content, url): url for url in urls}

              content_list = []
              for future, url in future_to_url.items():
                  try:
                      content = future.result()
                      if content:
                          result = {
                              "type": "text",
                              "text": content,
                          }
                          if include_urls:
                              result["url"] = url
                          content_list.append(result)
                  except Exception as e:
                      print(f"Request failed with exception: {e}")

          return content_list

- Utility function that sorts search results by their cosine similarity score in descending order.

      def sort_by_score(results: List[Dict]) -> List[Dict]:
          """Sort results by similarity score."""
          return sorted(results, key=lambda x: x['score'], reverse=True)
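The score, sort, slice, and fetch steps described above can be tried offline with a small self-contained sketch. Everything here is a stand-in: `toy_embed` uses word counts instead of the MediaPipe text embedder, `fake_fetch` returns canned text instead of making HTTP requests, and `fetch_all` uses `executor.map` rather than the submit-based loop in the source. Only the overall flow and the result keys (`href`, `body`, `score`) follow the reference code.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (NOT the MediaPipe embedder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, results: List[Dict], top_k: int) -> List[Dict]:
    """Score each result body against the query, sort descending, keep top_k."""
    q = toy_embed(query)
    scored = [{**r, "score": cosine(q, toy_embed(r["body"]))} for r in results]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]


def fake_fetch(url: str) -> str:
    """Hypothetical fetcher: returns canned text instead of doing HTTP."""
    return f"content of {url}"


def fetch_all(results: List[Dict], include_urls: bool = True) -> List[Dict]:
    """Fetch every result URL in a thread pool; executor.map keeps input order."""
    urls = [r["href"] for r in results if r.get("href")]
    with ThreadPoolExecutor(max_workers=5) as executor:
        texts = list(executor.map(fake_fetch, urls))
    content = []
    for url, text in zip(urls, texts):
        entry = {"type": "text", "text": text}
        if include_urls:
            entry["url"] = url
        content.append(entry)
    return content


results = [
    {"href": "https://example.com/cats", "body": "cats sleep most of the day"},
    {"href": "https://example.com/rag", "body": "retrieval augmented generation with search"},
    {"href": "https://example.com/search", "body": "search and retrieval basics"},
]
top = rank("retrieval augmented generation", results, top_k=2)
response = {"content": fetch_all(top)}
```

Because ranking uses only the short `body` snippet from the search results, the more expensive page fetches are limited to the `top_k` most relevant URLs.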