rag
Answer questions by retrieving relevant information from knowledge bases and generating responses with customizable search modes and language models.
Instructions
Perform Retrieval-Augmented Generation (RAG) query with full parameter control.
This tool retrieves relevant context from the knowledge base and generates an answer using a language model. Supports all search modes (semantic, hybrid, graph) and customizable generation parameters.
Args:

- `query`: The question to answer using the knowledge base. Required.
- `preset`: Preset configuration for common use cases. Options:
  - `"default"`: Basic semantic RAG with gemini-2.5-flash, temperature 0.7, 10 results
  - `"development"`: Hybrid search with higher temperature (0.8) for creative answers, 15 results
  - `"refactoring"`: Hybrid + graph search with gemini-2.5-pro for code analysis, 20 results
  - `"debug"`: Minimal graph search with low temperature (0.3) for precise answers, 5 results
  - `"research"`: Comprehensive hybrid + graph search with gemini-2.5-pro for research questions, 30 results
  - `"production"`: Balanced hybrid search tuned for production, 10 results
- `model`: LLM model to use for generation. Examples:
  - `"vertex_ai/gemini-2.5-flash"` (fast and cost-effective)
  - `"vertex_ai/gemini-2.5-pro"` (default; more capable, higher cost)
  - `"openai/gpt-4-turbo"` (high performance)
  - `"anthropic/claude-3-haiku-20240307"` (fast)
  - `"anthropic/claude-3-sonnet-20240229"` (balanced)
  - `"anthropic/claude-3-opus-20240229"` (most capable)
- `temperature`: Generation temperature controlling randomness. Must be between 0.0 and 1.0. Lower values (0.0-0.3) give more deterministic, precise answers; medium values (0.4-0.7) balance creativity and accuracy; higher values (0.8-1.0) give more creative, diverse answers (default: 0.7).
- `max_tokens`: Maximum number of tokens to generate (default: 8000).
- `use_semantic_search`: Enable semantic/vector search for retrieval (default: True).
- `use_hybrid_search`: Enable hybrid search combining semantic and full-text search (default: False).
- `use_graph_search`: Enable knowledge graph search for entity/relationship context (default: True).
- `limit`: Maximum number of search results to retrieve. Must be between 1 and 100 (default: 100).
- `kg_search_type`: Knowledge graph search type: `"local"` for local context, `"global"` for broader connections (default: `"global"`).
- `semantic_weight`: Weight for semantic search in hybrid mode. Must be between 0.0 and 10.0 (default: 5.0).
- `full_text_weight`: Weight for full-text search in hybrid mode. Must be between 0.0 and 10.0 (default: 1.0).
- `full_text_limit`: Maximum full-text results to consider in hybrid search. Must be between 1 and 1000 (default: 200).
- `rrf_k`: Reciprocal Rank Fusion parameter for hybrid search (see the sketch after this list). Must be between 1 and 100 (default: 50).
- `search_strategy`: Advanced search strategy (e.g., `"hyde"`, `"rag_fusion"`). Optional.
- `include_web_search`: Include web search results from the internet (default: False).
- `task_prompt_override`: Custom system prompt overriding the default RAG task prompt. Useful for specializing behavior for specific domains or tasks. Optional.
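For intuition about `rrf_k`: in Reciprocal Rank Fusion, each document's fused score sums `1 / (k + rank)` over the individual rankings, so a larger `k` flattens the gap between top and mid ranks. The sketch below is a hypothetical illustration of the formula, not R2R's implementation (R2R performs fusion server-side):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 50) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks high in both lists, so it wins the fused ordering
semantic = ["a", "b", "c"]
full_text = ["b", "d", "a"]
print(rrf_fuse([semantic, full_text]))  # ['b', 'a', 'd', 'c']
```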
Returns: Generated answer based on relevant context from the knowledge base.
Examples:

```python
# Simple RAG query
rag("What is machine learning?")

# Development preset for code questions
rag("How to implement async/await in Python?", preset="development")

# Custom RAG with specific model and temperature
rag(
    "Explain neural networks",
    model="vertex_ai/gemini-2.5-pro",
    temperature=0.5,
)

# Research preset with comprehensive search
rag(
    "Latest developments in transformer architectures",
    preset="research",
)

# Debug preset for precise technical answers
rag("What causes this error?", preset="debug")
```
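The examples above lean on presets; a call that tunes the hybrid and graph retrieval parameters directly might look like this sketch (values are illustrative, not recommendations):

```python
# Hybrid + graph retrieval with explicit tuning (illustrative values)
rag(
    "Which services call the billing API?",
    use_hybrid_search=True,
    use_graph_search=True,
    kg_search_type="global",  # broader entity/relationship connections
    semantic_weight=7.0,      # favor vector similarity...
    full_text_weight=3.0,     # ...while keeping keyword matches relevant
    rrf_k=60,                 # soften rank-position differences in fusion
    limit=20,
)
```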
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Question to answer using the knowledge base | |
| preset | No | Preset configuration: default, development, refactoring, debug, research, production | default |
| model | No | LLM model used for generation | vertex_ai/gemini-2.5-pro |
| temperature | No | Generation temperature (0.0-1.0) | 0.7 |
| max_tokens | No | Maximum number of tokens to generate | 8000 |
| use_semantic_search | No | Enable semantic/vector search | true |
| use_hybrid_search | No | Enable hybrid (semantic + full-text) search | false |
| use_graph_search | No | Enable knowledge graph search | true |
| limit | No | Maximum search results to retrieve (1-100) | 100 |
| kg_search_type | No | Knowledge graph search type: "local" or "global" | global |
| semantic_weight | No | Semantic weight in hybrid mode (0.0-10.0) | 5.0 |
| full_text_weight | No | Full-text weight in hybrid mode (0.0-10.0) | 1.0 |
| full_text_limit | No | Max full-text results in hybrid search (1-1000) | 200 |
| rrf_k | No | Reciprocal Rank Fusion parameter (1-100) | 50 |
| search_strategy | No | Advanced search strategy, e.g. "hyde", "rag_fusion" | |
| include_web_search | No | Include web search results | false |
| task_prompt_override | No | Custom system prompt overriding the default RAG task prompt | |
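For reference, calling the tool from an MCP client could look like the sketch below. It assumes the official `mcp` Python SDK with an already-connected `ClientSession` named `session`; the argument values are illustrative:

```python
# Minimal sketch: invoke the rag tool over MCP (assumes a connected ClientSession)
result = await session.call_tool(
    "rag",
    arguments={
        "query": "What is machine learning?",
        "preset": "research",
        "temperature": 0.3,
    },
)
print(result.content[0].text)  # the generated answer
```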
Implementation Reference
- server.py:639-894 (handler): The core handler function `rag`, decorated with `@mcp.tool()`, implements Retrieval-Augmented Generation: it retrieves relevant context from the R2R knowledge base and generates an LLM response.

```python
@mcp.tool(
    annotations={
        "title": "R2R RAG",
        "readOnlyHint": False,
        "destructiveHint": False,
        "openWorldHint": True,
    }
)
async def rag(
    query: str,
    ctx: Context,
    preset: str = "default",
    model: str = "vertex_ai/gemini-2.5-pro",
    temperature: float = 0.7,
    max_tokens: int | None = 8000,
    use_semantic_search: bool = True,
    use_hybrid_search: bool = False,
    use_graph_search: bool = True,
    limit: int = 100,
    kg_search_type: Literal["local", "global"] = "global",
    semantic_weight: float = 5.0,
    full_text_weight: float = 1.0,
    full_text_limit: int = 200,
    rrf_k: int = 50,
    search_strategy: str | None = None,
    include_web_search: bool = False,
    task_prompt_override: str | None = None,
) -> str:
    """Perform Retrieval-Augmented Generation (RAG) query with full parameter control.

    ... (docstring identical to the Instructions, Args, Returns, and Examples
    sections rendered above) ...
    """
    await ctx.info(f"RAG query: {query}, preset: {preset}, model: {model}")

    try:
        # Validate parameters
        validate_limit(limit)
        validate_temperature(temperature)
        validate_semantic_weight(semantic_weight)
        validate_full_text_weight(full_text_weight)
        validate_full_text_limit(full_text_limit)
        validate_rrf_k(rrf_k)
        if use_graph_search:
            validate_kg_search_type(kg_search_type)

        await ctx.report_progress(progress=10, total=100, message="Initializing RAG")

        client = R2RClient(base_url=R2R_BASE_URL)
        if API_KEY:
            client.set_api_key(API_KEY)

        # Get preset configuration and merge with explicit parameters
        preset_config = get_rag_preset_config(preset)

        # Apply preset values, but allow explicit parameters to override
        final_use_hybrid = (
            use_hybrid_search
            if preset == "default"
            else (
                use_hybrid_search
                or preset_config["search_settings"].get("use_hybrid_search", False)
            )
        )
        final_use_graph = (
            use_graph_search
            if preset == "default"
            else (
                use_graph_search
                or preset_config["search_settings"].get("use_graph_search", False)
            )
        )

        search_settings: dict[str, Any] = {
            "use_semantic_search": use_semantic_search,
            "limit": limit,
        }

        # Apply hybrid search settings
        if final_use_hybrid:
            search_settings["use_hybrid_search"] = True
            hybrid_config = preset_config["search_settings"].get("hybrid_settings", {})
            search_settings["hybrid_settings"] = {
                "semantic_weight": semantic_weight
                if semantic_weight != 5.0 or preset == "default"
                else hybrid_config.get("semantic_weight", 5.0),
                "full_text_weight": full_text_weight
                if full_text_weight != 1.0 or preset == "default"
                else hybrid_config.get("full_text_weight", 1.0),
                "full_text_limit": full_text_limit
                if full_text_limit != 200 or preset == "default"
                else hybrid_config.get("full_text_limit", 200),
                "rrf_k": rrf_k
                if rrf_k != 50 or preset == "default"
                else hybrid_config.get("rrf_k", 50),
            }
            await ctx.info("Hybrid search enabled for RAG")

        # Apply graph search settings
        if final_use_graph:
            kg_type = (
                kg_search_type
                if kg_search_type != "local" or preset == "default"
                else preset_config["search_settings"].get("kg_search_type", "local")
            )
            search_settings["graph_search_settings"] = {
                "use_graph_search": True,
                "kg_search_type": kg_type,
            }
            await ctx.info(f"Knowledge graph search enabled (type: {kg_type})")

        # Apply search strategy if provided
        if search_strategy:
            search_settings["search_strategy"] = search_strategy
            await ctx.info(f"Search strategy: {search_strategy}")

        # Build RAG generation config
        rag_model = (
            model
            if model != "vertex_ai/gemini-2.5-flash"
            else preset_config["rag_generation_config"].get(
                "model", "vertex_ai/gemini-2.5-flash"
            )
        )
        rag_temp = (
            temperature
            if temperature != 0.7
            else preset_config["rag_generation_config"].get("temperature", 0.7)
        )
        rag_generation_config: dict[str, Any] = {
            "model": rag_model,
            "temperature": rag_temp,
            "stream": False,
        }
        if max_tokens is not None:
            rag_generation_config["max_tokens"] = max_tokens

        await ctx.report_progress(progress=30, total=100, message="Retrieving context")

        try:
            rag_kwargs: dict[str, Any] = {
                "query": query,
                "search_settings": search_settings if search_settings else None,
                "rag_generation_config": rag_generation_config,
                "include_web_search": include_web_search,
            }
            if task_prompt_override:
                rag_kwargs["task_prompt"] = task_prompt_override

            rag_response = client.retrieval.rag(**rag_kwargs)

            await ctx.report_progress(
                progress=90, total=100, message="Generating answer"
            )

            answer = rag_response.results.generated_answer  # type: ignore

            await ctx.report_progress(progress=100, total=100, message="Complete")
            await ctx.info("RAG completed successfully")

            return answer

        except Exception as e:
            await ctx.error(f"RAG generation failed: {e!s}")
            raise

    except ValueError as e:
        await ctx.error(f"Validation error: {e!s}")
        raise
    except Exception as e:
        await ctx.error(f"RAG failed: {e!s}")
        raise
```
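The preset merge above follows one rule throughout: an explicit argument that differs from its hard-coded default wins; otherwise the preset's value applies (and preset `"default"` always defers to the explicit arguments). A stripped-down sketch of that rule, with `resolve` as a hypothetical helper that is not part of server.py:

```python
def resolve(explicit, default, preset_value, preset_name):
    """Explicit non-default values win; otherwise fall back to the preset.

    Mirrors the pattern used for semantic_weight, rrf_k, etc. in rag().
    """
    if explicit != default or preset_name == "default":
        return explicit
    return preset_value

# Preset "research" sets rrf_k=60; an untouched rrf_k (50) defers to it...
assert resolve(50, 50, 60, "research") == 60
# ...but an explicit rrf_k=40 overrides the preset
assert resolve(40, 50, 60, "research") == 40
```

A consequence of this sentinel-on-default pattern is that explicitly passing the default value (e.g. `rrf_k=50`) is indistinguishable from not passing it at all, so the preset still wins in that case.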
- server.py:639-646 (registration): FastMCP tool registration for the `rag` tool via the `@mcp.tool()` decorator, with title "R2R RAG".

```python
@mcp.tool(
    annotations={
        "title": "R2R RAG",
        "readOnlyHint": False,
        "destructiveHint": False,
        "openWorldHint": True,
    }
)
```
- server.py:647-756 (schema): Input schema defined by the function parameters: type annotations and default values, plus the docstring (rendered in full above) specifying descriptions, validation rules, and examples.

```python
async def rag(
    query: str,
    ctx: Context,
    preset: str = "default",
    model: str = "vertex_ai/gemini-2.5-pro",
    temperature: float = 0.7,
    max_tokens: int | None = 8000,
    use_semantic_search: bool = True,
    use_hybrid_search: bool = False,
    use_graph_search: bool = True,
    limit: int = 100,
    kg_search_type: Literal["local", "global"] = "global",
    semantic_weight: float = 5.0,
    full_text_weight: float = 1.0,
    full_text_limit: int = 200,
    rrf_k: int = 50,
    search_strategy: str | None = None,
    include_web_search: bool = False,
    task_prompt_override: str | None = None,
) -> str: ...
```
- server.py:113-121 (helper): Enum defining the RAG presets used by the `rag` tool for common configurations.

```python
class RAGPreset(str, Enum):
    """Preset configurations for RAG operations."""

    DEFAULT = "default"
    DEVELOPMENT = "development"
    REFACTORING = "refactoring"
    DEBUG = "debug"
    RESEARCH = "research"
    PRODUCTION = "production"
```
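Since `RAGPreset` subclasses `str`, its members compare equal to plain strings. One way a caller might validate a preset name against it (a hypothetical helper; note that `get_rag_preset_config` below instead falls back silently to `"default"` for unknown names):

```python
# Hypothetical validation helper built on the enum
def is_valid_preset(name: str) -> bool:
    return name.lower() in {p.value for p in RAGPreset}

assert is_valid_preset("Research")
assert not is_valid_preset("staging")
assert RAGPreset.DEBUG == "debug"  # str subclass: compares equal to plain strings
```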
- server.py:203-316 (helper): Helper that returns the search and generation settings for each RAG preset.

```python
def get_rag_preset_config(preset: str) -> dict[str, Any]:
    """
    Get RAG configuration for a preset.

    Args:
        preset: Preset name (default, development, refactoring, debug,
            research, production)

    Returns:
        Dictionary with RAG settings (search_settings and rag_generation_config)
    """
    presets = {
        "default": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": False,
                "limit": 10,
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-flash",
                "temperature": 0.7,
            },
        },
        "development": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": True,
                "limit": 15,
                "hybrid_settings": {
                    "semantic_weight": 5.0,
                    "full_text_weight": 1.0,
                    "full_text_limit": 200,
                    "rrf_k": 50,
                },
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-flash",
                "temperature": 0.8,
            },
        },
        "refactoring": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": True,
                "use_graph_search": True,
                "limit": 20,
                "kg_search_type": "local",
                "hybrid_settings": {
                    "semantic_weight": 7.0,
                    "full_text_weight": 3.0,
                    "full_text_limit": 300,
                    "rrf_k": 50,
                },
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-pro",
                "temperature": 0.5,
            },
        },
        "debug": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": False,
                "use_graph_search": True,
                "limit": 5,
                "kg_search_type": "local",
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-flash",
                "temperature": 0.3,
            },
        },
        "research": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": True,
                "use_graph_search": True,
                "limit": 30,
                "kg_search_type": "global",
                "hybrid_settings": {
                    "semantic_weight": 6.0,
                    "full_text_weight": 2.0,
                    "full_text_limit": 400,
                    "rrf_k": 60,
                },
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-pro",
                "temperature": 0.7,
            },
        },
        "production": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": True,
                "limit": 10,
                "hybrid_settings": {
                    "semantic_weight": 5.0,
                    "full_text_weight": 1.0,
                    "full_text_limit": 200,
                    "rrf_k": 50,
                },
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-flash",
                "temperature": 0.6,
            },
        },
    }

    config = presets.get(preset.lower(), presets["default"])
    return {
        "search_settings": config["search_settings"].copy(),
        "rag_generation_config": config["rag_generation_config"].copy(),
    }
```
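As a usage sketch (illustrative calls, not from server.py), the helper can be called directly to inspect a preset. Note that the final `.copy()` calls are shallow: top-level keys of the returned dicts are safe to mutate, while nested dicts such as `hybrid_settings` are still shared with the module-level table.

```python
# Illustrative: inspect the "research" preset
cfg = get_rag_preset_config("research")
print(cfg["rag_generation_config"])
# -> {'model': 'vertex_ai/gemini-2.5-pro', 'temperature': 0.7}

# Top-level keys are copied, so this mutation does not leak back
cfg["rag_generation_config"]["temperature"] = 0.0
assert get_rag_preset_config("research")["rag_generation_config"]["temperature"] == 0.7

# Unknown names silently fall back to the "default" preset
assert get_rag_preset_config("staging")["search_settings"]["limit"] == 10
```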