
R2R FastMCP Server

by evgenygurin

rag

Answer questions by retrieving relevant information from knowledge bases and generating responses with customizable search modes and language models.

Instructions

Perform Retrieval-Augmented Generation (RAG) query with full parameter control.

This tool retrieves relevant context from the knowledge base and generates an answer using a language model. Supports all search modes (semantic, hybrid, graph) and customizable generation parameters.

Args:

    query: The question to answer using the knowledge base. Required.
    preset: Preset configuration for common use cases. Options:
        - "default": Basic semantic search with gemini-2.5-flash, temperature 0.7, 10 results
        - "development": Hybrid search with higher temperature for creative answers, 15 results
        - "refactoring": Hybrid + graph search with gemini-2.5-pro for code analysis, 20 results
        - "debug": Minimal graph search with low temperature for precise answers, 5 results
        - "research": Comprehensive search with gemini-2.5-pro for research questions, 30 results
        - "production": Balanced hybrid search optimized for production, 10 results
    model: LLM model to use for generation (default: "vertex_ai/gemini-2.5-pro"). Examples:
        - "vertex_ai/gemini-2.5-flash" (fast and cost-effective)
        - "vertex_ai/gemini-2.5-pro" (more capable, higher cost)
        - "openai/gpt-4-turbo" (high performance)
        - "anthropic/claude-3-haiku-20240307" (fast)
        - "anthropic/claude-3-sonnet-20240229" (balanced)
        - "anthropic/claude-3-opus-20240229" (most capable)
    temperature: Generation temperature controlling randomness. Must be between 0.0 and 1.0 (default: 0.7).
        Lower values (0.0-0.3) = more deterministic, precise answers.
        Medium values (0.4-0.7) = balanced creativity and accuracy.
        Higher values (0.8-1.0) = more creative, diverse answers.
    max_tokens: Maximum number of tokens to generate (default: 8000; pass None to use the model's default).
    use_semantic_search: Enable semantic/vector search for retrieval (default: True).
    use_hybrid_search: Enable hybrid search combining semantic and full-text search (default: False).
    use_graph_search: Enable knowledge graph search for entity/relationship context (default: True).
    limit: Maximum number of search results to retrieve. Must be between 1 and 100 (default: 100).
    kg_search_type: Knowledge graph search type. "local" for local context, "global" for broader connections (default: "global").
    semantic_weight: Weight for semantic search in hybrid mode. Must be between 0.0 and 10.0 (default: 5.0).
    full_text_weight: Weight for full-text search in hybrid mode. Must be between 0.0 and 10.0 (default: 1.0).
    full_text_limit: Maximum full-text results to consider in hybrid search. Must be between 1 and 1000 (default: 200).
    rrf_k: Reciprocal Rank Fusion parameter for hybrid search. Must be between 1 and 100 (default: 50).
    search_strategy: Advanced search strategy (e.g., "hyde", "rag_fusion"). Optional.
    include_web_search: Include web search results from the internet (default: False).
    task_prompt_override: Custom system prompt to override the default RAG task prompt. Useful for specializing behavior for specific domains or tasks. Optional.

Returns: Generated answer based on relevant context from the knowledge base.

Examples:

    # Simple RAG query
    rag("What is machine learning?")

    # Development preset for code questions
    rag("How to implement async/await in Python?", preset="development")

    # Custom RAG with specific model and temperature
    rag(
        "Explain neural networks",
        model="vertex_ai/gemini-2.5-pro",
        temperature=0.5,
    )

    # Research preset with comprehensive search
    rag(
        "Latest developments in transformer architectures",
        preset="research",
    )

    # Debug preset for precise technical answers
    rag("What causes this error?", preset="debug")
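
For finer retrieval control, the hybrid-search parameters documented above can be combined in a single call. The example below is illustrative rather than taken from the server's docs; the values are arbitrary choices meant to show the knobs working together:

    # Illustrative only: hybrid retrieval with custom fusion weights
    # (parameter names as documented above; values are examples)
    rag(
        "Where is the retry logic configured?",
        use_hybrid_search=True,
        semantic_weight=3.0,    # de-emphasize vector similarity
        full_text_weight=2.0,   # boost exact keyword matches
        full_text_limit=300,
        rrf_k=60,
        limit=20,
    )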

Input Schema

Name                  Required  Default
query                 Yes       -
preset                No        default
model                 No        vertex_ai/gemini-2.5-pro
temperature           No        0.7
max_tokens            No        8000
use_semantic_search   No        true
use_hybrid_search     No        false
use_graph_search      No        true
limit                 No        100
kg_search_type        No        global
semantic_weight       No        5.0
full_text_weight      No        1.0
full_text_limit       No        200
rrf_k                 No        50
search_strategy       No        -
include_web_search    No        false
task_prompt_override  No        -
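
Over the wire, an MCP client invokes this tool with a tools/call request whose arguments object must satisfy the schema above. A minimal sketch of such a payload, written here as a Python dict (the envelope follows the standard MCP JSON-RPC convention; the argument values are examples, not from this page):

    # Sketch of an MCP tools/call payload for the "rag" tool
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "rag",
            "arguments": {
                "query": "What is machine learning?",  # the only required field
                "preset": "research",
            },
        },
    }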

Implementation Reference

  • The core handler function 'rag', decorated with @mcp.tool(), implements Retrieval-Augmented Generation: it retrieves relevant context from the R2R knowledge base and generates an LLM response.
    @mcp.tool(
        annotations={
            "title": "R2R RAG",
            "readOnlyHint": False,
            "destructiveHint": False,
            "openWorldHint": True,
        }
    )
    async def rag(
        query: str,
        ctx: Context,
        preset: str = "default",
        model: str = "vertex_ai/gemini-2.5-pro",
        temperature: float = 0.7,
        max_tokens: int | None = 8000,
        use_semantic_search: bool = True,
        use_hybrid_search: bool = False,
        use_graph_search: bool = True,
        limit: int = 100,
        kg_search_type: Literal["local", "global"] = "global",
        semantic_weight: float = 5.0,
        full_text_weight: float = 1.0,
        full_text_limit: int = 200,
        rrf_k: int = 50,
        search_strategy: str | None = None,
        include_web_search: bool = False,
        task_prompt_override: str | None = None,
    ) -> str:
        """Perform Retrieval-Augmented Generation (RAG) query with full parameter control.

        (The docstring continues with the parameter reference and examples
        reproduced in the Instructions section above.)
        """
        await ctx.info(f"RAG query: {query}, preset: {preset}, model: {model}")
        try:
            # Validate parameters
            validate_limit(limit)
            validate_temperature(temperature)
            validate_semantic_weight(semantic_weight)
            validate_full_text_weight(full_text_weight)
            validate_full_text_limit(full_text_limit)
            validate_rrf_k(rrf_k)
            if use_graph_search:
                validate_kg_search_type(kg_search_type)

            await ctx.report_progress(progress=10, total=100, message="Initializing RAG")
            client = R2RClient(base_url=R2R_BASE_URL)
            if API_KEY:
                client.set_api_key(API_KEY)

            # Get preset configuration and merge with explicit parameters
            preset_config = get_rag_preset_config(preset)

            # Apply preset values, but allow explicit parameters to override
            final_use_hybrid = (
                use_hybrid_search
                if preset == "default"
                else (
                    use_hybrid_search
                    or preset_config["search_settings"].get("use_hybrid_search", False)
                )
            )
            final_use_graph = (
                use_graph_search
                if preset == "default"
                else (
                    use_graph_search
                    or preset_config["search_settings"].get("use_graph_search", False)
                )
            )

            search_settings: dict[str, Any] = {
                "use_semantic_search": use_semantic_search,
                "limit": limit,
            }

            # Apply hybrid search settings
            if final_use_hybrid:
                search_settings["use_hybrid_search"] = True
                hybrid_config = preset_config["search_settings"].get("hybrid_settings", {})
                search_settings["hybrid_settings"] = {
                    "semantic_weight": semantic_weight
                    if semantic_weight != 5.0 or preset == "default"
                    else hybrid_config.get("semantic_weight", 5.0),
                    "full_text_weight": full_text_weight
                    if full_text_weight != 1.0 or preset == "default"
                    else hybrid_config.get("full_text_weight", 1.0),
                    "full_text_limit": full_text_limit
                    if full_text_limit != 200 or preset == "default"
                    else hybrid_config.get("full_text_limit", 200),
                    "rrf_k": rrf_k
                    if rrf_k != 50 or preset == "default"
                    else hybrid_config.get("rrf_k", 50),
                }
                await ctx.info("Hybrid search enabled for RAG")

            # Apply graph search settings
            if final_use_graph:
                # Note: this compares against "local" although the parameter
                # default is "global", so a preset's kg_search_type only takes
                # effect when the caller explicitly passes "local".
                kg_type = (
                    kg_search_type
                    if kg_search_type != "local" or preset == "default"
                    else preset_config["search_settings"].get("kg_search_type", "local")
                )
                search_settings["graph_search_settings"] = {
                    "use_graph_search": True,
                    "kg_search_type": kg_type,
                }
                await ctx.info(f"Knowledge graph search enabled (type: {kg_type})")

            # Apply search strategy if provided
            if search_strategy:
                search_settings["search_strategy"] = search_strategy
                await ctx.info(f"Search strategy: {search_strategy}")

            # Build RAG generation config
            # Note: this compares against "vertex_ai/gemini-2.5-flash" although
            # the parameter default is "vertex_ai/gemini-2.5-pro", so a preset's
            # model only takes effect when the caller explicitly passes flash.
            rag_model = (
                model
                if model != "vertex_ai/gemini-2.5-flash"
                else preset_config["rag_generation_config"].get(
                    "model", "vertex_ai/gemini-2.5-flash"
                )
            )
            rag_temp = (
                temperature
                if temperature != 0.7
                else preset_config["rag_generation_config"].get("temperature", 0.7)
            )
            rag_generation_config: dict[str, Any] = {
                "model": rag_model,
                "temperature": rag_temp,
                "stream": False,
            }
            if max_tokens is not None:
                rag_generation_config["max_tokens"] = max_tokens

            await ctx.report_progress(progress=30, total=100, message="Retrieving context")

            try:
                rag_kwargs: dict[str, Any] = {
                    "query": query,
                    "search_settings": search_settings if search_settings else None,
                    "rag_generation_config": rag_generation_config,
                    "include_web_search": include_web_search,
                }
                if task_prompt_override:
                    rag_kwargs["task_prompt"] = task_prompt_override

                rag_response = client.retrieval.rag(**rag_kwargs)
                await ctx.report_progress(progress=90, total=100, message="Generating answer")
                answer = rag_response.results.generated_answer  # type: ignore
                await ctx.report_progress(progress=100, total=100, message="Complete")
                await ctx.info("RAG completed successfully")
                return answer
            except Exception as e:
                await ctx.error(f"RAG generation failed: {e!s}")
                raise
        except ValueError as e:
            await ctx.error(f"Validation error: {e!s}")
            raise
        except Exception as e:
            await ctx.error(f"RAG failed: {e!s}")
            raise
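    The validate_* helpers called at the top of the handler are not shown on this page. A minimal sketch of what they plausibly do, inferred from the ranges stated in the parameter docs above (the actual implementations in server.py may differ):

    # Hypothetical sketches of the validators, inferred from the documented
    # ranges; not taken from server.py.
    def validate_temperature(temperature: float) -> None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError(f"temperature must be in [0.0, 1.0], got {temperature}")

    def validate_limit(limit: int) -> None:
        if not 1 <= limit <= 100:
            raise ValueError(f"limit must be in [1, 100], got {limit}")

    def validate_rrf_k(rrf_k: int) -> None:
        if not 1 <= rrf_k <= 100:
            raise ValueError(f"rrf_k must be in [1, 100], got {rrf_k}")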
  • server.py:639-646 (registration)
    FastMCP tool registration for the 'rag' tool via the @mcp.tool() decorator, with the title 'R2R RAG'.
    @mcp.tool(
        annotations={
            "title": "R2R RAG",
            "readOnlyHint": False,
            "destructiveHint": False,
            "openWorldHint": True,
        }
    )
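    The decorator presumes a module-level FastMCP instance named mcp, which is not shown on this page. A minimal sketch of how such a server is typically created and run (the import path, server name, and transport are assumptions, not taken from this listing):

    # Sketch only: typical FastMCP bootstrapping
    from fastmcp import FastMCP

    mcp = FastMCP("R2R FastMCP Server")

    # ... @mcp.tool()-decorated handlers such as rag() are defined here ...

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default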
  • Input schema defined by function parameters with type annotations, default values, and comprehensive docstring specifying validation rules and examples.
    async def rag( query: str, ctx: Context, preset: str = "default", model: str = "vertex_ai/gemini-2.5-pro", temperature: float = 0.7, max_tokens: int | None = 8000, use_semantic_search: bool = True, use_hybrid_search: bool = False, use_graph_search: bool = True, limit: int = 100, kg_search_type: Literal["local", "global"] = "global", semantic_weight: float = 5.0, full_text_weight: float = 1.0, full_text_limit: int = 200, rrf_k: int = 50, search_strategy: str | None = None, include_web_search: bool = False, task_prompt_override: str | None = None, ) -> str: """ Perform Retrieval-Augmented Generation (RAG) query with full parameter control. This tool retrieves relevant context from the knowledge base and generates an answer using a language model. Supports all search modes (semantic, hybrid, graph) and customizable generation parameters. Args: query: The question to answer using the knowledge base. Required. preset: Preset configuration for common use cases. Options: - "default": Basic RAG with gpt-4o-mini, temperature 0.7, 10 results - "development": Hybrid search with higher temperature for creative answers, 15 results - "refactoring": Hybrid + graph search with gpt-4o for code analysis, 20 results - "debug": Minimal graph search with low temperature for precise answers, 5 results - "research": Comprehensive search with gpt-4o for research questions, 30 results - "production": Balanced hybrid search optimized for production, 10 results model: LLM model to use for generation. Examples: - "vertex_ai/gemini-2.5-flash" (default, fast and cost-effective) - "vertex_ai/gemini-2.5-pro" (more capable, higher cost) - "openai/gpt-4-turbo" (high performance) - "anthropic/claude-3-haiku-20240307" (fast) - "anthropic/claude-3-sonnet-20240229" (balanced) - "anthropic/claude-3-opus-20240229" (most capable) temperature: Generation temperature controlling randomness. Must be between 0.0 and 1.0. Lower values (0.0-0.3) = more deterministic, precise answers Medium values (0.4-0.7) = balanced creativity and accuracy (default: 0.7) Higher values (0.8-1.0) = more creative, diverse answers max_tokens: Maximum number of tokens to generate. Optional, uses model default if not specified. use_semantic_search: Enable semantic/vector search for retrieval (default: True) use_hybrid_search: Enable hybrid search combining semantic and full-text search (default: False) use_graph_search: Enable knowledge graph search for entity/relationship context (default: False) limit: Maximum number of search results to retrieve. Must be between 1 and 100 (default: 10) kg_search_type: Knowledge graph search type. "local" for local context, "global" for broader connections (default: "local") semantic_weight: Weight for semantic search in hybrid mode. Must be between 0.0 and 10.0 (default: 5.0) full_text_weight: Weight for full-text search in hybrid mode. Must be between 0.0 and 10.0 (default: 1.0) full_text_limit: Maximum full-text results to consider in hybrid search. Must be between 1 and 1000 (default: 200) rrf_k: Reciprocal Rank Fusion parameter for hybrid search. Must be between 1 and 100 (default: 50) search_strategy: Advanced search strategy (e.g., "hyde", "rag_fusion"). Optional. include_web_search: Include web search results from the internet (default: False) task_prompt_override: Custom system prompt to override the default RAG task prompt. Useful for specializing AI behavior for specific domains or tasks. Optional. Returns: Generated answer based on relevant context from the knowledge base. 
Examples: # Simple RAG query rag("What is machine learning?") # Development preset for code questions rag("How to implement async/await in Python?", preset="development") # Custom RAG with specific model and temperature rag( "Explain neural networks", model="vertex_ai/gemini-2.5-pro", temperature=0.5 ) # Research preset with comprehensive search rag( "Latest developments in transformer architectures", preset="research" ) # Debug preset for precise technical answers rag("What causes this error?", preset="debug") """
  • Enum defining RAG presets used by the rag tool for common configurations.
    class RAGPreset(str, Enum):
        """Preset configurations for RAG operations."""

        DEFAULT = "default"
        DEVELOPMENT = "development"
        REFACTORING = "refactoring"
        DEBUG = "debug"
        RESEARCH = "research"
        PRODUCTION = "production"
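    Because RAGPreset subclasses both str and Enum, its members compare equal to their plain string values, so callers can pass either form. A quick illustration (not from the source):

    # RAGPreset members behave like plain strings
    assert RAGPreset.RESEARCH == "research"
    assert RAGPreset("debug") is RAGPreset.DEBUG  # lookup by value
    print([p.value for p in RAGPreset])
    # ['default', 'development', 'refactoring', 'debug', 'research', 'production']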
  • Helper function returning the RAG preset configuration (search and generation settings) used by the rag tool.
    def get_rag_preset_config(preset: str) -> dict[str, Any]:
        """
        Get RAG configuration for a preset.

        Args:
            preset: Preset name (default, development, refactoring, debug,
                research, production)

        Returns:
            Dictionary with RAG settings (search_settings and rag_generation_config)
        """
        presets = {
            "default": {
                "search_settings": {
                    "use_semantic_search": True,
                    "use_hybrid_search": False,
                    "limit": 10,
                },
                "rag_generation_config": {
                    "model": "vertex_ai/gemini-2.5-flash",
                    "temperature": 0.7,
                },
            },
            "development": {
                "search_settings": {
                    "use_semantic_search": True,
                    "use_hybrid_search": True,
                    "limit": 15,
                    "hybrid_settings": {
                        "semantic_weight": 5.0,
                        "full_text_weight": 1.0,
                        "full_text_limit": 200,
                        "rrf_k": 50,
                    },
                },
                "rag_generation_config": {
                    "model": "vertex_ai/gemini-2.5-flash",
                    "temperature": 0.8,
                },
            },
            "refactoring": {
                "search_settings": {
                    "use_semantic_search": True,
                    "use_hybrid_search": True,
                    "use_graph_search": True,
                    "limit": 20,
                    "kg_search_type": "local",
                    "hybrid_settings": {
                        "semantic_weight": 7.0,
                        "full_text_weight": 3.0,
                        "full_text_limit": 300,
                        "rrf_k": 50,
                    },
                },
                "rag_generation_config": {
                    "model": "vertex_ai/gemini-2.5-pro",
                    "temperature": 0.5,
                },
            },
            "debug": {
                "search_settings": {
                    "use_semantic_search": True,
                    "use_hybrid_search": False,
                    "use_graph_search": True,
                    "limit": 5,
                    "kg_search_type": "local",
                },
                "rag_generation_config": {
                    "model": "vertex_ai/gemini-2.5-flash",
                    "temperature": 0.3,
                },
            },
            "research": {
                "search_settings": {
                    "use_semantic_search": True,
                    "use_hybrid_search": True,
                    "use_graph_search": True,
                    "limit": 30,
                    "kg_search_type": "global",
                    "hybrid_settings": {
                        "semantic_weight": 6.0,
                        "full_text_weight": 2.0,
                        "full_text_limit": 400,
                        "rrf_k": 60,
                    },
                },
                "rag_generation_config": {
                    "model": "vertex_ai/gemini-2.5-pro",
                    "temperature": 0.7,
                },
            },
            "production": {
                "search_settings": {
                    "use_semantic_search": True,
                    "use_hybrid_search": True,
                    "limit": 10,
                    "hybrid_settings": {
                        "semantic_weight": 5.0,
                        "full_text_weight": 1.0,
                        "full_text_limit": 200,
                        "rrf_k": 50,
                    },
                },
                "rag_generation_config": {
                    "model": "vertex_ai/gemini-2.5-flash",
                    "temperature": 0.6,
                },
            },
        }
        config = presets.get(preset.lower(), presets["default"])
        return {
            "search_settings": config["search_settings"].copy(),
            "rag_generation_config": config["rag_generation_config"].copy(),
        }
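    A quick check of the helper's behavior as defined above: unknown preset names fall back to "default", and the preset table is rebuilt on every call with the top-level sections copied, so callers can mutate the result safely. Illustrative calls:

    # Exercising get_rag_preset_config as defined above
    cfg = get_rag_preset_config("research")
    print(cfg["rag_generation_config"]["model"])  # vertex_ai/gemini-2.5-pro
    print(cfg["search_settings"]["limit"])        # 30

    fallback = get_rag_preset_config("no-such-preset")
    print(fallback["search_settings"]["limit"])   # 10 (the "default" preset)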