rag
Answer questions by retrieving relevant information from knowledge bases and generating responses with customizable search modes and language models.
Instructions
Perform Retrieval-Augmented Generation (RAG) query with full parameter control.
This tool retrieves relevant context from the knowledge base and generates an answer using a language model. Supports all search modes (semantic, hybrid, graph) and customizable generation parameters.
Args:

- `query`: The question to answer using the knowledge base. Required.
- `preset`: Preset configuration for common use cases. Options:
  - `"default"`: Basic semantic RAG with gemini-2.5-flash, temperature 0.7, 10 results
  - `"development"`: Hybrid search with higher temperature (0.8) for creative answers, 15 results
  - `"refactoring"`: Hybrid + graph search with gemini-2.5-pro for code analysis, 20 results
  - `"debug"`: Minimal graph search with low temperature (0.3) for precise answers, 5 results
  - `"research"`: Comprehensive hybrid + graph search with gemini-2.5-pro for research questions, 30 results
  - `"production"`: Balanced hybrid search tuned for production, 10 results
- `model`: LLM model to use for generation. Examples:
  - `"vertex_ai/gemini-2.5-flash"` (fast and cost-effective)
  - `"vertex_ai/gemini-2.5-pro"` (default; more capable, higher cost)
  - `"openai/gpt-4-turbo"` (high performance)
  - `"anthropic/claude-3-haiku-20240307"` (fast)
  - `"anthropic/claude-3-sonnet-20240229"` (balanced)
  - `"anthropic/claude-3-opus-20240229"` (most capable)
- `temperature`: Generation temperature controlling randomness. Must be between 0.0 and 1.0. Lower values (0.0-0.3) give more deterministic, precise answers; medium values (0.4-0.7) balance creativity and accuracy; higher values (0.8-1.0) give more creative, diverse answers (default: 0.7).
- `max_tokens`: Maximum number of tokens to generate (default: 8000).
- `use_semantic_search`: Enable semantic/vector search for retrieval (default: True).
- `use_hybrid_search`: Enable hybrid search combining semantic and full-text search (default: False).
- `use_graph_search`: Enable knowledge graph search for entity/relationship context (default: True).
- `limit`: Maximum number of search results to retrieve. Must be between 1 and 100 (default: 100).
- `kg_search_type`: Knowledge graph search type: `"local"` for local context, `"global"` for broader connections (default: `"global"`).
- `semantic_weight`: Weight for semantic search in hybrid mode. Must be between 0.0 and 10.0 (default: 5.0).
- `full_text_weight`: Weight for full-text search in hybrid mode. Must be between 0.0 and 10.0 (default: 1.0).
- `full_text_limit`: Maximum full-text results to consider in hybrid search. Must be between 1 and 1000 (default: 200).
- `rrf_k`: Reciprocal Rank Fusion parameter for hybrid search (see the sketch after this list). Must be between 1 and 100 (default: 50).
- `search_strategy`: Advanced search strategy (e.g., `"hyde"`, `"rag_fusion"`). Optional.
- `include_web_search`: Include web search results from the internet (default: False).
- `task_prompt_override`: Custom system prompt overriding the default RAG task prompt. Useful for specializing behavior for specific domains or tasks. Optional.
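For intuition about `rrf_k`: in Reciprocal Rank Fusion, each document's fused score sums `1 / (k + rank)` over the individual rankings, so a larger `k` flattens the gap between top and mid ranks. The sketch below is a hypothetical illustration of the formula, not R2R's implementation (R2R performs fusion server-side):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 50) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "b" ranks high in both lists, so it wins the fused ordering
semantic = ["a", "b", "c"]
full_text = ["b", "d", "a"]
print(rrf_fuse([semantic, full_text]))  # ['b', 'a', 'd', 'c']
```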
Returns: Generated answer based on relevant context from the knowledge base.
Examples:

```python
# Simple RAG query
rag("What is machine learning?")

# Development preset for code questions
rag("How to implement async/await in Python?", preset="development")

# Custom RAG with specific model and temperature
rag(
    "Explain neural networks",
    model="vertex_ai/gemini-2.5-pro",
    temperature=0.5,
)

# Research preset with comprehensive search
rag(
    "Latest developments in transformer architectures",
    preset="research",
)

# Debug preset for precise technical answers
rag("What causes this error?", preset="debug")
```
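The examples above lean on presets; a call that tunes the hybrid and graph retrieval parameters directly might look like this sketch (values are illustrative, not recommendations):

```python
# Hybrid + graph retrieval with explicit tuning (illustrative values)
rag(
    "Which services call the billing API?",
    use_hybrid_search=True,
    use_graph_search=True,
    kg_search_type="global",  # broader entity/relationship connections
    semantic_weight=7.0,      # favor vector similarity...
    full_text_weight=3.0,     # ...while keeping keyword matches relevant
    rrf_k=60,                 # soften rank-position differences in fusion
    limit=20,
)
```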
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Question to answer using the knowledge base | |
| preset | No | Preset configuration: default, development, refactoring, debug, research, production | default |
| model | No | LLM model used for generation | vertex_ai/gemini-2.5-pro |
| temperature | No | Generation temperature (0.0-1.0) | 0.7 |
| max_tokens | No | Maximum number of tokens to generate | 8000 |
| use_semantic_search | No | Enable semantic/vector search | true |
| use_hybrid_search | No | Enable hybrid (semantic + full-text) search | false |
| use_graph_search | No | Enable knowledge graph search | true |
| limit | No | Maximum search results to retrieve (1-100) | 100 |
| kg_search_type | No | Knowledge graph search type: "local" or "global" | global |
| semantic_weight | No | Semantic weight in hybrid mode (0.0-10.0) | 5.0 |
| full_text_weight | No | Full-text weight in hybrid mode (0.0-10.0) | 1.0 |
| full_text_limit | No | Max full-text results in hybrid search (1-1000) | 200 |
| rrf_k | No | Reciprocal Rank Fusion parameter (1-100) | 50 |
| search_strategy | No | Advanced search strategy, e.g. "hyde", "rag_fusion" | |
| include_web_search | No | Include web search results | false |
| task_prompt_override | No | Custom system prompt overriding the default RAG task prompt | |
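For reference, calling the tool from an MCP client could look like the sketch below. It assumes the official `mcp` Python SDK with an already-connected `ClientSession` named `session`; the argument values are illustrative:

```python
# Minimal sketch: invoke the rag tool over MCP (assumes a connected ClientSession)
result = await session.call_tool(
    "rag",
    arguments={
        "query": "What is machine learning?",
        "preset": "research",
        "temperature": 0.3,
    },
)
print(result.content[0].text)  # the generated answer
```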
Implementation Reference
- server.py:639-894 (handler): The core handler function `rag`, decorated with `@mcp.tool()`, implements Retrieval-Augmented Generation: it retrieves relevant context from the R2R knowledge base and generates an LLM response.

```python
@mcp.tool(
    annotations={
        "title": "R2R RAG",
        "readOnlyHint": False,
        "destructiveHint": False,
        "openWorldHint": True,
    }
)
async def rag(
    query: str,
    ctx: Context,
    preset: str = "default",
    model: str = "vertex_ai/gemini-2.5-pro",
    temperature: float = 0.7,
    max_tokens: int | None = 8000,
    use_semantic_search: bool = True,
    use_hybrid_search: bool = False,
    use_graph_search: bool = True,
    limit: int = 100,
    kg_search_type: Literal["local", "global"] = "global",
    semantic_weight: float = 5.0,
    full_text_weight: float = 1.0,
    full_text_limit: int = 200,
    rrf_k: int = 50,
    search_strategy: str | None = None,
    include_web_search: bool = False,
    task_prompt_override: str | None = None,
) -> str:
    """Perform Retrieval-Augmented Generation (RAG) query with full parameter control.

    ... (docstring identical to the Instructions, Args, Returns, and Examples
    sections rendered above) ...
    """
    await ctx.info(f"RAG query: {query}, preset: {preset}, model: {model}")

    try:
        # Validate parameters
        validate_limit(limit)
        validate_temperature(temperature)
        validate_semantic_weight(semantic_weight)
        validate_full_text_weight(full_text_weight)
        validate_full_text_limit(full_text_limit)
        validate_rrf_k(rrf_k)
        if use_graph_search:
            validate_kg_search_type(kg_search_type)

        await ctx.report_progress(progress=10, total=100, message="Initializing RAG")

        client = R2RClient(base_url=R2R_BASE_URL)
        if API_KEY:
            client.set_api_key(API_KEY)

        # Get preset configuration and merge with explicit parameters
        preset_config = get_rag_preset_config(preset)

        # Apply preset values, but allow explicit parameters to override
        final_use_hybrid = (
            use_hybrid_search
            if preset == "default"
            else (
                use_hybrid_search
                or preset_config["search_settings"].get("use_hybrid_search", False)
            )
        )
        final_use_graph = (
            use_graph_search
            if preset == "default"
            else (
                use_graph_search
                or preset_config["search_settings"].get("use_graph_search", False)
            )
        )

        search_settings: dict[str, Any] = {
            "use_semantic_search": use_semantic_search,
            "limit": limit,
        }

        # Apply hybrid search settings
        if final_use_hybrid:
            search_settings["use_hybrid_search"] = True
            hybrid_config = preset_config["search_settings"].get("hybrid_settings", {})
            search_settings["hybrid_settings"] = {
                "semantic_weight": semantic_weight
                if semantic_weight != 5.0 or preset == "default"
                else hybrid_config.get("semantic_weight", 5.0),
                "full_text_weight": full_text_weight
                if full_text_weight != 1.0 or preset == "default"
                else hybrid_config.get("full_text_weight", 1.0),
                "full_text_limit": full_text_limit
                if full_text_limit != 200 or preset == "default"
                else hybrid_config.get("full_text_limit", 200),
                "rrf_k": rrf_k
                if rrf_k != 50 or preset == "default"
                else hybrid_config.get("rrf_k", 50),
            }
            await ctx.info("Hybrid search enabled for RAG")

        # Apply graph search settings
        if final_use_graph:
            kg_type = (
                kg_search_type
                if kg_search_type != "local" or preset == "default"
                else preset_config["search_settings"].get("kg_search_type", "local")
            )
            search_settings["graph_search_settings"] = {
                "use_graph_search": True,
                "kg_search_type": kg_type,
            }
            await ctx.info(f"Knowledge graph search enabled (type: {kg_type})")

        # Apply search strategy if provided
        if search_strategy:
            search_settings["search_strategy"] = search_strategy
            await ctx.info(f"Search strategy: {search_strategy}")

        # Build RAG generation config
        rag_model = (
            model
            if model != "vertex_ai/gemini-2.5-flash"
            else preset_config["rag_generation_config"].get(
                "model", "vertex_ai/gemini-2.5-flash"
            )
        )
        rag_temp = (
            temperature
            if temperature != 0.7
            else preset_config["rag_generation_config"].get("temperature", 0.7)
        )
        rag_generation_config: dict[str, Any] = {
            "model": rag_model,
            "temperature": rag_temp,
            "stream": False,
        }
        if max_tokens is not None:
            rag_generation_config["max_tokens"] = max_tokens

        await ctx.report_progress(progress=30, total=100, message="Retrieving context")

        try:
            rag_kwargs: dict[str, Any] = {
                "query": query,
                "search_settings": search_settings if search_settings else None,
                "rag_generation_config": rag_generation_config,
                "include_web_search": include_web_search,
            }
            if task_prompt_override:
                rag_kwargs["task_prompt"] = task_prompt_override

            rag_response = client.retrieval.rag(**rag_kwargs)

            await ctx.report_progress(
                progress=90, total=100, message="Generating answer"
            )

            answer = rag_response.results.generated_answer  # type: ignore

            await ctx.report_progress(progress=100, total=100, message="Complete")
            await ctx.info("RAG completed successfully")

            return answer

        except Exception as e:
            await ctx.error(f"RAG generation failed: {e!s}")
            raise

    except ValueError as e:
        await ctx.error(f"Validation error: {e!s}")
        raise
    except Exception as e:
        await ctx.error(f"RAG failed: {e!s}")
        raise
```
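The preset merge above follows one rule throughout: an explicit argument that differs from its hard-coded default wins; otherwise the preset's value applies (and preset `"default"` always defers to the explicit arguments). A stripped-down sketch of that rule, with `resolve` as a hypothetical helper that is not part of server.py:

```python
def resolve(explicit, default, preset_value, preset_name):
    """Explicit non-default values win; otherwise fall back to the preset.

    Mirrors the pattern used for semantic_weight, rrf_k, etc. in rag().
    """
    if explicit != default or preset_name == "default":
        return explicit
    return preset_value

# Preset "research" sets rrf_k=60; an untouched rrf_k (50) defers to it...
assert resolve(50, 50, 60, "research") == 60
# ...but an explicit rrf_k=40 overrides the preset
assert resolve(40, 50, 60, "research") == 40
```

A consequence of this sentinel-on-default pattern is that explicitly passing the default value (e.g. `rrf_k=50`) is indistinguishable from not passing it at all, so the preset still wins in that case.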
- server.py:639-646 (registration): FastMCP tool registration for the `rag` tool via the `@mcp.tool()` decorator, with title "R2R RAG".

```python
@mcp.tool(
    annotations={
        "title": "R2R RAG",
        "readOnlyHint": False,
        "destructiveHint": False,
        "openWorldHint": True,
    }
)
```
- server.py:647-756 (schema): Input schema defined by the function parameters: type annotations and default values, plus the docstring (rendered in full above) specifying descriptions, validation rules, and examples.

```python
async def rag(
    query: str,
    ctx: Context,
    preset: str = "default",
    model: str = "vertex_ai/gemini-2.5-pro",
    temperature: float = 0.7,
    max_tokens: int | None = 8000,
    use_semantic_search: bool = True,
    use_hybrid_search: bool = False,
    use_graph_search: bool = True,
    limit: int = 100,
    kg_search_type: Literal["local", "global"] = "global",
    semantic_weight: float = 5.0,
    full_text_weight: float = 1.0,
    full_text_limit: int = 200,
    rrf_k: int = 50,
    search_strategy: str | None = None,
    include_web_search: bool = False,
    task_prompt_override: str | None = None,
) -> str: ...
```
- server.py:113-121 (helper): Enum defining the RAG presets used by the `rag` tool for common configurations.

```python
class RAGPreset(str, Enum):
    """Preset configurations for RAG operations."""

    DEFAULT = "default"
    DEVELOPMENT = "development"
    REFACTORING = "refactoring"
    DEBUG = "debug"
    RESEARCH = "research"
    PRODUCTION = "production"
```
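Since `RAGPreset` subclasses `str`, its members compare equal to plain strings. One way a caller might validate a preset name against it (a hypothetical helper; note that `get_rag_preset_config` below instead falls back silently to `"default"` for unknown names):

```python
# Hypothetical validation helper built on the enum
def is_valid_preset(name: str) -> bool:
    return name.lower() in {p.value for p in RAGPreset}

assert is_valid_preset("Research")
assert not is_valid_preset("staging")
assert RAGPreset.DEBUG == "debug"  # str subclass: compares equal to plain strings
```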
- server.py:203-316 (helper): Helper that returns the search and generation settings for each RAG preset.

```python
def get_rag_preset_config(preset: str) -> dict[str, Any]:
    """
    Get RAG configuration for a preset.

    Args:
        preset: Preset name (default, development, refactoring, debug,
            research, production)

    Returns:
        Dictionary with RAG settings (search_settings and rag_generation_config)
    """
    presets = {
        "default": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": False,
                "limit": 10,
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-flash",
                "temperature": 0.7,
            },
        },
        "development": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": True,
                "limit": 15,
                "hybrid_settings": {
                    "semantic_weight": 5.0,
                    "full_text_weight": 1.0,
                    "full_text_limit": 200,
                    "rrf_k": 50,
                },
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-flash",
                "temperature": 0.8,
            },
        },
        "refactoring": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": True,
                "use_graph_search": True,
                "limit": 20,
                "kg_search_type": "local",
                "hybrid_settings": {
                    "semantic_weight": 7.0,
                    "full_text_weight": 3.0,
                    "full_text_limit": 300,
                    "rrf_k": 50,
                },
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-pro",
                "temperature": 0.5,
            },
        },
        "debug": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": False,
                "use_graph_search": True,
                "limit": 5,
                "kg_search_type": "local",
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-flash",
                "temperature": 0.3,
            },
        },
        "research": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": True,
                "use_graph_search": True,
                "limit": 30,
                "kg_search_type": "global",
                "hybrid_settings": {
                    "semantic_weight": 6.0,
                    "full_text_weight": 2.0,
                    "full_text_limit": 400,
                    "rrf_k": 60,
                },
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-pro",
                "temperature": 0.7,
            },
        },
        "production": {
            "search_settings": {
                "use_semantic_search": True,
                "use_hybrid_search": True,
                "limit": 10,
                "hybrid_settings": {
                    "semantic_weight": 5.0,
                    "full_text_weight": 1.0,
                    "full_text_limit": 200,
                    "rrf_k": 50,
                },
            },
            "rag_generation_config": {
                "model": "vertex_ai/gemini-2.5-flash",
                "temperature": 0.6,
            },
        },
    }

    config = presets.get(preset.lower(), presets["default"])
    return {
        "search_settings": config["search_settings"].copy(),
        "rag_generation_config": config["rag_generation_config"].copy(),
    }
```
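As a usage sketch (illustrative calls, not from server.py), the helper can be called directly to inspect a preset. Note that the final `.copy()` calls are shallow: top-level keys of the returned dicts are safe to mutate, while nested dicts such as `hybrid_settings` are still shared with the module-level table.

```python
# Illustrative: inspect the "research" preset
cfg = get_rag_preset_config("research")
print(cfg["rag_generation_config"])
# -> {'model': 'vertex_ai/gemini-2.5-pro', 'temperature': 0.7}

# Top-level keys are copied, so this mutation does not leak back
cfg["rag_generation_config"]["temperature"] = 0.0
assert get_rag_preset_config("research")["rag_generation_config"]["temperature"] == 0.7

# Unknown names silently fall back to the "default" preset
assert get_rag_preset_config("staging")["search_settings"]["limit"] == 10
```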