NiclasOlofsson

DBT Core MCP Server

get_lineage

Analyze data lineage and dependencies for DBT resources to understand upstream sources and downstream impacts with configurable depth and direction.

Instructions

Get lineage (dependency tree) for any dbt resource with auto-detection.

This unified tool works across all resource types (models, sources, seeds, snapshots, etc.) showing upstream and/or downstream dependencies with configurable depth.

Args:
    name: Resource name. For sources, use "source_name.table_name" or just "table_name"
        Examples: "customers", "jaffle_shop.orders", "raw_customers"
    resource_type: Optional filter to narrow search:
        - "model": Data transformation models
        - "source": External data sources
        - "seed": CSV reference data files
        - "snapshot": SCD Type 2 historical tables
        - "test": Data quality tests
        - "analysis": Ad-hoc analysis queries
        - None: Auto-detect (searches all types)
    direction: Lineage direction:
        - "upstream": Show where data comes from (parents)
        - "downstream": Show what depends on this resource (children)
        - "both": Show full lineage (default)
    depth: Maximum levels to traverse (None for unlimited)
        - depth=1: Immediate dependencies only
        - depth=2: Dependencies + their dependencies
        - None: Full dependency tree

Returns:
    Lineage information with upstream/downstream nodes and statistics.
    If multiple matches found, returns all matches for LLM to process.

Raises:
    ValueError: If resource not found or invalid direction

Examples:
    get_lineage("customers") -> auto-detect and show full lineage
    get_lineage("customers", "model", "upstream") -> where customers model gets data
    get_lineage("jaffle_shop.orders", "source", "downstream", 2) -> 2 levels of dependents

Input Schema

Name           Required  Description  Default
name           Yes
resource_type  No
direction      No                     both
depth          No

Output Schema

No arguments

Implementation Reference

  • FastMCP tool registration for 'get_lineage': the function signature defines the input schema, and the body calls toolImpl_get_lineage once initialization has completed.
    async def get_lineage(
        ctx: Context,
        name: str,
        resource_type: str | None = None,
        direction: str = "both",
        depth: int | None = None,
    ) -> dict[str, Any]:
        """Get lineage (dependency tree) for any dbt resource with auto-detection.
    
        This unified tool works across all resource types (models, sources, seeds, snapshots, etc.)
        showing upstream and/or downstream dependencies with configurable depth.
    
        Args:
            name: Resource name. For sources, use "source_name.table_name" or just "table_name"
                Examples: "customers", "jaffle_shop.orders", "raw_customers"
            resource_type: Optional filter to narrow search:
                - "model": Data transformation models
                - "source": External data sources
                - "seed": CSV reference data files
                - "snapshot": SCD Type 2 historical tables
                - "test": Data quality tests
                - "analysis": Ad-hoc analysis queries
                - None: Auto-detect (searches all types)
            direction: Lineage direction:
                - "upstream": Show where data comes from (parents)
                - "downstream": Show what depends on this resource (children)
                - "both": Show full lineage (default)
            depth: Maximum levels to traverse (None for unlimited)
                - depth=1: Immediate dependencies only
                - depth=2: Dependencies + their dependencies
                - None: Full dependency tree
    
        Returns:
            Lineage information with upstream/downstream nodes and statistics.
            If multiple matches found, returns all matches for LLM to process.
    
        Raises:
            ValueError: If resource not found or invalid direction
    
        Examples:
            get_lineage("customers") -> auto-detect and show full lineage
            get_lineage("customers", "model", "upstream") -> where customers model gets data
            get_lineage("jaffle_shop.orders", "source", "downstream", 2) -> 2 levels of dependents
        """
        await self._ensure_initialized_with_context(ctx)
        return await self.toolImpl_get_lineage(name, resource_type, direction, depth)
  • Primary handler method for the get_lineage tool: delegates the core logic to ManifestLoader.get_lineage and re-raises any ValueError with a "Lineage error:" prefix.
    async def toolImpl_get_lineage(self, name: str, resource_type: str | None = None, direction: str = "both", depth: int | None = None) -> dict[str, Any]:
        """Implementation for get_lineage tool."""
        try:
            return self.manifest.get_lineage(name, resource_type, direction, depth)  # type: ignore
        except ValueError as e:
            raise ValueError(f"Lineage error: {e}")
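
To illustrate what a caller sees from this wrapper, here is a minimal sketch of the same re-raise pattern. `find_resource` is a hypothetical stand-in for the manifest lookup, and the `from e` clause is an optional addition that is not in the source above; it records the original exception as `__cause__` so the traceback stays explicit.

```python
def find_resource(name: str) -> dict:
    """Hypothetical stand-in for the manifest lookup; it always fails here."""
    raise ValueError(f"Resource '{name}' not found")


def lineage_or_error(name: str) -> dict:
    """Re-raise lookup failures with the 'Lineage error:' prefix."""
    try:
        return find_resource(name)
    except ValueError as e:
        # `from e` (an assumption, not in the original) sets __cause__ explicitly.
        raise ValueError(f"Lineage error: {e}") from e


try:
    lineage_or_error("missing_model")
except ValueError as err:
    message = str(err)
    cause = err.__cause__
```

The prefixed message is what reaches the tool caller, while `cause` still holds the original lookup error.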
  • Core lineage computation logic: finds resource by name/type, traverses upstream/downstream dependencies using dbt's parent_map/child_map with configurable depth, returns structured lineage with stats.
    def get_lineage(
        self,
        name: str,
        resource_type: str | None = None,
        direction: str = "both",
        depth: int | None = None,
    ) -> dict[str, Any]:
        """
        Get lineage (dependency tree) for any resource type with auto-detection.
    
        This unified method works across all resource types (models, sources, seeds, etc.)
        and provides upstream, downstream, or bidirectional dependency traversal.
    
        Args:
            name: Resource name. For sources, use "source_name.table_name" or just "table_name"
            resource_type: Optional filter (model, source, seed, snapshot, test, analysis).
                          If None, auto-detects resource type.
            direction: Lineage direction:
                - "upstream": Show where data comes from (parents)
                - "downstream": Show what depends on this resource (children)
                - "both": Show full lineage (default)
            depth: Maximum levels to traverse (None for unlimited)
                - depth=1: Immediate dependencies only
                - depth=2: Dependencies + their dependencies
                - None: Full dependency tree
    
        Returns:
            Dictionary with lineage information:
            {
                "resource": {...},  # The target resource info
                "upstream": [...],  # List of upstream dependencies (if direction in ["upstream", "both"])
                "downstream": [...],  # List of downstream dependents (if direction in ["downstream", "both"])
                "stats": {
                    "upstream_count": int,
                    "downstream_count": int,
                    "total_dependencies": int
                }
            }
    
            If multiple matches found, returns:
            {"multiple_matches": True, "matches": [...], "message": "..."}
    
        Raises:
            RuntimeError: If manifest not loaded
            ValueError: If resource not found or invalid direction
    
        Examples:
            get_lineage("customers") -> auto-detect and show full lineage
            get_lineage("customers", "model", "upstream") -> show where customers model gets data
            get_lineage("customers", direction="downstream", depth=2) -> 2 levels of dependents
        """
        if not self._manifest:
            raise RuntimeError("Manifest not loaded. Call load() first.")
    
        # Validate direction
        valid_directions = {"upstream", "downstream", "both"}
        if direction not in valid_directions:
            raise ValueError(f"Invalid direction '{direction}'. Must be one of: {', '.join(sorted(valid_directions))}")
    
        # Get the resource (auto-detect if resource_type not specified)
        resource = self.get_resource_node(name, resource_type)
    
        # Handle multiple matches - return for LLM to process
        if resource.get("multiple_matches"):
            return resource
    
        # Extract unique_id for lineage traversal
        unique_id = resource.get("unique_id")
        if not unique_id:
            raise ValueError(f"Resource '{name}' does not have a unique_id")
    
        # Build lineage based on direction
        result: dict[str, Any] = {
            "resource": {
                "name": resource.get("name"),
                "unique_id": unique_id,
                "resource_type": resource.get("resource_type"),
                "package_name": resource.get("package_name"),
            }
        }
    
        upstream: list[dict[str, Any]] = []
        downstream: list[dict[str, Any]] = []
    
        if direction in ("upstream", "both"):
            upstream = self.get_upstream_nodes(unique_id, max_depth=depth)
            result["upstream"] = upstream
    
        if direction in ("downstream", "both"):
            downstream = self.get_downstream_nodes(unique_id, max_depth=depth)
            result["downstream"] = downstream
    
        # Add statistics
        result["stats"] = {
            "upstream_count": len(upstream),
            "downstream_count": len(downstream),
            "total_dependencies": len(upstream) + len(downstream),
        }
    
        return result
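
One consequence of the assembly logic above is that the statistics only count the directions that were actually traversed. The sketch below reproduces just that step with a hypothetical helper, `assemble_lineage`, which takes precomputed node lists instead of reading a manifest.

```python
from typing import Any


def assemble_lineage(
    resource: dict[str, Any],
    direction: str,
    upstream_nodes: list[dict[str, Any]],
    downstream_nodes: list[dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical helper mirroring the result-building step shown above."""
    result: dict[str, Any] = {"resource": resource}
    upstream: list[dict[str, Any]] = []
    downstream: list[dict[str, Any]] = []

    # A direction's key is only present when that direction was requested.
    if direction in ("upstream", "both"):
        upstream = upstream_nodes
        result["upstream"] = upstream
    if direction in ("downstream", "both"):
        downstream = downstream_nodes
        result["downstream"] = downstream

    result["stats"] = {
        "upstream_count": len(upstream),
        "downstream_count": len(downstream),
        "total_dependencies": len(upstream) + len(downstream),
    }
    return result


# Toy data: one parent and two dependents.
parents = [{"unique_id": "source.raw_customers", "distance": 1}]
children = [
    {"unique_id": "model.orders", "distance": 1},
    {"unique_id": "model.revenue", "distance": 2},
]
only_up = assemble_lineage({"name": "customers"}, "upstream", parents, children)
```

With `direction="upstream"` the result has no `"downstream"` key and `downstream_count` is 0, even though dependents exist, so a zero count does not by itself mean the resource has no dependents.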
  • Supporting utility for recursive upstream (parents) dependency traversal from a node's unique_id.
    def get_upstream_nodes(self, unique_id: str, max_depth: int | None = None, current_depth: int = 0) -> list[dict[str, Any]]:
        """Get all upstream dependencies of a node recursively.
    
        Args:
            unique_id: The unique identifier of the node
            max_depth: Maximum depth to traverse (None for unlimited)
            current_depth: Current recursion depth (internal use)
    
        Returns:
            List of dictionaries with upstream node info:
            {"unique_id": str, "name": str, "type": str, "distance": int}
        """
        if not self._manifest:
            raise RuntimeError("Manifest not loaded. Call load() first.")
    
        if max_depth is not None and current_depth >= max_depth:
            return []
    
        parent_map = self._manifest.get("parent_map", {})
        parents = parent_map.get(unique_id, [])
    
        upstream: list[dict[str, Any]] = []
        seen: set[str] = set()
    
        for parent_id in parents:
            if parent_id in seen:
                continue
            seen.add(parent_id)
    
            node = self.get_node_by_unique_id(parent_id)
            if node:
                resource_type = node.get("resource_type", "unknown")
                upstream.append(
                    {
                        "unique_id": parent_id,
                        "name": node.get("name", ""),
                        "type": resource_type,
                        "distance": current_depth + 1,
                    }
                )
    
                # Recurse
                if max_depth is None or current_depth + 1 < max_depth:
                    grandparents = self.get_upstream_nodes(parent_id, max_depth, current_depth + 1)
                    for gp in grandparents:
                        if gp["unique_id"] not in seen:
                            seen.add(str(gp["unique_id"]))
                            upstream.append(gp)
    
        return upstream
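
The recursion above walks depth-first, so when a node is reachable by more than one path, the reported distance is that of the first path found rather than the shortest one. The sketch below replays the traversal over a plain dictionary (`upstream_dfs`, a simplified copy that skips the node lookup) and contrasts it with a breadth-first variant (`upstream_bfs`), which is an alternative technique and not what the server uses. The toy `PARENT_MAP` and both function names are assumptions for illustration.

```python
from collections import deque

# Toy graph: model.c depends on model.b and source.a; model.b depends on source.a.
PARENT_MAP = {
    "model.c": ["model.b", "source.a"],
    "model.b": ["source.a"],
    "source.a": [],
}


def upstream_dfs(parent_map, unique_id, max_depth=None, current_depth=0):
    """Simplified copy of the recursive traversal shown above."""
    if max_depth is not None and current_depth >= max_depth:
        return []
    result = []
    seen = set()
    for parent_id in parent_map.get(unique_id, []):
        if parent_id in seen:
            continue
        seen.add(parent_id)
        result.append({"unique_id": parent_id, "distance": current_depth + 1})
        if max_depth is None or current_depth + 1 < max_depth:
            for gp in upstream_dfs(parent_map, parent_id, max_depth, current_depth + 1):
                if gp["unique_id"] not in seen:
                    seen.add(gp["unique_id"])
                    result.append(gp)
    return result


def upstream_bfs(parent_map, start, max_depth=None):
    """Breadth-first alternative: each node gets its shortest distance."""
    result = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node_id, dist = queue.popleft()
        if max_depth is not None and dist >= max_depth:
            continue
        for parent_id in parent_map.get(node_id, []):
            if parent_id in seen:
                continue
            seen.add(parent_id)
            result.append({"unique_id": parent_id, "distance": dist + 1})
            queue.append((parent_id, dist + 1))
    return result


dfs_full = upstream_dfs(PARENT_MAP, "model.c")
dfs_one = upstream_dfs(PARENT_MAP, "model.c", max_depth=1)
bfs_full = upstream_bfs(PARENT_MAP, "model.c")
```

On this graph the depth-first version reports `source.a` at distance 2 when unlimited but at distance 1 with `max_depth=1`, whereas the breadth-first version reports distance 1 in both cases.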
  • Supporting utility for recursive downstream (children) dependency traversal from a node's unique_id.
    def get_downstream_nodes(self, unique_id: str, max_depth: int | None = None, current_depth: int = 0) -> list[dict[str, Any]]:
        """Get all downstream dependents of a node recursively.
    
        Args:
            unique_id: The unique identifier of the node
            max_depth: Maximum depth to traverse (None for unlimited)
            current_depth: Current recursion depth (internal use)
    
        Returns:
            List of dictionaries with downstream node info:
            {"unique_id": str, "name": str, "type": str, "distance": int}
        """
        if not self._manifest:
            raise RuntimeError("Manifest not loaded. Call load() first.")
    
        if max_depth is not None and current_depth >= max_depth:
            return []
    
        child_map = self._manifest.get("child_map", {})
        children = child_map.get(unique_id, [])
    
        downstream: list[dict[str, Any]] = []
        seen: set[str] = set()
    
        for child_id in children:
            if child_id in seen:
                continue
            seen.add(child_id)
    
            node = self.get_node_by_unique_id(child_id)
            if node:
                resource_type = node.get("resource_type", "unknown")
                downstream.append(
                    {
                        "unique_id": child_id,
                        "name": node.get("name", ""),
                        "type": resource_type,
                        "distance": current_depth + 1,
                    }
                )
    
                # Recurse
                if max_depth is None or current_depth + 1 < max_depth:
                    grandchildren = self.get_downstream_nodes(child_id, max_depth, current_depth + 1)
                    for gc in grandchildren:
                        if gc["unique_id"] not in seen:
                            seen.add(str(gc["unique_id"]))
                            downstream.append(gc)
    
        return downstream
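
The downstream traversal reads `child_map`, which holds the same edges as `parent_map` seen from the other end. As a minimal sketch of that relationship, the hypothetical helper `invert_parent_map` below derives one map from the other on toy data; the server itself reads both maps directly from the manifest.

```python
def invert_parent_map(parent_map: dict[str, list[str]]) -> dict[str, list[str]]:
    """Hypothetical helper: derive a child map from a parent map."""
    # Start with every known node so leaves get an empty list of children.
    child_map: dict[str, list[str]] = {node_id: [] for node_id in parent_map}
    for child_id, parents in parent_map.items():
        for parent_id in parents:
            child_map.setdefault(parent_id, []).append(child_id)
    return child_map


# Toy graph: model.c depends on model.b and source.a; model.b depends on source.a.
toy_parent_map = {
    "model.c": ["model.b", "source.a"],
    "model.b": ["source.a"],
    "source.a": [],
}
toy_child_map = invert_parent_map(toy_parent_map)
```

Here `source.a` ends up with both models as children, which is why a downstream query on a source can fan out quickly when `depth` is left unlimited.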
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does this well by explaining the unified nature across resource types, auto-detection capability, configurable depth, and what happens with multiple matches ('returns all matches for LLM to process'). It also documents error conditions ('Raises: ValueError') and return format expectations. The only minor gap is not mentioning performance implications of unlimited depth.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Args, Returns, Raises, Examples) and uses bullet points effectively. While comprehensive, it's appropriately sized for a tool with 4 parameters and complex behavior. The information is front-loaded with the core purpose first, though some redundancy exists in the examples section.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (dependency analysis across multiple resource types), no annotations, and the presence of an output schema, the description is remarkably complete. It covers purpose, parameters, behavior, error conditions, and examples. The output schema existence means the description doesn't need to detail return structure, allowing it to focus on operational context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing comprehensive parameter documentation. Each parameter (name, resource_type, direction, depth) gets detailed explanations with examples, format guidance, enum values with descriptions, and default behaviors. This adds substantial meaning beyond what the bare schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get lineage (dependency tree) for any dbt resource with auto-detection.' It specifies the verb ('Get'), resource ('lineage/dependency tree'), and scope ('any dbt resource'), distinguishing it from siblings like get_resource_info or list_resources that don't focus on dependency relationships.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('works across all resource types showing upstream and/or downstream dependencies') and includes examples demonstrating different scenarios. However, it doesn't explicitly state when NOT to use it or compare it to alternatives like analyze_impact, which might have overlapping functionality.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/NiclasOlofsson/dbt-core-mcp'
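
The same request can be made from Python with the standard library. This is a minimal sketch: the `Accept` header and the assumption that the endpoint returns JSON are not stated on this page, and the function names are illustrative.

```python
import json
import urllib.request
from typing import Any

SERVER_URL = "https://glama.ai/api/mcp/v1/servers/NiclasOlofsson/dbt-core-mcp"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build the GET request without sending it."""
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json"},  # assumption: JSON response
    )


def fetch_server_info(url: str = SERVER_URL, timeout: float = 10.0) -> Any:
    """Send the request and parse the body as JSON (assumed format)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.load(response)


# Usage (performs a network call):
#     info = fetch_server_info()
```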

If you have feedback or need assistance with the MCP directory API, please join our Discord server.