dbt-mcp

Official MCP server by dbt-labs. Licensed under Apache 2.0.

Supported platforms:
  • Linux
  • Apple
Server Configuration

Describes the environment variables used to configure the server.

  • DBT_HOST (optional; default: cloud.getdbt.com): Your dbt Cloud instance hostname. This will look like an 'Access URL'. If you are using Multi-cell, do not include the 'ACCOUNT_PREFIX' here.
  • DBT_PATH (optional): The path to your dbt Core or dbt Cloud CLI executable. You can find your dbt executable by running 'which dbt'.
  • DBT_TOKEN (optional): Your personal access token or service token. A service token is required when using the Semantic Layer.
  • DBT_USER_ID (optional): Your dbt Cloud user ID.
  • DBT_DEV_ENV_ID (optional): Your dbt Cloud development environment ID.
  • DISABLE_REMOTE (optional; default: true): Set this to 'false' to enable remote MCP objects.
  • DBT_PROD_ENV_ID (optional): Your dbt Cloud production environment ID.
  • DBT_PROJECT_DIR (optional): The path to your dbt project.
  • DISABLE_DBT_CLI (optional; default: false): Set this to 'true' to disable dbt Core and dbt Cloud CLI MCP tools.
  • DISABLE_DISCOVERY (optional; default: false): Set this to 'true' to disable dbt Discovery API MCP objects.
  • DISABLE_SEMANTIC_LAYER (optional; default: false): Set this to 'true' to disable dbt Semantic Layer MCP objects.
  • MULTICELL_ACCOUNT_PREFIX (optional): If you are using Multi-cell, set this to your 'ACCOUNT_PREFIX'. If you are not using Multi-cell, do not set this environment variable.
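
For illustration, here is a minimal sketch of passing these variables to a client that launches the server over stdio with the MCP Python SDK. The launch command ('uvx dbt-mcp') and every value shown are placeholders and assumptions, not a verified configuration.

    import asyncio

    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    # Hypothetical launch command and placeholder values; adjust to your setup.
    server = StdioServerParameters(
        command="uvx",
        args=["dbt-mcp"],
        env={
            "DBT_HOST": "cloud.getdbt.com",           # default from the table above
            "DBT_TOKEN": "<your-service-token>",      # service token needed for the Semantic Layer
            "DBT_PROD_ENV_ID": "<prod-env-id>",
            "DBT_PROJECT_DIR": "/path/to/your/dbt/project",
            "DBT_PATH": "/usr/local/bin/dbt",         # output of 'which dbt'
            "DISABLE_SEMANTIC_LAYER": "false",
        },
    )

    async def main() -> None:
        async with stdio_client(server) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                tools = await session.list_tools()
                print([tool.name for tool in tools.tools])

    asyncio.run(main())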

Schema

Prompts

Interactive templates invoked by user choice


No prompts

Resources

Contextual data attached and managed by the client


No resources

Tools

Functions exposed to the LLM to take actions

list_metrics

List all metrics from the dbt Semantic Layer.

If the user is asking a data-related or business-related question, this tool should be used as a first step to get a list of metrics that can be used with other tools to answer the question.

Examples:

  • "What are the top 5 products by revenue?"
  • "How many users did we have last month?"
get_dimensions
<instructions> Get the dimensions for specified metrics

Dimensions are the attributes, features, or characteristics that describe or categorize data. </instructions>

<parameters> metrics: List of metric names </parameters>
get_entities
<instructions> Get the entities for specified metrics

Entities are real-world concepts in a business such as customers, transactions, and ad campaigns. Analysis is often focused around specific entities, such as customer churn or annual recurring revenue modeling. </instructions>

<parameters> metrics: List of metric names </parameters>
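
As a rough sketch of this discovery flow, reusing an initialized session from the configuration sketch above (the 'revenue' metric name is hypothetical), the three lookup tools can be called like this before building a query:

    from mcp import ClientSession

    # Given a ClientSession already connected to dbt-mcp (see the earlier sketch),
    # list the available metrics, then look up dimensions and entities for one of them.
    async def explore_metric(session: ClientSession) -> None:
        metrics = await session.call_tool("list_metrics", {})
        dimensions = await session.call_tool("get_dimensions", {"metrics": ["revenue"]})
        entities = await session.call_tool("get_entities", {"metrics": ["revenue"]})
        for result in (metrics, dimensions, entities):
            # Tool results are returned as content blocks; print the text parts.
            for block in result.content:
                if block.type == "text":
                    print(block.text)
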
query_metrics
<instructions> Queries the dbt Semantic Layer to answer business questions from the data warehouse.

This tool allows ordering and grouping by dimensions and entities. To use this tool, you must first know about specific metrics, dimensions and entities to provide. You can call the list_metrics, get_dimensions, and get_entities tools to get information about which metrics, dimensions, and entities to use.

When using the order_by parameter, you must ensure that the dimension or entity also appears in the group_by parameter. When fulfilling a lookback query, prefer using order_by and limit instead of using the where parameter. A lookback query requires that the order_by parameter includes a descending order for a time dimension.

The where parameter should be database-agnostic SQL syntax; however, dimensions and entities are referenced differently. For categorical dimensions, use {{ Dimension('<name>') }}; for time dimensions, add the grain like {{ TimeDimension('<name>', '<grain>') }}. For entities, use {{ Entity('<name>') }}. When referencing dates in the where parameter, only use the format yyyy-mm-dd.

Don't call this tool if the user's question cannot be answered with the provided metrics, dimensions, and entities. Instead, clarify what metrics, dimensions, and entities are available and suggest a new question that can be answered and is approximately the same as the user's question.

For queries that may return large amounts of data, it's recommended to use a two-step approach:

  1. First make a query with a small limit to verify the results are what you expect
  2. Then make a follow-up query without a limit (or with a larger limit) to get the full dataset</instructions>
<examples>

<example>
Question: "What were our total sales last month?"

Thinking step-by-step:
  • I know "total_sales" is the metric I need
  • I know "metric_time" is a valid dimension for this metric and supports MONTH grain
  • I need to group by metric_time to get monthly data
  • Since this is time-based data, I should order by metric_time. I am also grouping by metric_time, so this is valid.
  • The user is asking for a lookback query, so I should set descending to true so the most recent month is at the top of the results.
  • The user is asking for just the last month, so I should limit to 1 month of data

Parameters:
metrics=["total_sales"]
group_by=[{"name": "metric_time", "grain": "MONTH", "type": "dimension"}]
order_by=[{"name": "metric_time", "desc": true}]
limit=1
</example>

<example>
Question: "Show me our top customers by revenue in the last quarter"

Thinking step-by-step:
  • First, I need to find the revenue metric
  • Using list_metrics(), I find "revenue" is available
  • I need to check what dimensions are available for revenue
  • Using get_dimensions(["revenue"]), I see "customer_name" is available and "metric_time" supports QUARTER grain
  • I need to check what entities are available
  • Using get_entities(["revenue"]), I confirm "customer" is an entity
  • I need quarterly time grain
  • Since this is time-based data, I should order by metric_time. I am grouping by metric_time, so this is valid. This is a lookback query, so I should set descending to true.
  • I should also order by revenue to see top customers. The user is asking for the highest-revenue customers, so I should set descending to true.
  • I should limit to the top 5 results to verify the query works

Parameters (initial query):
metrics=["revenue"]
group_by=[{"name": "customer_name", "type": "dimension"}, {"name": "metric_time", "grain": "QUARTER", "type": "dimension"}]
order_by=[{"name": "metric_time", "desc": true}, {"name": "revenue", "desc": true}]
limit=5

Follow-up query (after verifying results):
metrics=["revenue"]
group_by=[{"name": "customer_name", "type": "dimension"}, {"name": "metric_time", "grain": "QUARTER", "type": "dimension"}]
order_by=[{"name": "metric_time", "desc": true}, {"name": "revenue", "desc": true}]
limit=None
</example>

<example>
Question: "What's our average order value by product category for orders over $100?"

Thinking step-by-step:
  • I know "average_order_value" is the metric I need
  • I know "product_category" is a valid dimension
  • I need to filter for orders over $100
  • No time dimension is needed
  • I should first limit results to verify the query works

Parameters (initial query):
metrics=["average_order_value"]
group_by=[{"name": "product_category", "type": "dimension"}]
where="{{ Dimension('order_value') }} > 100"
limit=10

Follow-up query (after verifying results):
metrics=["average_order_value"]
group_by=[{"name": "product_category", "type": "dimension"}]
where="{{ Dimension('order_value') }} > 100"
limit=None
</example>

<example>
Question: "How many new users did we get each week last year?"

Thinking step-by-step:
  • First, I need to find the new users metric
  • Using list_metrics(), I find "new_users" is available
  • I need to check what dimensions are available
  • Using get_dimensions(["new_users"]), I see "metric_time" supports WEEK grain
  • I need to check what entities are available
  • Using get_entities(["new_users"]), I confirm "user" is an entity
  • I need weekly time grain and need to group by metric_time
  • Since this is time-based data, I should order by metric_time to show progression
  • I need to filter for the previous year's data using proper time dimension syntax
  • I should first get a few weeks to verify the query works

Parameters (initial query):
metrics=["new_users"]
group_by=[{"name": "metric_time", "grain": "WEEK", "type": "dimension"}]
order_by=[{"name": "metric_time"}]
where="{{ TimeDimension('metric_time', 'WEEK') }} >= '2023-01-01' AND {{ TimeDimension('metric_time', 'WEEK') }} < '2024-01-01'"
limit=4

Follow-up query (after verifying results):
metrics=["new_users"]
group_by=[{"name": "metric_time", "grain": "WEEK", "type": "dimension"}]
order_by=[{"name": "metric_time"}]
where="{{ TimeDimension('metric_time', 'WEEK') }} >= '2023-01-01' AND {{ TimeDimension('metric_time', 'WEEK') }} < '2024-01-01'"
limit=None
</example>

<example>
Question: "What's our customer satisfaction score by region?"

Thinking step-by-step:
  • First, I need to check if we have a customer satisfaction metric
  • Using list_metrics(), I find we don't have a direct "customer_satisfaction" metric
  • I should check what related metrics we do have
  • I see we have "net_promoter_score" and "customer_retention_rate"
  • I should inform the user that we don't have a direct customer satisfaction metric
  • I can suggest using NPS as a proxy for customer satisfaction

Response to user: "I don't have a direct customer satisfaction metric, but I can show you Net Promoter Score (NPS) by region, which is often used as a proxy for customer satisfaction. Would you like to see that instead?"

If the user agrees, then:

Parameters:
metrics=["net_promoter_score"]
group_by=[{"name": "region", "type": "dimension"}]
order_by=[{"name": "net_promoter_score", "desc": true}]
limit=10
</example>
</examples>

<parameters>
metrics: List of metric names to query for.
group_by: Optional list of dimension and entity names, with their grain, to group by.
order_by: Optional list of dimension and entity names to order by, in ascending or descending order.
where: Optional SQL WHERE clause to filter results.
limit: Optional limit for the number of results.
</parameters>
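
To make the parameter shapes concrete, here is a hedged sketch of a single query_metrics call over the same assumed MCP Python session; the metric name, grain, and date are hypothetical:

    from mcp import ClientSession

    # Monthly revenue for 2024, most recent month first: the order_by dimension also
    # appears in group_by, and the where clause uses the TimeDimension syntax.
    async def monthly_revenue(session: ClientSession):
        return await session.call_tool(
            "query_metrics",
            {
                "metrics": ["revenue"],
                "group_by": [{"name": "metric_time", "grain": "MONTH", "type": "dimension"}],
                "order_by": [{"name": "metric_time", "desc": True}],
                "where": "{{ TimeDimension('metric_time', 'MONTH') }} >= '2024-01-01'",
                "limit": 12,
            },
        )
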
get_mart_models

Get the name and description of all mart models in the environment. A mart model is part of the presentation layer of the dbt project. It's where cleaned, transformed data is organized for consumption by end-users, like analysts, dashboards, or business tools.

get_all_models

Get the name and description of all dbt models in the environment.

get_model_details
<instructions> Retrieves information about a specific dbt model. Specifically, it returns the compiled SQL, description, column names, column descriptions, and column types. </instructions>

<parameters> model_name: The name of the dbt model to retrieve details for. </parameters>
get_model_parents

Get the parents of a specific dbt model.
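
A brief sketch of chaining these model tools over the same assumed session; the 'fct_orders' model name is hypothetical, and the model_name parameter for get_model_parents is assumed to mirror get_model_details:

    from mcp import ClientSession

    # Inspect a hypothetical mart model: list the marts, then fetch its compiled SQL,
    # columns, and upstream parents.
    async def inspect_model(session: ClientSession) -> None:
        marts = await session.call_tool("get_mart_models", {})
        details = await session.call_tool("get_model_details", {"model_name": "fct_orders"})
        parents = await session.call_tool("get_model_parents", {"model_name": "fct_orders"})  # parameter name assumed
        for result in (marts, details, parents):
            for block in result.content:
                if block.type == "text":
                    print(block.text)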

build

The dbt build command will:

  • run models
  • test tests
  • snapshot snapshots
  • seed seeds

In DAG order.

compile

dbt compile generates executable SQL from source model, test, and analysis files.

The compile command is useful for visually inspecting the compiled output of model files, for example to validate complex Jinja logic or macro usage.

docs

The docs command is responsible for generating your project's documentation website.

list

List the resources in your dbt project.

parse

The dbt parse command parses and validates the contents of your dbt project. If your project contains Jinja or YAML syntax errors, the command will fail.

It will also produce an artifact with detailed timing information, which is useful to understand parsing times for large projects.

run

dbt run executes compiled SQL model files against the current target database. dbt connects to the target database and runs the relevant SQL required to materialize all data models using the specified materialization strategies. Models are run in the order defined by the dependency graph generated during compilation.

test

dbt test runs data tests defined on models, sources, snapshots, and seeds and unit tests defined on SQL models.

show

dbt show executes an arbitrary SQL statement against the database and returns the results. It is useful for debugging and inspecting data in your dbt project. Use the limit argument in place of a SQL LIMIT clause.
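
For example, a hedged sketch of previewing a few rows through the show tool over the same assumed session; the limit argument is described above, while the SQL parameter name ('sql_query') and the ref() target are assumptions:

    from mcp import ClientSession

    # Preview five rows without writing a SQL LIMIT clause; parameter names are assumed.
    async def preview_orders(session: ClientSession):
        return await session.call_tool(
            "show",
            {"sql_query": "select * from {{ ref('fct_orders') }}", "limit": 5},
        )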
