# dbt MCP Server Usage Examples

Practical examples and use cases for the dbt MCP server with Claude Code integration.

## Table of Contents

- [Getting Started](#getting-started)
- [Basic Operations](#basic-operations)
- [Model Development](#model-development)
- [Data Quality Testing](#data-quality-testing)
- [Database Exploration](#database-exploration)
- [Project Discovery](#project-discovery)
- [Advanced Workflows](#advanced-workflows)
- [Troubleshooting](#troubleshooting)

## Getting Started

### Server Startup

```bash
# Start the dbt MCP server
./scripts/start_dbt_mcp.sh

# Expected output:
# 🚀 Starting dbt MCP Server for Claude Code...
# ✅ Loaded dbt MCP configuration
# 📊 dbt MCP Server Configuration:
#    Project Dir: /path/to/transform
#    Profile: data_stack
#    Target: dev
# 🎯 Starting MCP server...
```

### Claude Code Integration

Claude Code will automatically recognize the MCP server and provide intelligent dbt assistance:

```
User: "Show me what dbt models are available in this project"

Claude: I'll list the dbt models for you.
[Uses discovery_list_models tool]

Found 3 dbt models in your project:
- stg_employees (staging layer)
- dim_employees (marts layer)
- agg_department_stats (marts layer)
```

## Basic Operations

### Running dbt Models

**Scenario**: Execute all dbt models to build transformations

```
User: "Run all dbt models to build the latest data"

Claude: I'll execute all dbt models using dbt run.
[Uses dbt_run tool with no model filter]

✅ dbt run completed successfully
- 3 models executed
- All transformations built
- Data warehouse updated
```

**Scenario**: Run specific models only

```
User: "Run just the staging models"

Claude: I'll run only the staging models.
[Uses dbt_run tool with models="stg_employees"]

✅ Staging models completed:
- stg_employees: SUCCESS
- Runtime: 1.2 seconds
```

### Testing Data Quality

**Scenario**: Run all data quality tests

```
User: "Test data quality for all models"

Claude: I'll run the full dbt test suite.
[Uses dbt_test tool]

✅ All tests passed:
- source_unique_raw_employees_id: PASS
- source_not_null_raw_employees_id: PASS
- 2 tests completed in 1.8s
```
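For context on what a test like `source_unique_raw_employees_id` actually executes: dbt compiles generic tests into SQL that returns the rows violating the constraint, and a test passes when that query returns no rows. Below is a rough sketch of the compiled `unique` test on the `id` column of `raw.employees`; the exact SQL dbt generates depends on the dbt version and adapter, so treat this as illustrative only:

```sql
-- Approximate compiled form of the unique test on raw.employees.id.
-- The test fails if this query returns any rows.
select
    id as unique_field,
    count(*) as n_records
from raw.employees
where id is not null
group by id
having count(*) > 1
```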
## Model Development

### Complete Model Development Workflow

**Scenario**: User wants to understand and work with a specific model

````
User: "I want to work on the employee staging model. Show me what it does and make sure it's working properly."

Claude: I'll help you work with the employee staging model. Let me analyze it comprehensively.
[Uses discovery_model_details tool for "stg_employees"]

📊 Model Analysis: stg_employees
- **Purpose**: Staging layer for employee data cleaning and typing
- **Location**: staging/stg_employees.sql
- **Type**: View materialization
- **Dependencies**: raw.employees (source table)

**Model Content**:
```sql
{{ config(materialized='view') }}

select
    cast(id as integer) as employee_id,
    name as employee_name,
    email as employee_email,
    cast(created_at as date) as hire_date,
    department,
    cast(salary as integer) as annual_salary
from {{ source('raw', 'employees') }}
```

**Key Features**:
- ✅ Type casting for data integrity
- ✅ Column aliasing for clarity
- ✅ Source reference for lineage

[Uses dbt_compile tool for "stg_employees"]
✅ Model compiles successfully - SQL is valid

[Uses dbt_run tool for "stg_employees"]
✅ Model executed successfully:
- View created in DuckDB
- 10 rows processed
- Runtime: 0.8 seconds

[Uses dbt_test tool for "stg_employees"]
✅ All tests passed:
- Unique employee_id constraint: PASS
- Not null employee_id constraint: PASS

The stg_employees model is working perfectly! It cleanly transforms raw employee data with proper typing and passes all quality tests.
````

### Model Compilation and Debugging

**Scenario**: Check model SQL without running

````
User: "Compile the department stats model to see the generated SQL"

Claude: I'll compile the model to show you the generated SQL.
[Uses dbt_compile tool for "agg_department_stats"]

✅ Model compiled successfully. Here's the generated SQL:

```sql
select
    department,
    count(*) as employee_count,
    avg(annual_salary) as avg_salary,
    min(annual_salary) as min_salary,
    max(annual_salary) as max_salary,
    sum(annual_salary) as total_payroll,
    current_timestamp as created_at
from "main"."stg_employees"
group by department
order by avg_salary desc
```

The model aggregates employee data by department with salary statistics.
````
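The examples above show the SQL for `stg_employees` and `agg_department_stats`, but not for `dim_employees`. Based on the salary tiers described later under Database Exploration (Junior below $70k, Mid-Level $70k–90k, Senior $90k and above), a minimal sketch of what that model might look like is shown below; the materialization and column list are assumptions for illustration, not the project's actual file:

```sql
-- Hypothetical sketch of dim_employees: adds a derived salary_tier
-- column on top of the staging model. Tier boundaries follow the
-- description in the Database Exploration examples.
{{ config(materialized='table') }}

select
    employee_id,
    employee_name,
    employee_email,
    hire_date,
    department,
    annual_salary,
    case
        when annual_salary < 70000 then 'Junior'
        when annual_salary < 90000 then 'Mid-Level'
        else 'Senior'
    end as salary_tier
from {{ ref('stg_employees') }}
```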
## Data Quality Testing

### Targeted Testing

**Scenario**: Test specific model after changes

```
User: "I just updated the staging model. Run tests to make sure it's still valid."

Claude: I'll run tests specifically for the staging model.
[Uses dbt_test tool with models="stg_employees"]

✅ Staging model tests completed:
- source_unique_raw_employees_id: PASS ✅
- source_not_null_raw_employees_id: PASS ✅

All data quality checks passed! Your staging model changes are working correctly.
```

### Full Project Testing

**Scenario**: Comprehensive quality validation

```
User: "Run all tests to validate the entire data pipeline"

Claude: I'll run the complete test suite.
[Uses dbt_test tool]

📊 Test Results Summary:
✅ All 4 tests passed:
- Unique constraints: 2/2 passed
- Not null constraints: 2/2 passed
- Total runtime: 2.1 seconds

Your data pipeline has excellent quality! All integrity constraints are satisfied.
```

## Database Exploration

### Data Investigation

**Scenario**: Explore table contents

```
User: "Show me some sample data from the employee dimension table"

Claude: I'll query the employee dimension table for you.
[Uses remote_query_database tool]

📊 Sample data from dim_employees:

| employee_id | employee_name  | department  | annual_salary | salary_tier |
|-------------|----------------|-------------|---------------|-------------|
| 1           | John Doe       | Engineering | 85000         | Mid-Level   |
| 2           | Jane Smith     | Marketing   | 72000         | Mid-Level   |
| 4           | Alice Brown    | Engineering | 92000         | Senior      |
| 10          | Henry Chen     | Engineering | 95000         | Senior      |
| 5           | Charlie Wilson | HR          | 61000         | Junior      |

Rows returned: 5 of 10 total

The table includes salary tiers (Junior: <70k, Mid-Level: 70k-90k, Senior: 90k+) calculated from annual salary.
```

### Schema Analysis

**Scenario**: Understand table structure

```
User: "What's the schema of the department stats table?"

Claude: I'll describe the table structure for you.
[Uses remote_describe_table tool for "agg_department_stats"]

📋 Table Schema: agg_department_stats

| Column         | Type      | Description          |
|----------------|-----------|----------------------|
| department     | VARCHAR   | Department name      |
| employee_count | BIGINT    | Number of employees  |
| avg_salary     | DOUBLE    | Average salary       |
| min_salary     | INTEGER   | Minimum salary       |
| max_salary     | INTEGER   | Maximum salary       |
| total_payroll  | BIGINT    | Total payroll cost   |
| created_at     | TIMESTAMP | Record creation time |

This aggregation table provides department-level HR and financial metrics.
```

### Data Analysis Queries

**Scenario**: Business intelligence queries

```
User: "Which department has the highest average salary?"

Claude: I'll analyze the department salary data.
[Uses remote_query_database tool]

📈 Department Salary Analysis:

| department  | avg_salary | employee_count | total_payroll |
|-------------|------------|----------------|---------------|
| Engineering | 90000      | 4              | 360000        |
| Marketing   | 73500      | 2              | 147000        |
| Sales       | 69500      | 2              | 139000        |
| HR          | 60000      | 2              | 120000        |

**Key Insights**:
- 🥇 Engineering has the highest average salary at $90,000
- 📊 Engineering also has the most employees (4 out of 10)
- 💰 Engineering represents 47% of total payroll
- 📈 All departments have healthy salary distributions

Engineering leads in both compensation and team size, which is common in tech organizations.
```
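The transcript above shows the analysis results but not the underlying query. Because `agg_department_stats` already stores these aggregates, the lookup is likely as simple as the following sketch (column names taken from the Schema Analysis example above):

```sql
-- Illustrative query behind the department salary analysis;
-- the aggregates are precomputed in agg_department_stats.
select
    department,
    avg_salary,
    employee_count,
    total_payroll
from agg_department_stats
order by avg_salary desc
```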
## Project Discovery

### Project Overview

**Scenario**: New user exploring the project

````
User: "I'm new to this project. Can you give me an overview of the dbt models?"

Claude: I'll provide a comprehensive overview of your dbt project.
[Uses discovery_list_models tool]

📊 dbt Project Overview

**Project Structure**: 3 models across 2 layers

**Staging Layer** (1 model):
- `stg_employees` - Clean, typed employee data from CSV source

**Marts Layer** (2 models):
- `dim_employees` - Employee dimension with salary tier classification
- `agg_department_stats` - Department-level aggregations and metrics

**Data Flow**:
```
raw.employees → stg_employees → {dim_employees, agg_department_stats}
```

**Key Features**:
- ✅ Layered architecture (staging → marts)
- ✅ Data quality tests on source
- ✅ Business logic in marts layer
- ✅ Clear naming conventions

This is a well-structured analytics project following dbt best practices!
````

### Model Lineage Analysis

**Scenario**: Understanding data dependencies

```
User: "Show me the data lineage for this project"

Claude: I'll analyze the data lineage and dependencies.
[Uses discovery_lineage tool]

🔗 Data Lineage Analysis

**Source**: raw.employees (CSV data)
    ↓
**Staging**: stg_employees (data cleaning & typing)
    ↓
**Marts**:
- dim_employees (employee dimension)
- agg_department_stats (department aggregations)

**Dependency Graph**:
- stg_employees depends on: raw.employees
- dim_employees depends on: stg_employees
- agg_department_stats depends on: stg_employees

**Critical Path**: raw.employees → stg_employees → marts
- Any changes to stg_employees affect both mart models
- Source data updates flow through the entire pipeline
- Well-designed single staging layer prevents duplication

This linear dependency structure makes the pipeline easy to understand and maintain.
```

## Advanced Workflows

### Complete Build Process

**Scenario**: Full pipeline execution with validation

```
User: "Build the entire data pipeline from scratch and validate everything"

Claude: I'll execute a complete build and validation workflow.
[Uses dbt_build tool]

🚀 Full Pipeline Build

**Step 1: Building all models and tests**
✅ dbt build completed successfully:
- Models: 3/3 built successfully
- Tests: 4/4 passed
- Total runtime: 3.2 seconds

**Build Summary**:
- stg_employees: BUILD SUCCESS ✅
- dim_employees: BUILD SUCCESS ✅
- agg_department_stats: BUILD SUCCESS ✅

**Test Results**:
- source_unique_raw_employees_id: PASS ✅
- source_not_null_raw_employees_id: PASS ✅

[Uses remote_query_database tool for validation]

**Data Validation**:
- Source records: 10 employees
- Staging records: 10 employees (100% match)
- Dimension records: 10 employees (100% match)
- Department aggregations: 4 departments

🎯 Pipeline Status: HEALTHY
- ✅ All models built successfully
- ✅ All tests passed
- ✅ Data integrity maintained
- ✅ No data loss detected

Your data pipeline is fully operational and validated!
```

### Performance Optimization

**Scenario**: Analyzing and optimizing model performance

````
User: "The department stats model seems slow. Can you help optimize it?"

Claude: I'll analyze the model and suggest optimizations.
[Uses discovery_model_details tool for "agg_department_stats"]

🔍 Performance Analysis: agg_department_stats

**Current Implementation**:
```sql
select
    department,
    count(*) as employee_count,
    avg(annual_salary) as avg_salary,
    min(annual_salary) as min_salary,
    max(annual_salary) as max_salary,
    sum(annual_salary) as total_payroll,
    current_timestamp as created_at
from {{ ref('stg_employees') }}
group by department
order by avg_salary desc
```

[Uses remote_query_database tool for analysis]

**Performance Metrics**:
- Source records: 10 rows
- Aggregation groups: 4 departments
- Query complexity: Low (simple GROUP BY)

**Optimization Recommendations**:

1. **Current Performance**: ✅ GOOD
   - Small dataset (10 rows) processes quickly
   - Simple aggregation with minimal overhead

2. **Future Optimization** (for larger datasets):
   - Consider incremental materialization for large datasets
   - Add indexes on department column if needed
   - Use table materialization instead of view for complex downstream queries

3. **Current Status**: No optimization needed
   - Query executes in <200ms
   - Resource usage is minimal
   - Performance is excellent for current scale

Your model is already well-optimized for the current dataset size!
````
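To make the incremental-materialization recommendation above concrete: in practice the incremental pattern is usually applied to a row-level model such as the staging layer rather than a tiny aggregate. The sketch below illustrates the pattern only; it assumes the raw source exposes a `loaded_at` timestamp, which is not confirmed anywhere in these examples:

```sql
-- Hypothetical incremental variant of the staging model, shown only to
-- illustrate the pattern. Assumes raw.employees has a loaded_at column.
{{ config(materialized='incremental', unique_key='employee_id') }}

select
    cast(id as integer) as employee_id,
    name as employee_name,
    email as employee_email,
    cast(created_at as date) as hire_date,
    department,
    cast(salary as integer) as annual_salary,
    loaded_at
from {{ source('raw', 'employees') }}

{% if is_incremental() %}
  -- on incremental runs, only pull rows loaded since the last run
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}
```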
## Troubleshooting

### Model Compilation Issues

**Scenario**: Model fails to compile

````
User: "My model isn't compiling. Can you help debug it?"

Claude: I'll help debug the compilation issue.
[Uses dbt_compile tool and gets error]

❌ Compilation Error Detected

**Error Details**:
```
Exit code: 1
STDERR: Compilation Error in model 'problematic_model'
  column "nonexistent_column" does not exist
```

**Debugging Steps**:

[Uses discovery_model_details tool for the model]
🔍 **Model Analysis**:
- Found reference to `nonexistent_column` in line 8
- Column doesn't exist in upstream model

[Uses remote_describe_table tool for upstream table]
📋 **Available Columns**:
- employee_id, employee_name, department, annual_salary
- Missing: nonexistent_column

**Solution**: Update the model to use existing column names or add the column to the upstream model.

**Fixed Model**:
```sql
-- Change this:
select nonexistent_column
-- To this:
select employee_name
```

Would you like me to help you fix the specific column reference?
````

### Data Quality Issues

**Scenario**: Tests are failing

````
User: "Some tests are failing. What's wrong?"

Claude: I'll investigate the test failures.
[Uses dbt_test tool and gets failures]

❌ Test Failures Detected

**Failed Tests**:
- source_unique_raw_employees_id: FAIL

[Uses remote_query_database tool to investigate]

🔍 **Data Investigation**:
```sql
SELECT id, COUNT(*) AS count
FROM raw.employees
GROUP BY id
HAVING COUNT(*) > 1
```

**Results**:

| id | count |
|----|-------|
| 5  | 2     |

**Issue**: Duplicate employee ID (5) found in source data

**Resolution Options**:
1. **Fix Source Data**: Remove duplicate record
2. **Update Model**: Add deduplication logic
3. **Modify Test**: Change expectation if duplicates are valid

**Recommended Fix**:
```sql
-- Add deduplication to staging model
select distinct
    cast(id as integer) as employee_id,
    ...
from {{ source('raw', 'employees') }}
```

This will handle duplicates at the staging layer.
````

### Performance Issues

**Scenario**: Slow query execution

````
User: "The database queries are running slowly"

Claude: I'll help diagnose the performance issue.
[Uses remote_query_database tool with timing analysis]

⚡ Performance Diagnosis

**Query Performance Test**:
- Simple SELECT: 15ms ✅ Good
- Aggregation query: 180ms ✅ Acceptable
- Complex JOIN: 2.1s ⚠️ Slow

**Potential Issues**:
1. **Database Size**: Check if database file is large
2. **Query Complexity**: Review complex queries
3. **Resource Limits**: Verify system resources

[Uses discovery_list_models tool]

**Project Size Analysis**:
- 3 models (small project)
- Expected performance: <200ms

**Recommendations**:
1. **Restart Server**: `./scripts/start_dbt_mcp.sh`
2. **Check Database**: Verify DuckDB file isn't corrupted
3. **Monitor Resources**: Check system memory/CPU

**Quick Fix**:
```bash
# Restart MCP server
./scripts/start_dbt_mcp.sh

# Verify database
python -c "import duckdb; conn = duckdb.connect('data/warehouse/data_stack.duckdb'); print('Database OK')"
```

Performance should improve after restart.
````

---

These examples demonstrate the capabilities of the dbt MCP server with Claude Code, showing how AI assistance can streamline data engineering workflows while maintaining data quality and dbt best practices.