dbt MCP Server, by ajdoyl2
# dbt MCP Server Integration for Claude Code

A comprehensive guide to implementing and using the dbt Model Context Protocol (MCP) server with Claude Code for intelligent data transformation assistance.

## Overview

The dbt MCP server provides Claude Code with native dbt project capabilities, enabling intelligent assistance for:

- Model development and compilation
- Data quality testing
- Project discovery and metadata analysis
- Database querying and exploration

## Architecture

```
Claude Code ←→ MCP Protocol ←→ dbt MCP Server ←→ dbt Core ←→ DuckDB
                                      ↓
                    Local dbt Project (transform/ directory)
```

### Components

1. **dbt MCP Server** (`mcp_servers/dbt_server.py`)
   - Protocol-compliant MCP 1.0 server
   - Async/await pattern for all operations
   - Comprehensive error handling and logging
   - Type safety with Pydantic models
2. **Configuration** (`.env.dbt-mcp`)
   - Environment-specific settings
   - Tool group enablement/disablement
   - DuckDB integration parameters
3. **Claude Configuration** (`mcp_servers/claude_config.json`)
   - MCP server registration for Claude Code
   - Environment variable configuration
   - Server command and arguments

## Tool Groups

### dbt CLI Tools (`DBT_MCP_ENABLE_CLI_TOOLS=true`)

| Tool | Description | Arguments |
|------|-------------|-----------|
| `dbt_run` | Execute dbt models | `models`, `full_refresh` |
| `dbt_test` | Run data quality tests | `models` |
| `dbt_compile` | Compile models to SQL | `models` |
| `dbt_build` | Run models, tests, and seeds in DAG order | `models` |

### Discovery Tools (`DBT_MCP_ENABLE_DISCOVERY_TOOLS=true`)

| Tool | Description | Arguments |
|------|-------------|-----------|
| `discovery_list_models` | List all project models with metadata | `filter` |
| `discovery_model_details` | Get detailed model information | `model_name` |
| `discovery_lineage` | Get data lineage and dependencies | `model_name` |

### Remote Tools (`DBT_MCP_ENABLE_REMOTE_TOOLS=true`)

| Tool | Description | Arguments |
|------|-------------|-----------|
| `remote_query_database` | Execute SQL against DuckDB | `sql`, `limit` |
| `remote_describe_table` | Get table schema information | `table_name`, `schema` |

## Installation & Setup

### 1. Install Dependencies

```bash
# Install with the virtual environment active
source venv/bin/activate
pip install dbt-mcp mcp
```

### 2. Environment Configuration

Configure `.env.dbt-mcp`:

```bash
# dbt Core configuration
DBT_PROJECT_DIR=/Users/ajdoyle/claude-data-stack-mcp/transform
DBT_PROFILES_DIR=/Users/ajdoyle/claude-data-stack-mcp/transform/profiles/duckdb
DBT_PROFILE=data_stack
DBT_TARGET=dev

# Tool group configuration
DBT_MCP_ENABLE_CLI_TOOLS=true
DBT_MCP_ENABLE_DISCOVERY_TOOLS=true
DBT_MCP_ENABLE_REMOTE_TOOLS=true
```

### 3. Claude Code Configuration

Add to the Claude Code MCP configuration:

```json
{
  "mcpServers": {
    "dbt-data-stack": {
      "command": "python",
      "args": ["/path/to/mcp_servers/dbt_server.py"],
      "env": { /* environment variables */ }
    }
  }
}
```

## Usage Examples

### Model Development

```
Claude: Run dbt models to build the latest transformations
Tool:   dbt_run
Result: Models compiled and executed successfully
```

### Data Quality Testing

```
Claude: Test the employee dimension model for data quality issues
Tool:   dbt_test with models="dim_employees"
Result: All tests passed - unique constraints and not-null validations successful
```

### Project Discovery

```
Claude: What models are available in this dbt project?
Tool:   discovery_list_models
Result: staging: stg_employees | marts: dim_employees, agg_department_stats
```

### Database Exploration

```
Claude: Show me the department statistics from the aggregated model
Tool:   remote_query_database with sql="SELECT * FROM main.agg_department_stats"
Result: Department-level metrics with employee counts and salary statistics
```

## Protocol Compliance

### MCP 1.0 Specification

- ✅ **Async Operations**: All handlers use the async/await pattern
- ✅ **Error Handling**: Comprehensive error handling with proper error responses
- ✅ **Type Safety**: Pydantic models for all data structures
- ✅ **Resource Management**: Proper resource listing and reading capabilities
- ✅ **Tool Management**: Dynamic tool registration based on configuration
- ✅ **Logging**: Structured logging for debugging and monitoring

### Backend Integration Features

- ✅ **DuckDB Integration**: Native DuckDB database connectivity
- ✅ **Local Project Management**: File system-based dbt project operations
- ✅ **Environment Validation**: Pre-flight checks for project structure
- ✅ **Security**: Input validation and SQL injection protection
- ✅ **Performance**: Timeout handling and resource limits
- ✅ **Monitoring**: Comprehensive logging and error tracking

## Security Considerations

### Input Validation

- SQL injection protection for database queries
- Path traversal prevention for file operations
- Argument validation for all tool inputs

### Resource Limits

- 5-minute timeout for dbt command execution
- 100-row default limit for database queries
- File size limits for model content reading

### Environment Isolation

- Dedicated virtual environment for dependencies
- Environment variable-based configuration
- No hardcoded credentials or paths

## Troubleshooting

### Common Issues

**"dbt project directory not found"**
- Verify the `DBT_PROJECT_DIR` environment variable
- Ensure the transform/ directory exists and contains `dbt_project.yml`

**"Database file not found"**
- Run the Meltano ELT pipeline: `meltano run tap-csv target-duckdb`
- Verify `DUCKDB_PATH` points to the correct database file

**"Tool not found"**
- Check the tool group configuration (CLI_TOOLS, DISCOVERY_TOOLS, etc.)
- Check the MCP server startup logs for tool registration

### Debug Mode

Enable debug logging:

```bash
export MCP_DEBUG=true
export DBT_LOG_LEVEL=debug
```

### Validation Commands

```bash
# Test MCP server startup
./scripts/start_dbt_mcp.sh

# Validate dbt configuration
cd transform
DBT_PROFILES_DIR=./profiles/duckdb dbt debug --profile data_stack

# Test database connectivity
python -c "import duckdb; conn = duckdb.connect('data/warehouse/data_stack.duckdb'); print(conn.execute('SHOW TABLES').fetchall())"
```

## Performance Optimization

### Caching Strategies

- Model metadata caching for discovery operations
- Connection pooling for database operations
- Compiled SQL caching for repeated operations

### Resource Management

- Automatic connection cleanup
- Memory-efficient result streaming
- Timeout-based resource release

### Monitoring

- Execution time tracking for all operations
- Error rate monitoring and alerting
- Resource utilization metrics

## Next Steps

1. **Enhanced Lineage Analysis**: Implement `manifest.json` parsing for complete lineage
2. **Semantic Layer Integration**: Add dbt Cloud Semantic Layer support
3. **Advanced Testing**: Integration testing with Claude Code
4. **Performance Monitoring**: Add metrics collection and dashboards
5. **Documentation Generation**: Automated model documentation updates

## Support

- **Issues**: Report at the project GitHub repository
- **Logs**: Check the MCP server logs for detailed error information
- **Documentation**: Refer to the dbt Core and MCP protocol documentation
- **Community**: Join the dbt Slack community for additional support
