# dbt MCP Server Integration for Claude Code
A comprehensive guide to implementing and using the dbt Model Context Protocol (MCP) server with Claude Code for intelligent data transformation assistance.
## Overview
The dbt MCP server provides Claude Code with native dbt project capabilities, enabling intelligent assistance for:
- Model development and compilation
- Data quality testing
- Project discovery and metadata analysis
- Database querying and exploration
## Architecture
```
Claude Code ←→ MCP Protocol ←→ dbt MCP Server ←→ dbt Core ←→ DuckDB
                                      ↓
                              Local dbt Project
                             (transform/ directory)
```
### Components
1. **dbt MCP Server** (`mcp_servers/dbt_server.py`)
   - Protocol-compliant MCP 1.0 server
   - Async/await pattern for all operations
   - Comprehensive error handling and logging
   - Type safety with Pydantic models
2. **Configuration** (`.env.dbt-mcp`)
   - Environment-specific settings
   - Tool group enable/disable flags
   - DuckDB integration parameters
3. **Claude Configuration** (`mcp_servers/claude_config.json`)
   - MCP server registration for Claude Code
   - Environment variable configuration
   - Server command and arguments
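For orientation, here is a minimal sketch of how such a server can be wired together with the `mcp` Python SDK's FastMCP helper; the tool body is an illustrative placeholder, not the actual `dbt_server.py` implementation.
```python
# Minimal MCP server sketch using the `mcp` SDK's FastMCP helper.
# The tool body is a placeholder, not the project's real implementation.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("dbt-data-stack")

@mcp.tool()
async def dbt_compile(models: str = "") -> str:
    """Compile dbt models to SQL (placeholder body)."""
    return f"would compile: {models or 'all models'}"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport that Claude Code expects
```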
## Tool Groups
### dbt CLI Tools (`DBT_MCP_ENABLE_CLI_TOOLS=true`)
| Tool | Description | Arguments |
|------|-------------|-----------|
| `dbt_run` | Execute dbt models | `models`, `full_refresh` |
| `dbt_test` | Run data quality tests | `models` |
| `dbt_compile` | Compile models to SQL | `models` |
| `dbt_build` | Run models, tests, and seeds in DAG order | `models` |
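Each CLI tool ultimately shells out to the dbt executable. Below is a hedged sketch of such a wrapper, enforcing the 5-minute timeout described under Resource Limits; the helper name and error message are assumptions, not the server's actual code.
```python
import asyncio
import os

DBT_TIMEOUT_SECONDS = 300  # 5-minute limit, per Resource Limits below

async def run_dbt_command(*args: str) -> str:
    """Run a dbt CLI command inside the configured project directory (hypothetical helper)."""
    proc = await asyncio.create_subprocess_exec(
        "dbt", *args,
        cwd=os.environ["DBT_PROJECT_DIR"],
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), DBT_TIMEOUT_SECONDS)
    except asyncio.TimeoutError:
        proc.kill()
        raise RuntimeError("dbt command timed out after 5 minutes")
    return stdout.decode()

# Example: dbt_run with a model selector
# await run_dbt_command("run", "--select", "dim_employees")
```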
### Discovery Tools (`DBT_MCP_ENABLE_DISCOVERY_TOOLS=true`)
| Tool | Description | Arguments |
|------|-------------|-----------|
| `discovery_list_models` | List all project models with metadata | `filter` |
| `discovery_model_details` | Get detailed model information | `model_name` |
| `discovery_lineage` | Get data lineage and dependencies | `model_name` |
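Next Steps flags full `manifest.json` parsing as future work; as a sketch of that direction, `discovery_lineage` could read the artifact that `dbt compile` writes to `target/manifest.json`. The function name and return shape below are assumptions.
```python
import json
from pathlib import Path

def model_lineage(project_dir: str, model_name: str) -> dict:
    """Derive upstream/downstream dependencies from dbt's manifest.json (hypothetical)."""
    manifest = json.loads(Path(project_dir, "target", "manifest.json").read_text())
    # Locate the unique node id for the requested model
    node_id = next(
        nid for nid, node in manifest["nodes"].items()
        if node["resource_type"] == "model" and node["name"] == model_name
    )
    upstream = manifest["nodes"][node_id]["depends_on"]["nodes"]
    downstream = [
        nid for nid, node in manifest["nodes"].items()
        if node_id in node.get("depends_on", {}).get("nodes", [])
    ]
    return {"model": node_id, "upstream": upstream, "downstream": downstream}
```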
### Remote Tools (`DBT_MCP_ENABLE_REMOTE_TOOLS=true`)
| Tool | Description | Arguments |
|------|-------------|-----------|
| `remote_query_database` | Execute SQL against DuckDB | `sql`, `limit` |
| `remote_describe_table` | Get table schema information | `table_name`, `schema` |
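A minimal sketch of how `remote_query_database` might execute against DuckDB while enforcing the 100-row default limit from Resource Limits; opening the connection read-only is one simple guard against destructive statements. The function and variable names are assumptions.
```python
import os
import duckdb

DEFAULT_ROW_LIMIT = 100  # default cap, per Resource Limits below

def query_database(sql: str, limit: int = DEFAULT_ROW_LIMIT) -> list[tuple]:
    """Run a query read-only and cap the rows returned (hypothetical helper)."""
    conn = duckdb.connect(os.environ["DUCKDB_PATH"], read_only=True)
    try:
        # fetchmany caps how many rows cross the wire regardless of the query
        return conn.execute(sql).fetchmany(limit)
    finally:
        conn.close()
```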
## Installation & Setup
### 1. Dependencies Installation
```bash
# Install with virtual environment active
source venv/bin/activate
pip install dbt-mcp mcp
```
### 2. Environment Configuration
Configure `.env.dbt-mcp`:
```bash
# dbt Core Configuration
DBT_PROJECT_DIR=/Users/ajdoyle/claude-data-stack-mcp/transform
DBT_PROFILES_DIR=/Users/ajdoyle/claude-data-stack-mcp/transform/profiles/duckdb
DBT_PROFILE=data_stack
DBT_TARGET=dev
# DuckDB Integration
DUCKDB_PATH=/Users/ajdoyle/claude-data-stack-mcp/data/warehouse/data_stack.duckdb
# Tool Group Configuration
DBT_MCP_ENABLE_CLI_TOOLS=true
DBT_MCP_ENABLE_DISCOVERY_TOOLS=true
DBT_MCP_ENABLE_REMOTE_TOOLS=true
```
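The server's pre-flight checks (see Backend Integration Features) can validate this configuration at startup. A sketch, assuming the error messages mirror the Troubleshooting section below:
```python
import os
from pathlib import Path

def preflight_checks() -> None:
    """Fail fast on misconfiguration before serving any tools (hypothetical helper)."""
    project_dir = Path(os.environ.get("DBT_PROJECT_DIR", ""))
    if not (project_dir / "dbt_project.yml").is_file():
        raise RuntimeError(f"dbt project directory not found: {project_dir}")
    db_path = Path(os.environ.get("DUCKDB_PATH", ""))
    if not db_path.is_file():
        raise RuntimeError(f"Database file not found: {db_path}")
```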
### 3. Claude Code Configuration
Add to the Claude Code MCP configuration; the `env` block mirrors the settings in `.env.dbt-mcp`:
```json
{
  "mcpServers": {
    "dbt-data-stack": {
      "command": "python",
      "args": ["/path/to/mcp_servers/dbt_server.py"],
      "env": {
        "DBT_PROJECT_DIR": "/Users/ajdoyle/claude-data-stack-mcp/transform",
        "DBT_PROFILES_DIR": "/Users/ajdoyle/claude-data-stack-mcp/transform/profiles/duckdb",
        "DBT_PROFILE": "data_stack",
        "DBT_TARGET": "dev",
        "DUCKDB_PATH": "/Users/ajdoyle/claude-data-stack-mcp/data/warehouse/data_stack.duckdb"
      }
    }
  }
}
```
## Usage Examples
### Model Development
```
Claude: Run dbt models to build the latest transformations
Tool: dbt_run
Result: Models compiled and executed successfully
```
### Data Quality Testing
```
Claude: Test the employee dimension model for data quality issues
Tool: dbt_test with models="dim_employees"
Result: All tests passed - unique constraints and not-null validations successful
```
### Project Discovery
```
Claude: What models are available in this dbt project?
Tool: discovery_list_models
Result: staging: stg_employees | marts: dim_employees, agg_department_stats
```
### Database Exploration
```
Claude: Show me the department statistics from the aggregated model
Tool: remote_query_database with sql="SELECT * FROM main.agg_department_stats"
Result: Department-level metrics with employee counts and salary statistics
```
## Protocol Compliance
### MCP 1.0 Specification
- ✅ **Async Operations**: All handlers use the async/await pattern
- ✅ **Error Handling**: Failures are caught and returned as proper MCP error responses
- ✅ **Type Safety**: Pydantic models for all data structures (example below)
- ✅ **Resource Management**: Proper resource listing and reading capabilities
- ✅ **Tool Management**: Dynamic tool registration based on configuration
- ✅ **Logging**: Structured logging for debugging and monitoring
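As one illustration of the type-safety point, tool arguments can be validated with a Pydantic model before execution; the schema below is a hypothetical shape for `dbt_run` arguments, not the server's actual definition.
```python
from pydantic import BaseModel, Field

class DbtRunArgs(BaseModel):
    """Hypothetical argument schema for the dbt_run tool."""
    models: str = Field(default="", description="dbt selector, e.g. 'dim_employees'")
    full_refresh: bool = Field(default=False, description="rebuild incremental models")

args = DbtRunArgs.model_validate({"models": "dim_employees"})  # raises on bad input
```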
### Backend Integration Features
- ✅ **DuckDB Integration**: Native DuckDB database connectivity
- ✅ **Local Project Management**: File system-based dbt project operations
- ✅ **Environment Validation**: Pre-flight checks for project structure
- ✅ **Security**: Input validation and SQL injection protection
- ✅ **Performance**: Timeout handling and resource limits
- ✅ **Monitoring**: Comprehensive logging and error tracking
## Security Considerations
### Input Validation
- SQL injection protection for database queries (see the sketch after this list)
- Path traversal prevention for file operations
- Argument validation for all tool inputs
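A minimal sketch of the first two guards, assuming a read-only statement allowlist and project-rooted path resolution (helper names are hypothetical):
```python
from pathlib import Path

def validate_sql(sql: str) -> str:
    """Allow only a single read-only statement through the query tool (hypothetical)."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.lower().startswith(("select", "show", "describe")):
        raise ValueError("only single read-only statements are allowed")
    return stripped

def resolve_in_project(project_dir: str, relative: str) -> Path:
    """Reject file paths that escape the dbt project directory (hypothetical)."""
    root = Path(project_dir).resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):  # Python 3.9+
        raise ValueError("path escapes the project directory")
    return candidate
```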
### Resource Limits
- 5-minute timeout for dbt command execution
- 100-row default limit for database queries
- File size limits for model content reading
### Environment Isolation
- Dedicated virtual environment for dependencies
- Environment variable-based configuration
- No hardcoded credentials or paths
## Troubleshooting
### Common Issues
**"dbt project directory not found"**
- Verify `DBT_PROJECT_DIR` environment variable
- Ensure transform/ directory exists and contains dbt_project.yml
**"Database file not found"**
- Run Meltano ELT: `meltano run tap-csv target-duckdb`
- Verify `DUCKDB_PATH` points to correct database file
**"Tool not found"**
- Check tool group configuration (CLI_TOOLS, DISCOVERY_TOOLS, etc.)
- Verify MCP server startup logs for tool registration
### Debug Mode
Enable debug logging:
```bash
export MCP_DEBUG=true
export DBT_LOG_LEVEL=debug
```
### Validation Commands
```bash
# Test MCP server startup
./scripts/start_dbt_mcp.sh
# Validate dbt configuration
cd transform
DBT_PROFILES_DIR=./profiles/duckdb dbt debug --profile data_stack
# Test database connectivity
python -c "import duckdb; conn = duckdb.connect('data/warehouse/data_stack.duckdb'); print(conn.execute('SHOW TABLES').fetchall())"
```
## Performance Optimization
### Caching Strategies
- Model metadata caching for discovery operations (sketched below)
- Connection pooling for database operations
- Compiled SQL caching for repeated operations
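A sketch of metadata caching keyed on the manifest's modification time, so discovery tools re-parse only after a new compile (the cache shape is an assumption):
```python
import json
from pathlib import Path

_manifest_cache: dict[str, tuple[float, dict]] = {}

def load_manifest_cached(project_dir: str) -> dict:
    """Re-parse manifest.json only when its mtime changes (hypothetical cache)."""
    path = Path(project_dir, "target", "manifest.json")
    mtime = path.stat().st_mtime
    cached = _manifest_cache.get(str(path))
    if cached and cached[0] == mtime:
        return cached[1]  # cache hit: manifest unchanged since last parse
    manifest = json.loads(path.read_text())
    _manifest_cache[str(path)] = (mtime, manifest)
    return manifest
```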
### Resource Management
- Automatic connection cleanup
- Memory-efficient result streaming
- Timeout-based resource release
### Monitoring
- Execution time tracking for all operations
- Error rate monitoring and alerting
- Resource utilization metrics
## Next Steps
1. **Enhanced Lineage Analysis**: Implement manifest.json parsing for complete lineage
2. **Semantic Layer Integration**: Add dbt Cloud semantic layer support
3. **Advanced Testing**: Integration testing with Claude Code
4. **Performance Monitoring**: Add metrics collection and dashboards
5. **Documentation Generation**: Automated model documentation updates
## Support
- **Issues**: Report at project GitHub repository
- **Logs**: Check MCP server logs for detailed error information
- **Documentation**: Refer to dbt Core and MCP protocol documentation
- **Community**: Join dbt Slack community for additional support