# Claude Code Integration Guide - Modern Data Stack with MCP
This document provides comprehensive guidance for Claude Code to understand and interact with the modern data stack through specialized MCP server integration.
## Architecture Overview
### Production-Ready Data Stack
This platform implements a **production-ready modern data stack** combining:
**Core Components**:
- **Meltano**: ELT orchestration platform for data pipelines
- **DuckDB**: High-performance in-process analytics database
- **dbt Core**: Data transformation framework with best practices
- **MCP Integration**: Claude Code server for intelligent development assistance
**Architecture Flow**:
```
CSV Data → Meltano (Extract/Load) → DuckDB → dbt Core (Transform) → Analytics Tables
                                       ↓
                      Claude Code MCP Server (Intelligent Development Assistance)
```
## MCP Server Capabilities
### dbt MCP Server Integration
**Primary Server**: `dbt-data-stack`
**Location**: `/Users/ajdoyle/claude-data-stack-mcp/mcp_servers/dbt_server.py`
**Available Tools**:
- `dbt_run`: Execute dbt models and transformations
- `dbt_test`: Run data quality tests and validations
- `dbt_compile`: Compile models without execution for validation
- `dbt_build`: Complete build including models and tests
- `dbt_list`: Project discovery and model metadata
- `dbt_show`: Execute arbitrary SQL queries against warehouse
**Configuration**:
```json
{
  "mcpServers": {
    "dbt-data-stack": {
      "command": "python",
      "args": ["/Users/ajdoyle/claude-data-stack-mcp/mcp_servers/dbt_server.py"],
      "env": {
        "DBT_PROJECT_DIR": "/Users/ajdoyle/claude-data-stack-mcp/transform",
        "DBT_PROFILES_DIR": "/Users/ajdoyle/claude-data-stack-mcp/transform/profiles/duckdb",
        "DBT_PROFILE": "data_stack",
        "DBT_TARGET": "dev",
        "DUCKDB_PATH": "/Users/ajdoyle/claude-data-stack-mcp/data/warehouse/data_stack.duckdb"
      }
    }
  }
}
```
## Data Platform Structure
### Key Directories
```
├── data/
│   ├── sample_data.csv              # Sample datasets for pipeline testing
│   └── warehouse/                   # DuckDB database files
│       └── data_stack.duckdb        # Main analytics warehouse
├── transform/                       # dbt project root
│   ├── models/                      # dbt models
│   │   ├── staging/                 # Raw data cleaning
│   │   │   ├── stg_employees.sql
│   │   │   └── sources.yml
│   │   └── marts/                   # Business logic
│   │       ├── dim_employees.sql
│   │       └── agg_department_stats.sql
│   ├── profiles/duckdb/             # Project-contained profiles
│   │   └── profiles.yml
│   └── dbt_project.yml              # dbt configuration
├── mcp_servers/                     # MCP server implementations
│   ├── dbt_server.py                # Claude Code dbt integration
│   └── claude_config.json           # MCP server configuration
└── meltano.yml                      # ELT pipeline configuration
```
## Usage Patterns
### Data Pipeline Operations
```bash
# Complete data pipeline execution
meltano run tap-csv target-duckdb
cd transform && DBT_PROFILES_DIR=./profiles/duckdb dbt run --profile data_stack
```
### MCP-Enabled Development Workflows
**Through Claude Code**:
- **Model Development**: "Create a new dbt model for customer analytics"
- **Data Quality**: "Add data quality tests for the employee dimension"
- **Query Optimization**: "Analyze and optimize this SQL query performance"
- **Pipeline Debugging**: "Troubleshoot this dbt test failure"
### Direct MCP Tool Usage
```python
# Execute dbt models
await mcp_dbt.dbt_run(selector="marts", full_refresh=False)
# Run data quality tests
await mcp_dbt.dbt_test(selector="staging")
# Execute ad-hoc SQL queries
await mcp_dbt.dbt_show(
    sql_query="SELECT department, COUNT(*) FROM {{ ref('stg_employees') }} GROUP BY department",
    limit=100
)
```
## Data Models & Quality
### Staging Layer
- **`stg_employees`**: Clean, typed employee data from raw CSV source
### Analytics Layer
- **`dim_employees`**: Employee dimension with calculated salary tiers
- **`agg_department_stats`**: Department-level metrics and aggregations
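The salary-tier bucketing in `dim_employees` is the kind of logic a SQL CASE expression would express; a Python rendering for illustration, where the tier names and thresholds are assumptions, not taken from the actual model SQL:

```python
# Illustrative only: mirrors a hypothetical CASE expression that buckets
# salaries into tiers for the dim_employees model. Thresholds are invented.

def salary_tier(salary: float) -> str:
    """Bucket a salary into a coarse tier."""
    if salary >= 150_000:
        return "executive"
    if salary >= 100_000:
        return "senior"
    if salary >= 60_000:
        return "mid"
    return "junior"
```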
### Data Quality Framework
- **Unique constraints**: Employee IDs must be unique across all records
- **Not null constraints**: Critical fields cannot contain null values
- **Referential integrity**: Foreign key relationships maintained
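In dbt these constraints are typically declared in a schema YAML file alongside the models; a sketch assuming column names such as `employee_id` and `department` (the project's actual schema file is not shown above):

```yaml
version: 2

models:
  - name: dim_employees
    columns:
      - name: employee_id
        tests:
          - unique
          - not_null
  - name: agg_department_stats
    columns:
      - name: department
        tests:
          - not_null
          - relationships:
              to: ref('dim_employees')
              field: department
```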
## Quick Start for Claude Code
### 1. MCP Server Activation
The dbt MCP server is configured in `mcp_servers/claude_config.json`; Claude Code detects and connects to it automatically.
### 2. Common Operations
```
# Analyze data model relationships
"Show me the lineage for the dim_employees model"
# Execute transformations
"Run the marts layer models and validate data quality"
# Query the warehouse
"Show me department statistics from the analytics tables"
# Debug pipeline issues
"Why is the agg_department_stats model failing tests?"
```
### 3. Development Workflow
1. **Extract & Load**: Data flows from CSV through Meltano to DuckDB
2. **Transform**: dbt models create staging views and analytics tables
3. **Validate**: Data quality tests ensure integrity
4. **Query**: Ad-hoc analysis through Claude Code MCP integration
## Configuration & Security
### Environment Variables
All sensitive configuration is handled through environment variables:
```env
DBT_PROJECT_DIR=/Users/ajdoyle/claude-data-stack-mcp/transform
DBT_PROFILES_DIR=/Users/ajdoyle/claude-data-stack-mcp/transform/profiles/duckdb
DUCKDB_PATH=/Users/ajdoyle/claude-data-stack-mcp/data/warehouse/data_stack.duckdb
```
### MCP Server Security
- **Local execution**: All MCP servers run locally for security
- **File system isolation**: Database files isolated in `data/warehouse/`
- **Profile isolation**: dbt profiles contained within project structure
## Documentation & Support
### Available Documentation
- [Quick Start Guide](docs/QUICK_START.md) - 5-minute setup
- [API Reference](docs/API_REFERENCE.md) - Complete tool documentation
- [Usage Examples](docs/USAGE_EXAMPLES.md) - Practical workflows
- [Integration Guide](docs/MCP_INTEGRATION.md) - Detailed configuration
### Tool Versions
- **Meltano 3.8.0** - DataOps platform for ELT pipelines
- **DuckDB 1.3.2** - High-performance analytics database
- **dbt Core 1.10.4** - Data transformation framework
- **dbt-duckdb 1.9.4** - DuckDB adapter for dbt
## Status: Production Ready + MCP Enhanced
This modern data stack has been systematically implemented and validated through comprehensive testing. All components use the latest compatible versions and follow industry best practices. Enhanced with Claude Code MCP integration for intelligent development assistance and automated workflows.
**Key Achievements**:
- Complete ELT pipeline implementation
- Data quality framework with automated testing
- MCP server integration for Claude Code assistance
- Project-contained configuration for reproducibility
- Production-ready with comprehensive documentation