# Modern Data Stack with Meltano, DuckDB, and dbt Core

A production-ready data stack implementation combining Meltano for ELT orchestration, DuckDB for high-performance analytics, and dbt Core for data transformations.

## Architecture Overview

```
CSV Data → Meltano (Extract/Load) → DuckDB → dbt Core (Transform) → Analytics Tables
```

**Data Flow**:

1. **Extract**: Meltano's `tap-csv` reads sample employee data
2. **Load**: `target-duckdb` loads raw data into DuckDB
3. **Transform**: dbt models create staging views and analytics tables
4. **Validate**: Data quality tests ensure integrity

## Tool Versions (Latest Compatible)

- **Meltano 3.8.0** - DataOps platform for ELT pipelines
- **DuckDB 1.3.2** - High-performance in-process analytics database
- **dbt Core 1.10.4** - Data transformation framework
- **dbt-duckdb 1.9.4** - DuckDB adapter for dbt

## Quick Start

### 1. Environment Setup

```bash
# Clone and navigate to project
cd claude-data-stack-mcp

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install all dependencies
pip install -r requirements.txt
```

### 2. Run Complete Pipeline

```bash
# Step 1: Extract and Load with Meltano
meltano run tap-csv target-duckdb

# Step 2: Transform with dbt
cd transform
DBT_PROFILES_DIR=./profiles/duckdb dbt run --profile data_stack --project-dir .

# Step 3: Validate with data quality tests
DBT_PROFILES_DIR=./profiles/duckdb dbt test --profile data_stack --project-dir .
```

### 3. Verify Results

```bash
# Check transformed data
python -c "
import duckdb
conn = duckdb.connect('data/warehouse/data_stack.duckdb')
print('=== Department Stats ===')
result = conn.execute('SELECT * FROM main.agg_department_stats').fetchall()
for row in result:
    print(row)
"
```

## Project Structure

```
├── data/
│   ├── sample_data.csv              # Sample employee data
│   └── warehouse/                   # DuckDB database files
├── transform/                       # dbt project
│   ├── models/
│   │   ├── staging/                 # Staging models
│   │   │   ├── stg_employees.sql
│   │   │   └── sources.yml
│   │   └── marts/                   # Analytics models
│   │       ├── dim_employees.sql
│   │       └── agg_department_stats.sql
│   ├── profiles/duckdb/             # Project-contained profiles
│   └── dbt_project.yml
├── meltano.yml                      # Meltano configuration
├── requirements.txt                 # Python dependencies
└── README.md                        # This file
```

## Data Models

### Staging Layer

- **`stg_employees`**: Clean, typed employee data from raw CSV

### Analytics Layer

- **`dim_employees`**: Employee dimension with salary tiers (Junior/Mid-Level/Senior)
- **`agg_department_stats`**: Department-level aggregations (count, avg/min/max salary, total payroll)

### Data Quality Tests

- **Unique constraints**: Employee IDs must be unique
- **Not null constraints**: Employee IDs cannot be null

## Usage Examples

### Add New Data Sources

```bash
# Browse available extractors
meltano discover extractors

# Add a new extractor (e.g., PostgreSQL)
meltano add extractor tap-postgres

# Configure in meltano.yml and run
meltano run tap-postgres target-duckdb
```

### Create New dbt Models

```sql
-- transform/models/marts/new_model.sql
{{ config(materialized='table') }}

select
    department,
    count(*) as employee_count,
    avg(annual_salary) as avg_salary
from {{ ref('stg_employees') }}
group by department
```

### Development Workflow

```bash
# Test individual dbt models
cd transform
DBT_PROFILES_DIR=./profiles/duckdb dbt run --models stg_employees --profile data_stack --project-dir .
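
# A small extension (sketch, assuming dbt's standard node selection syntax):
# the + graph operator runs a model together with its upstream dependencies
DBT_PROFILES_DIR=./profiles/duckdb dbt run --models +agg_department_stats --profile data_stack --project-dir .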
# Run only marts models
DBT_PROFILES_DIR=./profiles/duckdb dbt run --models marts --profile data_stack --project-dir .

# Generate documentation
DBT_PROFILES_DIR=./profiles/duckdb dbt docs generate --profile data_stack --project-dir .
```

## Configuration

### Meltano Configuration

- **Extractor**: `tap-csv` configured for `data/sample_data.csv`
- **Loader**: `target-duckdb` configured for `data/warehouse/data_stack.duckdb`
- **Environments**: dev, staging, prod

### dbt Configuration

- **Profile**: `data_stack` with project-contained profiles
- **Target**: DuckDB database in `data/warehouse/`
- **Materializations**: Views for staging, tables for marts

## Troubleshooting

### Common Issues

**"Table does not exist" errors**:
- Ensure the Meltano ELT step completed successfully
- Check that `data/warehouse/data_stack.duckdb` exists

**dbt profile errors**:
- Verify you're in the `transform/` directory
- Set the `DBT_PROFILES_DIR=./profiles/duckdb` environment variable

**Python dependency conflicts**:
- Use a fresh virtual environment
- Ensure Python 3.13+ compatibility

### Validation Commands

```bash
# Check Meltano configuration for the CSV extractor
meltano config tap-csv list

# Validate dbt setup
cd transform
DBT_PROFILES_DIR=./profiles/duckdb dbt debug --profile data_stack

# Inspect DuckDB directly
python -c "import duckdb; conn = duckdb.connect('data/warehouse/data_stack.duckdb'); print(conn.execute('SHOW TABLES').fetchall())"
```

## Next Steps

1. **Add More Data Sources**: Integrate APIs, databases, or files using Meltano's extensive extractor library
2. **Expand Transformations**: Create more sophisticated dbt models for advanced analytics
3. **Add Orchestration**: Integrate with Airflow, Prefect, or other orchestration tools (a minimal wrapper-script sketch appears at the end of this README)
4. **Enable Monitoring**: Add data quality monitoring and alerting
5. **Scale Storage**: Migrate to cloud data warehouses (Snowflake, BigQuery, etc.)

## MCP Integration

**NEW**: Claude Code MCP server for intelligent dbt assistance!

```bash
# Start dbt MCP server for Claude Code integration
./scripts/start_dbt_mcp.sh
```

**Capabilities**:

- **dbt CLI Operations**: `dbt_run`, `dbt_test`, `dbt_compile`, `dbt_build`
- **Project Discovery**: Model listing, metadata analysis, lineage exploration
- **Database Querying**: Direct SQL execution against DuckDB warehouse
- **Real-time Assistance**: Context-aware dbt project support

**Documentation**:

- 🚀 [Quick Start Guide](docs/QUICK_START.md) - 5-minute setup
- 📚 [API Reference](docs/API_REFERENCE.md) - Complete tool documentation
- 💡 [Usage Examples](docs/USAGE_EXAMPLES.md) - Practical workflows
- 🔧 [Integration Guide](docs/MCP_INTEGRATION.md) - Detailed configuration

## Status: ✅ Production Ready + MCP Enhanced

This data stack has been systematically implemented and validated through comprehensive testing. All components use the latest compatible versions and follow best practices. Enhanced with Claude Code MCP integration for intelligent development assistance.
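
As a starting point for the orchestration item under Next Steps, the individual commands above can be wrapped in a single script before moving to a full scheduler. A minimal sketch, using only commands already shown in this README (the `scripts/run_pipeline.sh` path is illustrative, not part of the project):

```bash
#!/usr/bin/env bash
# scripts/run_pipeline.sh (hypothetical): chain the ELT and transform steps end to end.
# Stop immediately if any step fails.
set -euo pipefail

# 1. Extract and load the raw CSV data into DuckDB with Meltano
meltano run tap-csv target-duckdb

# 2. Transform and validate with dbt (project-contained profiles)
cd transform
DBT_PROFILES_DIR=./profiles/duckdb dbt run --profile data_stack --project-dir .
DBT_PROFILES_DIR=./profiles/duckdb dbt test --profile data_stack --project-dir .
```

Run it from the project root; an orchestrator such as Airflow or Prefect can later invoke the same steps as separate tasks.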
