Which integrations are available for this server?

Databricks: data catalog and metadata management; query and search Databricks assets.
dbt: data catalog and metadata management; query and search dbt models and lineage.
Neo4j: infrastructure adapter for state management and persistence.
PostgreSQL: infrastructure adapter for state management and persistence.
Redis: infrastructure adapter for state management and persistence.
Snowflake: data catalog and metadata management; query and search Snowflake assets.

Data Workers

by DataWorkersProject

TypeScript

Local

What is Data Workers?

Data Workers is a coordinated swarm of AI agents that automate the full spectrum of data engineering workflows. Each agent is a standalone MCP (Model Context Protocol) server that exposes domain-specific tools to Claude Code, Codex CLI, OpenCode, Gemini CLI, Cursor, VS Code, and any MCP-compatible client.

The problem: Data engineers spend 60%+ of their time on undifferentiated work -- writing pipeline boilerplate, debugging data incidents at 2am, chasing schema changes across teams, manually cataloging assets, and fighting governance paperwork.

The solution: 11 autonomous agents that understand your data stack end-to-end. They build pipelines, detect anomalies, manage catalogs, enforce governance, track ML experiments, and more -- all through natural language via the MCP protocol your AI tools already speak.

Everything runs locally with in-memory stubs by default. No external services required. No data leaves your machine. BYO model -- use any LLM provider.

Get Started

Fastest path (2 commands)

No clone required — runs straight from npm:

npx dw-claw init
claude mcp add data-workers -- npx -y dw-claw

That's it. Open Claude Code and start asking questions. Everything works instantly with in-memory seed data.

Connect to your data (optional)

npx dw-claw setup    # Interactive: choose Snowflake/BigQuery/Databricks → enter creds → verified

Clone-based setup

If you prefer to clone the repo and run from source (required for contributing or dev):

git clone https://github.com/DataWorkersProject/dataworkers-claw-community.git
cd dataworkers-claw-community
npm install          # full install (~3min, includes optional warehouse SDKs)
# or:
npm install --omit=optional    # fast install (~30s, skips heavy warehouse SDKs)

Then add agents to Claude Code (run from inside the cloned repo):

claude mcp add dw-pipelines -- "$(pwd)/start-agent.sh" dw-pipelines && \
claude mcp add dw-incidents -- "$(pwd)/start-agent.sh" dw-incidents && \
claude mcp add dw-catalog -- "$(pwd)/start-agent.sh" dw-context-catalog && \
claude mcp add dw-schema -- "$(pwd)/start-agent.sh" dw-schema && \
claude mcp add dw-quality -- "$(pwd)/start-agent.sh" dw-quality && \
claude mcp add dw-governance -- "$(pwd)/start-agent.sh" dw-governance && \
claude mcp add dw-usage -- "$(pwd)/start-agent.sh" dw-usage-intelligence && \
claude mcp add dw-observability -- "$(pwd)/start-agent.sh" dw-observability && \
claude mcp add dw-connectors -- "$(pwd)/start-agent.sh" dw-connectors && \
claude mcp add dw-ml -- "$(pwd)/start-agent.sh" dw-ml

Start Claude Code and ask:

"Search the catalog for customer-related tables"
"Show me the full lineage for the orders table"
"Why did the orders table row count drop 40% yesterday?"
"Scan the customer schema for PII and suggest masking policies"
"Compare the last two ML experiments and explain the accuracy difference"

Everything works instantly with in-memory seed data — no infrastructure required.

Client configuration

Each agent can be started via the start-agent.sh script, which handles working directory and dependency resolution. Replace /path/to/dataworkers-claw-community with your clone location.

Claude Code (.mcp.json in your project root):

{
  "mcpServers": {
    "dw-pipelines": {
      "command": "/path/to/dataworkers-claw-community/start-agent.sh",
      "args": ["dw-pipelines"]
    },
    "dw-catalog": {
      "command": "/path/to/dataworkers-claw-community/start-agent.sh",
      "args": ["dw-context-catalog"]
    },
    "dw-quality": {
      "command": "/path/to/dataworkers-claw-community/start-agent.sh",
      "args": ["dw-quality"]
    }
  }
}

Cursor (.cursor/mcp.json) — same format:

{
  "mcpServers": {
    "dw-pipelines": {
      "command": "/path/to/dataworkers-claw-community/start-agent.sh",
      "args": ["dw-pipelines"]
    },
    "dw-incidents": {
      "command": "/path/to/dataworkers-claw-community/start-agent.sh",
      "args": ["dw-incidents"]
    }
  }
}

OpenCode (opencode.json in your project root):

{
  "mcp": {
    "dw-pipelines": {
      "type": "local",
      "command": ["/path/to/dataworkers-claw-community/start-agent.sh", "dw-pipelines"],
      "enabled": true
    },
    "dw-catalog": {
      "type": "local",
      "command": ["/path/to/dataworkers-claw-community/start-agent.sh", "dw-context-catalog"],
      "enabled": true
    }
  }
}
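
The Claude Code/Cursor and OpenCode formats above carry the same information under different wrapper keys, so both can be generated from one agent list. The sketch below is illustrative only: `buildMcpServersConfig`, `buildOpenCodeConfig`, and the `AGENTS` map are hypothetical helper names, not part of dw-claw. The server-name-to-package pairs are copied from the `claude mcp add` commands earlier.

```typescript
// Hypothetical helpers (not part of dw-claw): generate client configs
// from one list of agents. Keys are server names, values are agent packages.
const AGENTS: Record<string, string> = {
  "dw-pipelines": "dw-pipelines",
  "dw-catalog": "dw-context-catalog",
  "dw-quality": "dw-quality",
};

interface StdioServer {
  command: string;
  args: string[];
}

interface OpenCodeServer {
  type: "local";
  command: string[];
  enabled: boolean;
}

// Shape used by .mcp.json (Claude Code) and .cursor/mcp.json (Cursor).
function buildMcpServersConfig(repoPath: string, agents: Record<string, string>) {
  const mcpServers: Record<string, StdioServer> = {};
  for (const [name, pkg] of Object.entries(agents)) {
    mcpServers[name] = { command: `${repoPath}/start-agent.sh`, args: [pkg] };
  }
  return { mcpServers };
}

// Shape used by opencode.json: command and argument share one array.
function buildOpenCodeConfig(repoPath: string, agents: Record<string, string>) {
  const mcp: Record<string, OpenCodeServer> = {};
  for (const [name, pkg] of Object.entries(agents)) {
    mcp[name] = {
      type: "local",
      command: [`${repoPath}/start-agent.sh`, pkg],
      enabled: true,
    };
  }
  return { mcp };
}

const repo = "/path/to/dataworkers-claw-community";
console.log(JSON.stringify(buildMcpServersConfig(repo, AGENTS), null, 2));
console.log(JSON.stringify(buildOpenCodeConfig(repo, AGENTS), null, 2));
```

Replace the `repo` value with your clone location, as with the hand-written configs.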

Codex CLI — one-liner, or a project-scoped .codex/config.toml:

codex mcp add data-workers -- npx -y dw-claw
# or, in your project:
npx dw-claw init --client codex

Gemini CLI (.gemini/settings.json):

npx dw-claw init --client gemini

This repository ships pre-wired configs for Claude Code (.mcp.json), OpenCode (opencode.json), and Codex CLI (.codex/config.toml) — cloning it and opening your coding agent inside it is enough.

Claude Code plugin

This repo is also a Claude Code plugin marketplace — one install wires the unified MCP server plus three data-engineering skills (/data-workers:trace-lineage, /data-workers:quality-audit, /data-workers:incident-rca):

claude plugin marketplace add DataWorkersProject/dataworkers-claw-community
claude plugin install data-workers@dataworkers

Grok Build reads Claude Code plugins natively, so the same install works there unchanged.

Setup guides

Per-client walkthroughs live in docs/setup/: Claude Code · Codex CLI · OpenCode · Gemini CLI · Cursor · GitHub Copilot · OpenClaw, Cline & Continue · Microsoft Copilot

Agents

Agent	Package	Description	Tools
Pipelines	`dw-pipelines`	NL-to-pipeline generation, template engine, Iceberg MERGE INTO, Kafka events, Airflow deployment. Write tools (`generate_pipeline`, `deploy_pipeline`) require Pro.	4
Incidents	`dw-incidents`	Statistical anomaly detection, graph-based root cause analysis, playbook execution	5
Catalog	`dw-context-catalog`	Hybrid search (vector + BM25 + graph), lineage traversal, Iceberg crawler	35
Schema	`dw-schema`	INFORMATION_SCHEMA diffs, rename detection, Iceberg snapshot evolution	9
Quality	`dw-quality`	Weighted 5-dimension scoring, z-score anomaly detection, 14-day baselines	6
Governance	`dw-governance`	Priority-based policy engine, 3-pass PII scanner (regex + values + LLM)	6
Usage Intelligence	`dw-usage-intelligence`	Practitioner analytics, workflow patterns, adoption dashboards, heatmaps (zero LLM)	26
Observability	`dw-observability`	SHA-256 audit trail, drift detection, agent metrics (p50/p95/p99), health monitoring	6
Connectors	`dw-connectors`	Unified MCP gateway to 15 catalog connectors	56
Orchestration	`dw-orchestration`	Priority scheduler, heartbeat monitor, agent registry, event choreography	internal (not MCP)
MLOps & Models	`dw-ml`	Experiment tracking, model registry, feature pipelines, SHAP explainability, drift detection, A/B testing. Write tools (`train_model`, `deploy_model`, `create_experiment`, `log_metrics`, `register_model`, `create_feature_pipeline`, `ab_test_models`) require Pro.	16
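
The Quality row mentions z-score anomaly detection against 14-day baselines. The following is a minimal sketch of that general technique, not the dw-quality implementation. The function names, the threshold of 3, and the sample row counts are all assumptions made for illustration.

```typescript
// Minimal sketch of z-score anomaly detection over a fixed baseline window.
// Not the dw-quality implementation; names and threshold are illustrative.

function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Sample standard deviation (n - 1 in the denominator).
function stdDev(values: number[]): number {
  const m = mean(values);
  const variance =
    values.reduce((sum, v) => sum + (v - m) ** 2, 0) / (values.length - 1);
  return Math.sqrt(variance);
}

// z = (observed - baseline mean) / baseline standard deviation.
function zScore(baseline: number[], observed: number): number {
  if (baseline.length < 2) throw new Error("baseline needs at least two points");
  const m = mean(baseline);
  const sd = stdDev(baseline);
  if (sd === 0) {
    // Flat baseline: any deviation at all is treated as infinitely unusual.
    return observed === m ? 0 : Math.sign(observed - m) * Infinity;
  }
  return (observed - m) / sd;
}

function isAnomalous(baseline: number[], observed: number, threshold = 3): boolean {
  return Math.abs(zScore(baseline, observed)) > threshold;
}

// Hypothetical daily row counts for the previous 14 days.
const last14Days = [
  1000, 1010, 990, 1005, 995, 1002, 998,
  1001, 999, 1003, 997, 1004, 996, 1000,
];

console.log(isAnomalous(last14Days, 600));  // true: a 40% drop
console.log(isAnomalous(last14Days, 1002)); // false: a typical day
```

A fixed threshold is the simplest choice; a production detector would likely also account for seasonality and for baselines too short to be meaningful.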

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         MCP Clients                             │
│  Claude Code · Codex · OpenCode · Gemini · Cursor · Any MCP    │
└────────────────────────────┬────────────────────────────────────┘
                             │  MCP Protocol (JSON-RPC 2.0 / stdio)
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                     11 AI Agents (160+ tools)                   │
│                                                                 │
│  pipelines · incidents · catalog · schema · quality · governance│
│  usage-intelligence · observability · connectors · orchestration│
│  ml                                                             │
└────────────────────────────┬────────────────────────────────────┘
                             │  Factory-injected dependencies
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                   Core Platform (9 packages)                    │
│  MCP Framework · Context Layer · Agent Lifecycle · Validation   │
│  Conflict Resolution · Enterprise · Orchestrator · Platform     │
│  Medallion (Bronze → Silver → Gold lakehouse management)        │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────┐
│              Infrastructure Adapters (auto-detect)              │
│  Redis · Kafka · PostgreSQL · Neo4j · pgvector · PG FTS        │
│  LLM Bridge · Warehouse Bridge · Airflow                       │
│  (falls back to InMemory stubs when services unavailable)       │
└────────────────────────────┬────────────────────────────────────┘
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                  15 Catalog Connectors                          │
│  Snowflake · BigQuery · Databricks · dbt · Iceberg · Glue      │
│  Hive · DataHub · OpenMetadata · Purview · Dataplex · Nessie   │
│  Polaris · OpenLineage · Lake Formation                         │
└─────────────────────────────────────────────────────────────────┘
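
The top arrow in the diagram is JSON-RPC 2.0 carried over stdio. As a rough illustration of what a client writes to an agent's stdin, the sketch below builds `tools/list` and `tools/call` requests, the two tool methods defined by the MCP specification. The tool name `search_catalog` and its `query` argument are hypothetical placeholders; real tool names come from each agent's `tools/list` response.

```typescript
// Sketch of MCP request framing over stdio (newline-delimited JSON-RPC 2.0).
// `search_catalog` and its arguments are hypothetical, not a documented tool.

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

function makeRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) request.params = params;
  return request;
}

// Ask the server which tools it exposes.
const listTools = makeRequest("tools/list");

// Invoke one tool by name with JSON arguments.
const callTool = makeRequest("tools/call", {
  name: "search_catalog",
  arguments: { query: "customer" },
});

// The stdio transport sends one JSON message per line.
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

console.log(frame(listTools).trim());
console.log(frame(callTool).trim());
```

A real session starts with an `initialize` handshake before any tool request; MCP clients such as Claude Code handle that and the framing for you.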

Connectors

Data Workers includes 15 catalog connectors out of the box. Additional enterprise connectors are available in Pro/Enterprise editions.

Connector	Description
Snowflake	Databases, tables, DDL, usage stats
BigQuery	Datasets, tables, schema, cost estimation
Databricks	Unity Catalog, tables, query history
AWS Glue	Databases, tables, partitions
Lake Formation	Permissions, grants, resource listing
Hive Metastore	Thrift-based database/table/partition access
dbt	Models, lineage, test results, run history
DataHub	Entity search, metadata, lineage, usage stats
OpenMetadata	Entity search, lineage, tags, glossary
Purview	Catalog search, entity metadata, classifications
Dataplex	Lakes, zones, assets, data quality, discovery
Nessie	Git-like branching, commits, merges, content versioning
Apache Iceberg	REST Catalog, time travel, schema evolution, statistics
Apache Polaris	Multi-catalog federation, OAuth2, permission policies
OpenLineage	Lineage graphs, job runs, column lineage, event emission

Enterprise connectors, by category:

Category	Connectors
Orchestration (11)	Airflow, Dagster, Prefect, AWS Step Functions, Azure Data Factory, dbt Cloud, Cloud Composer, Temporal, Mage, Kestra, Argo
Alerting (5)	PagerDuty, Slack, Microsoft Teams, OpsGenie, New Relic
Quality (6)	Great Expectations, Soda, Monte Carlo, Anomalo, Bigeye, Elementary
BI (5)	Looker, Tableau, Metabase, Sigma, Superset
Observability (2)	OpenTelemetry, Datadog
Identity (2)	Okta, Azure AD
ITSM (2)	ServiceNow, Jira Service Management
Cost (1)	AWS Cost Explorer
Streaming (1)	Kafka Schema Registry

Community Edition includes up to 3 enterprise connectors. See pricing for details.

Project Structure

dataworkers-claw-community/
├── agents/                    # 11 agent MCP servers
│   ├── dw-pipelines/          # Write tools (generate, deploy) require Pro
│   ├── dw-incidents/
│   ├── dw-context-catalog/
│   ├── dw-schema/
│   ├── dw-quality/
│   ├── dw-governance/
│   ├── dw-usage-intelligence/
│   ├── dw-observability/
│   ├── dw-connectors/
│   ├── dw-orchestration/
│   └── dw-ml/                 # Write tools require Pro
├── core/                      # 9 shared platform packages
│   ├── mcp-framework/         # Base MCP server class
│   ├── infrastructure-stubs/  # 9 interfaces + InMemory stubs + real adapters
│   ├── llm-provider/          # Multi-provider LLM abstraction
│   ├── medallion/             # Bronze/Silver/Gold lakehouse management
│   ├── enterprise/            # Enterprise middleware shim (no-op in Community Edition)
│   ├── orchestrator/          # Multi-agent coordination
│   ├── context-layer/         # Shared context for cross-agent communication
│   └── ...
├── connectors/                # 15 catalog connectors
├── packages/                  # CLI (dw-claw) and VS Code extension
├── tests/                     # Contract, integration, e2e, and eval tests
├── docker/                    # Dockerfiles and compose
└── docs/                      # Architecture specs and guides
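
The infrastructure-stubs package pairs each interface with an in-memory stub and a real adapter, and the architecture falls back to the stub when a service is unavailable. A minimal sketch of that pattern follows. The `KeyValueStore` interface, the `createStore` factory, and the `REDIS_URL` variable name are assumptions for illustration and may not match the repo's actual interfaces or .env.example.

```typescript
// Sketch of the "real adapter, else in-memory stub" pattern.
// All names are hypothetical; see core/infrastructure-stubs/ for the real ones.

interface KeyValueStore {
  readonly kind: string;
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}

class InMemoryStore implements KeyValueStore {
  readonly kind = "in-memory";
  private data = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }

  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}

// Connects to the real service; rejects if the service is unreachable.
type AdapterFactory = (url: string) => Promise<KeyValueStore>;

async function createStore(
  env: Record<string, string | undefined>,
  connectReal: AdapterFactory,
): Promise<KeyValueStore> {
  const url = env["REDIS_URL"];
  if (!url) return new InMemoryStore(); // nothing configured
  try {
    return await connectReal(url);
  } catch {
    return new InMemoryStore(); // configured but unavailable
  }
}

// Usage: a connector that always fails, standing in for a real client.
const failingConnect: AdapterFactory = async () => {
  throw new Error("connection refused");
};

createStore({ REDIS_URL: "redis://localhost:6379" }, failingConnect).then(
  async (store) => {
    await store.set("greeting", "hello");
    console.log(store.kind, await store.get("greeting")); // in-memory hello
  },
);
```

Passing the environment and the connector in as parameters keeps the fallback logic testable without a running service.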

Development

npm test          # Run all tests (2,900+, no external services required)
npm run build     # Build all packages
npm run lint      # Lint
npm run typecheck # Type-check
cd agents/dw-pipelines && npm run dev  # Run a single agent in dev mode

Troubleshooting

Agent fails to start: Ensure you're using start-agent.sh (not node directly). The script sets the working directory correctly for tsx module resolution. See docs/MCP-STARTUP-BUG-REPORT.md for details.

Module not found errors: Run npm install from the repo root. The monorepo uses npm workspaces — all dependencies are hoisted.

Tests fail on fresh clone: Make sure Node.js >= 20 is installed. Run npm install before npm test.
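
The Node.js requirement can be checked before running the test suite. The sketch below is a hypothetical preflight helper, not something the repo ships; `meetsMinimumNode` is an illustrative name.

```typescript
// Hypothetical preflight check for the Node.js >= 20 requirement.
function meetsMinimumNode(version: string, minimumMajor = 20): boolean {
  // Accepts "20.11.1" or "v20.11.1"; only the major component matters here.
  const [majorPart = ""] = version.replace(/^v/, "").split(".");
  const major = parseInt(majorPart, 10);
  // NaN >= minimumMajor is false, so unparseable input is rejected.
  return major >= minimumMajor;
}

console.log(meetsMinimumNode("v18.19.0")); // false
console.log(meetsMinimumNode("20.11.1"));  // true
```

In a real script you would pass `process.versions.node` instead of a literal string.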

Known Limitations

Some advanced features require the cloned repo. The npx dw-claw one-liner works for most workflows. For development or contributing, use the clone-based setup.
dw-orchestration is an internal service, not an MCP agent. It provides task scheduling and agent coordination APIs used by other agents.
Write operations require Pro. Tools like generate_pipeline, deploy_model, and train_model return upgrade prompts in the Community Edition.

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines on reporting bugs, setting up your dev environment, submitting PRs, and code style.

Join the Data Workers Community on Discord to ask questions and connect with other contributors.

Topic	Link
Infrastructure details	docs/ARCHITECTURE.md
Configuration (env vars)	.env.example
Tiers & Pricing	dataworkers.io/pricing
Security	SECURITY.md
License	LICENSE (Apache 2.0)
LLM Data Disclosure	docs/LLM-DATA-DISCLOSURE.md
API Reference	docs/API.md