Case Study – Automating an ETL Pipeline with MCP

Written by Om-Shree-0709 (@Om-Shree-0709)

Agentic AI
Data Science
MCP
ETL
Pipeline

1. Real-World Example: Keboola MCP Server in Action
2. Building a Pipeline with AI Prompts
3. Multi-Platform ETL: Confluent + Keboola
4. Behind the Scenes
5. My Thoughts
6. References

This case study demonstrates how the Model Context Protocol (MCP) allows AI agents to automate complete ETL workflows without manual scripting. By exposing data pipelines as structured tools, MCP enables agents to extract, transform, and load data simply by following natural-language prompts. This approach reduces integration complexity and helps teams move from code-heavy pipelines to fully orchestrated, agent-driven automation.


Real-World Example: Keboola MCP Server in Action

Keboola’s MCP server turns Keboola pipelines into AI-callable tools. Agents can manage storage, run SQL transformations, trigger jobs, and access metadata, all through natural language. For example, a prompt like “Segment customers with frequent purchases and run that job daily” launches a full ETL workflow with built-in logging and error handling [1][2].

```python
# Example: initiating a Keboola MCP client over SSE
from mcp_agent import MCPClient

client = MCPClient.create(
    "url",
    server_url="https://mcp.eu.keboola.com/sse",
    auth_token="TOKEN",
)
```

This remote connection supports SSE transport and OAuth authentication. The agent can call tools such as create_transformation, run_job, or list_jobs, and Keboola returns structured results as JSON [1].
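
To make the round trip concrete, here is a minimal sketch using the official `mcp` Python SDK instead of the wrapper above; the tool name, argument keys, and auth header are illustrative, not Keboola’s exact schema:

```python
# A sketch with the official `mcp` Python SDK; the tool name, arguments,
# and auth header are illustrative, not Keboola's exact schema.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    headers = {"Authorization": "Bearer TOKEN"}  # illustrative auth
    async with sse_client("https://mcp.eu.keboola.com/sse", headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover which tools the server exposes
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call a tool; the server validates inputs and returns structured content
            result = await session.call_tool(
                "run_job",
                {"configuration_id": "customer-segmentation"},  # hypothetical args
            )
            print(result.content)

asyncio.run(main())
```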

Building a Pipeline with AI Prompts

Here is how a natural-language pipeline prompt might look:

"Create a daily transformation that segments customers who spent over $100 last month. Then save results to a CSV and update the dashboard."

Keboola’s MCP server interprets this prompt, builds the SQL transformation, schedules the job, and monitors execution. Results and logs are returned as MCP responses, making monitoring and error tracking accessible to agents [2].
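
For illustration, the transformation an agent assembles from that prompt might look something like the payload below; the field names and SQL dialect are hypothetical, not Keboola’s exact create_transformation schema:

```python
# Hypothetical payload an agent might assemble from the prompt above;
# field names and SQL dialect are illustrative, not Keboola's exact schema.
transformation = {
    "name": "daily_high_value_customers",
    "sql": """
        SELECT customer_id, SUM(amount) AS total_spent
        FROM transactions
        WHERE order_date >= DATEADD(month, -1, CURRENT_DATE)
        GROUP BY customer_id
        HAVING SUM(amount) > 100
    """,
    "output": {"format": "csv", "destination": "out.c-segments.high_value_customers"},
    "schedule": "0 6 * * *",  # cron: run daily at 06:00
}
```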

Multi-Platform ETL: Confluent + Keboola

For hybrid workflows, the Keboola and Confluent MCP servers work together. Agents can fetch real-time Kafka topics via Confluent, then route cleaned data into Keboola for transformation and loading into a Delta Lake. Calls like list_topics, consume_message, and run_transformation integrate across platforms through a standardized MCP interface [3].

```python
# Agent orchestration with multiple MCP endpoints
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.connectors.mcp import MCPSsePlugin

async def main():
    # Each MCP server is exposed to the kernel as a plugin
    plugin1 = MCPSsePlugin(name="confluent", url="http://conf-mcp.local:9001")
    plugin2 = MCPSsePlugin(name="keboola", url="https://mcp.eu.keboola.com/sse")

    kernel = Kernel()
    kernel.add_plugin(plugin1)
    kernel.add_plugin(plugin2)

    agent = kernel.create_chat_agent(service_id="openai", model_id="gpt-4")
    response = await agent.invoke_async(
        "Ingest new Kafka events, transform with Keboola daily, "
        "and deliver summary as CSV"
    )
    print(response.content)

asyncio.run(main())
```

This shows how a single agent orchestrates real-time ingestion and transformation across MCP-managed platforms [3].

Behind the Scenes


Each tool exposed by the MCP servers is defined with metadata for name, description, input schema, and output format. When an agent calls a tool, the MCP server validates inputs, executes the operation in Keboola or Confluent, and returns structured responses.
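
The tool-definition shape itself comes from the MCP specification (a name, a human-readable description, and a JSON Schema for inputs); the concrete fields shown for a run_job tool below are illustrative:

```python
# Tool metadata per the MCP specification: name, description, and a JSON
# Schema for inputs. The concrete fields of this run_job tool are illustrative.
run_job_tool = {
    "name": "run_job",
    "description": "Trigger a job for a given configuration and report its status.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "configuration_id": {
                "type": "string",
                "description": "ID of the configuration to run",
            }
        },
        "required": ["configuration_id"],
    },
}
```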

Both Keboola and Confluent support async-first architectures, enabling concurrent agent workflows without blocking. Keboola supports HTTP+SSE transport as well as a local CLI transport (run with uv), making it compatible with both desktop agents and cloud-based clients [1][4]. Logs are tracked separately to keep the JSON output clean while preserving auditability and observability.
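
As a sketch of what that non-blocking design enables, an agent can fan out independent tool calls concurrently; this assumes a connected ClientSession like the one in the earlier sketch, and the tool arguments are again illustrative:

```python
# Sketch: fan out independent tool calls concurrently over an async session.
# Assumes a connected ClientSession as in the earlier sketch; the tool
# arguments are illustrative.
import asyncio

async def refresh_all(session):
    segment = session.call_tool("run_job", {"configuration_id": "segment-customers"})
    export = session.call_tool("run_job", {"configuration_id": "export-dashboard"})
    # Both jobs run concurrently instead of back to back
    results = await asyncio.gather(segment, export)
    return [result.content for result in results]
```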

My Thoughts

This ETL automation case shows how MCP can turn natural-language intent into reliable data operations. Agents can create pipelines, schedule jobs, fetch logs, and produce dashboards with clarity and repeatability. For teams working across domains, it removes engineering bottlenecks and lets agents do real data work.

That said, governance and control are essential. Limit write operations to reviewed tools, validate SQL logic with pre-run checks, and use policy-based controls and audit logs, especially in production environments. When implemented carefully, MCP delivers automation, safety, and speed in ETL workflows.
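
One lightweight way to enforce those guardrails is a policy gate in front of every tool call; the allowlist, keyword check, and helper below are a hypothetical sketch, not a Keboola feature:

```python
# Hypothetical policy gate in front of tool calls: only reviewed write tools
# are allowed, and SQL payloads get a naive pre-run keyword check.
APPROVED_WRITE_TOOLS = {"create_transformation", "run_job"}
DESTRUCTIVE_KEYWORDS = ("DROP ", "DELETE ", "TRUNCATE ")

async def guarded_call(session, tool_name: str, args: dict):
    # Block write-style tools that have not been reviewed
    is_write = tool_name.startswith(("create_", "run_", "update_", "delete_"))
    if is_write and tool_name not in APPROVED_WRITE_TOOLS:
        raise PermissionError(f"{tool_name} is not an approved write tool")

    # Naive SQL check; a real gate would parse the statement, not string-match
    sql = str(args.get("sql", "")).upper()
    if any(keyword in sql for keyword in DESTRUCTIVE_KEYWORDS):
        raise PermissionError("Destructive SQL statement blocked")

    return await session.call_tool(tool_name, args)
```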

References

1. Keboola MCP Server: AI-Powered ETL Workflow Automation Overview – Keboola Blog
2. Keboola MCP Server Turns AI Agents into Data Engineers – SuperbCrew
3. Powering AI Agents with Real-Time Data using MCP – Confluent Blog
4. Keboola MCP Server Architecture and Best Practices – Keboola Blog

Written by Om-Shree-0709 (@Om-Shree-0709)