Case Study – Automating an ETL Pipeline with MCP

Written by Om-Shree-0709 (@Om-Shree-0709)

Agentic AI
Data Science
MCP
ETL
Pipeline

1. Real-World Example: Keboola MCP Server in Action
2. Building a Pipeline with AI Prompts
3. Multi-Platform ETL: Confluent + Keboola
4. Behind the Scenes
5. My Thoughts
6. References

This case study demonstrates how the Model Context Protocol (MCP) allows AI agents to automate complete ETL workflows without manual scripting. By exposing data pipelines as structured tools, MCP enables agents to extract, transform, and load data simply by following natural-language prompts. This approach reduces integration complexity and helps teams move from code-heavy pipelines to fully orchestrated, agent-driven automation.


Real-World Example: Keboola MCP Server in Action

Keboola’s MCP server turns Keboola pipelines into AI-callable tools. Agents can manage storage, run SQL transformations, trigger jobs, and access metadata, all through natural language. For example, a prompt like “Segment customers with frequent purchases and run that job daily” launches a full ETL workflow with built-in logging and error handling [1][2].

```python
# Example: initiating a Keboola MCP client over SSE
from mcp_agent import MCPClient

client = MCPClient.create(
    "url",
    server_url="https://mcp.eu.keboola.com/sse",
    auth_token="TOKEN",
)
```

This remote connection supports SSE transport and OAuth authentication. The agent can call tools such as create_transformation, run_job, or list_jobs, and Keboola returns structured results as JSON [1].
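
To make the round trip concrete, here is a minimal sketch using the official `mcp` Python SDK instead of the wrapper above; the tool name, argument keys, and auth header are illustrative, not Keboola’s exact schema:

```python
# A sketch with the official `mcp` Python SDK; the tool name, arguments,
# and auth header are illustrative, not Keboola's exact schema.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    headers = {"Authorization": "Bearer TOKEN"}  # illustrative auth
    async with sse_client("https://mcp.eu.keboola.com/sse", headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover which tools the server exposes
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Call a tool; the server validates inputs and returns structured content
            result = await session.call_tool(
                "run_job",
                {"configuration_id": "customer-segmentation"},  # hypothetical args
            )
            print(result.content)

asyncio.run(main())
```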

Building a Pipeline with AI Prompts

Here is how a natural-language pipeline prompt might look:

"Create a daily transformation that segments customers who spent over $100 last month. Then save results to a CSV and update the dashboard."

Keboola’s MCP server interprets this prompt, builds the SQL transformation, schedules the job, and monitors execution. Results and logs are returned as MCP responses, making monitoring and error tracking accessible to agents [2].
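
For illustration, the transformation an agent assembles from that prompt might look something like the payload below; the field names and SQL dialect are hypothetical, not Keboola’s exact create_transformation schema:

```python
# Hypothetical payload an agent might assemble from the prompt above;
# field names and SQL dialect are illustrative, not Keboola's exact schema.
transformation = {
    "name": "daily_high_value_customers",
    "sql": """
        SELECT customer_id, SUM(amount) AS total_spent
        FROM transactions
        WHERE order_date >= DATEADD(month, -1, CURRENT_DATE)
        GROUP BY customer_id
        HAVING SUM(amount) > 100
    """,
    "output": {"format": "csv", "destination": "out.c-segments.high_value_customers"},
    "schedule": "0 6 * * *",  # cron: run daily at 06:00
}
```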

Multi-Platform ETL: Confluent + Keboola

For hybrid workflows, the Keboola and Confluent MCP servers work together. Agents can fetch real-time Kafka topics via Confluent, then route cleaned data into Keboola for transformation and loading into a Delta Lake. Calls like list_topics, consume_message, and run_transformation integrate across platforms through a standardized MCP interface [3].

```python
# Agent orchestration with multiple MCP endpoints
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.connectors.mcp import MCPSsePlugin

async def main():
    # Each MCP server is exposed to the kernel as a plugin
    plugin1 = MCPSsePlugin(name="confluent", url="http://conf-mcp.local:9001")
    plugin2 = MCPSsePlugin(name="keboola", url="https://mcp.eu.keboola.com/sse")

    kernel = Kernel()
    kernel.add_plugin(plugin1)
    kernel.add_plugin(plugin2)

    agent = kernel.create_chat_agent(service_id="openai", model_id="gpt-4")
    response = await agent.invoke_async(
        "Ingest new Kafka events, transform with Keboola daily, "
        "and deliver summary as CSV"
    )
    print(response.content)

asyncio.run(main())
```

This shows how a single agent orchestrates real-time ingestion and transformation across MCP-managed platforms [3].

Behind the Scenes


Each tool exposed by the MCP servers is defined with metadata for name, description, input schema, and output format. When an agent calls a tool, the MCP server validates inputs, executes the operation in Keboola or Confluent, and returns structured responses.
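
The tool-definition shape itself comes from the MCP specification (a name, a human-readable description, and a JSON Schema for inputs); the concrete fields shown for a run_job tool below are illustrative:

```python
# Tool metadata per the MCP specification: name, description, and a JSON
# Schema for inputs. The concrete fields of this run_job tool are illustrative.
run_job_tool = {
    "name": "run_job",
    "description": "Trigger a job for a given configuration and report its status.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "configuration_id": {
                "type": "string",
                "description": "ID of the configuration to run",
            }
        },
        "required": ["configuration_id"],
    },
}
```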

Both Keboola and Confluent support async-first architectures, enabling concurrent agent workflows without blocking. Keboola supports HTTP+SSE transport as well as a local CLI transport (run with uv), making it compatible with both desktop agents and cloud-based clients [1][4]. Logs are tracked separately to keep the JSON output clean while preserving auditability and observability.
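
As a sketch of what that non-blocking design enables, an agent can fan out independent tool calls concurrently; this assumes a connected ClientSession like the one in the earlier sketch, and the tool arguments are again illustrative:

```python
# Sketch: fan out independent tool calls concurrently over an async session.
# Assumes a connected ClientSession as in the earlier sketch; the tool
# arguments are illustrative.
import asyncio

async def refresh_all(session):
    segment = session.call_tool("run_job", {"configuration_id": "segment-customers"})
    export = session.call_tool("run_job", {"configuration_id": "export-dashboard"})
    # Both jobs run concurrently instead of back to back
    results = await asyncio.gather(segment, export)
    return [result.content for result in results]
```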

My Thoughts

This ETL automation case shows how MCP can turn natural-language intent into reliable data operations. Agents can create pipelines, schedule jobs, fetch logs, and produce dashboards with clarity and repeatability. For teams working across domains, it removes engineering bottlenecks and lets agents do real data work.

That said, governance and control are essential. Limit write operations to reviewed tools, validate SQL logic with pre-run checks, and use policy-based controls and audit logs, especially in production environments. When implemented carefully, MCP delivers automation, safety, and speed in ETL workflows.
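
One lightweight way to enforce those guardrails is a policy gate in front of every tool call; the allowlist, keyword check, and helper below are a hypothetical sketch, not a Keboola feature:

```python
# Hypothetical policy gate in front of tool calls: only reviewed write tools
# are allowed, and SQL payloads get a naive pre-run keyword check.
APPROVED_WRITE_TOOLS = {"create_transformation", "run_job"}
DESTRUCTIVE_KEYWORDS = ("DROP ", "DELETE ", "TRUNCATE ")

async def guarded_call(session, tool_name: str, args: dict):
    # Block write-style tools that have not been reviewed
    is_write = tool_name.startswith(("create_", "run_", "update_", "delete_"))
    if is_write and tool_name not in APPROVED_WRITE_TOOLS:
        raise PermissionError(f"{tool_name} is not an approved write tool")

    # Naive SQL check; a real gate would parse the statement, not string-match
    sql = str(args.get("sql", "")).upper()
    if any(keyword in sql for keyword in DESTRUCTIVE_KEYWORDS):
        raise PermissionError("Destructive SQL statement blocked")

    return await session.call_tool(tool_name, args)
```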

References

1. Keboola MCP Server: AI-Powered ETL Workflow Automation Overview – Keboola Blog
2. Keboola MCP Server Turns AI Agents into Data Engineers – SuperbCrew
3. Powering AI Agents with Real-Time Data using MCP – Confluent Blog
4. Keboola MCP Server Architecture and Best Practices – Keboola Blog

Written by Om-Shree-0709 (@Om-Shree-0709)