
S3 Data Lake MCP Server

by anespo

🚀 S3 Data Lake MCP Server

Transform your S3 data lakes into AI-accessible knowledge bases with natural language queries

A production-ready Model Context Protocol (MCP) server that gives AI agents seamless access to S3 data lakes. Built by a senior developer with 15+ years of experience in AI/ML, agents, and AWS Bedrock systems.

🎯 Why This Exists

I was building ETL systems for AI agents and kept hitting the same wall: How do you give agents seamless access to data lakes without building custom APIs for every single use case?

Then AWS Bedrock AgentCore added MCP support, and everything clicked. This MCP server bridges that gap: it exposes your S3 data lakes to agents as knowledge bases they can query in natural language.

✨ Key Features

🔥 8 Powerful Tools - Complete S3 data lake operations
📊 Multi-Format Support - CSV, JSON, Parquet with intelligent processing
⚡ FastMCP Framework - Modern, high-performance MCP server
🏗️ AgentCore Runtime - Serverless, auto-scaling deployment
🛡️ Production-Grade - Comprehensive error handling, monitoring, security
🎯 Type-Safe - Full Python type hints and validation
🚀 Deploy in Minutes - UV package management, one-command deployment

🛠️ Available Tools

| Tool | Description | Use Case |
|------|-------------|----------|
| list_s3_buckets | List accessible S3 buckets | Data discovery |
| list_s3_objects | Browse bucket contents with filtering | Dataset exploration |
| read_csv_from_s3 | Parse CSV files with metadata | Tabular data analysis |
| read_json_from_s3 | Process JSON objects and arrays | Complex data structures |
| read_parquet_from_s3 | Columnar data with full type info | High-performance analytics |
| query_csv_data | Filter and query with smart typing | Data querying |
| get_dataset_summary | Statistical analysis and profiling | Data understanding |
| get_file_metadata | Comprehensive file information | Metadata exploration |
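
The table describes `query_csv_data` as filtering "with smart typing" but does not show what that means in practice. One plausible reading is that string cells are coerced to numbers before they are compared. The sketch below illustrates that idea with the standard library only. The names `coerce`, `query_rows`, and the operator keys `eq`/`gt`/`lt` are hypothetical and are not the server's actual API.

```python
import csv
import io
import operator

# Hypothetical operator names; the real tool's filter syntax is not documented here.
OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def coerce(value):
    """Turn a CSV cell into int or float when possible, else return it unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (ValueError, TypeError):
            continue
    return value


def query_rows(csv_text, column, op, target):
    """Return rows whose typed `column` value satisfies `op` against `target`."""
    compare = OPS[op]
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        typed = {key: coerce(cell) for key, cell in row.items()}
        try:
            if compare(typed[column], target):
                matches.append(typed)
        except TypeError:
            # Mixed types (for example "n/a" compared with 50000) do not match.
            continue
    return matches


sample = (
    "customer,industry,amount\n"
    "acme,Technology,75000\n"
    "bolt,Retail,1200.5\n"
    "core,Technology,n/a\n"
)
print(query_rows(sample, "amount", "gt", 50000))
```

Coercing per cell rather than per column keeps the sketch short; a real implementation would more likely infer one type per column, as pandas does.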

🚀 Quick Start

Prerequisites

  • Python 3.12+

  • UV package manager

  • AWS CLI configured

  • AWS Bedrock AgentCore access

1. Install & Setup

# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and install dependencies
git clone https://github.com/anespo/s3-data-lake-mcp-server.git
cd s3-data-lake-mcp-server
uv sync

2. Local Development

# Run the MCP server locally
uv run python run_local.py

# Test in another terminal
uv run pytest tests/ -v

3. Deploy to AWS AgentCore Runtime

# One-command deployment
uv run python deploy_uv.py

# Your Agent ARN will be displayed for integration

4. Generate Demo Data (Optional)

# Create 66.7MB of demo datasets
uv run python generate_mock_data.py

💬 Natural Language Queries

Once integrated with your AI agents, you can ask questions like:

Data Discovery:

  • "What S3 buckets do I have access to?"

  • "Show me all datasets in my analytics bucket"

  • "List CSV files larger than 10MB"
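
A request such as "List CSV files larger than 10MB" reduces to filtering an object listing by key suffix and size. The sketch below works on dictionaries shaped like the `Contents` entries of an S3 `ListObjectsV2` response (`Key`, and `Size` in bytes). The function name `filter_objects` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
def filter_objects(objects, suffix=".csv", min_size_mb=10.0):
    """Return keys that end with `suffix` and whose size exceeds the threshold."""
    threshold = min_size_mb * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
    return [
        obj["Key"]
        for obj in objects
        if obj["Key"].lower().endswith(suffix) and obj["Size"] > threshold
    ]


# Illustrative listing; keys and sizes are made up for the example.
listing = [
    {"Key": "analytics/customers.csv", "Size": 25_000_000},
    {"Key": "analytics/small.csv", "Size": 4_096},
    {"Key": "iot/sensors.parquet", "Size": 40_000_000},
]
print(filter_objects(listing))
```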

Data Analysis:

  • "Read the customer analytics data and show me the first 10 rows"

  • "Find all sales transactions over $50,000"

  • "What columns are available in the IoT sensor data?"

  • "Show me customers in the Technology industry"

Metadata & Insights:

  • "What's the total size of data in my bucket?"

  • "How many records are in each dataset?"

  • "Give me a statistical summary of the sales data"
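
To make "statistical summary" concrete, here is a minimal sketch of the kind of profile such a question could return: a row count plus per-column statistics. It uses only the standard library. The function name `summarize` and the output layout are assumptions, not the actual output of `get_dataset_summary`.

```python
import statistics


def summarize(rows):
    """Profile a list of dict records: row count plus per-column statistics."""
    summary = {"row_count": len(rows), "columns": {}}
    columns = list(rows[0].keys()) if rows else []
    for name in columns:
        values = [row.get(name) for row in rows if row.get(name) is not None]
        numeric = [
            v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if values and len(numeric) == len(values):
            # Every non-null value is a number: report range and mean.
            summary["columns"][name] = {
                "type": "numeric",
                "min": min(numeric),
                "max": max(numeric),
                "mean": statistics.fmean(numeric),
            }
        else:
            # Otherwise treat the column as text and count distinct values.
            summary["columns"][name] = {
                "type": "text",
                "distinct": len(set(map(str, values))),
            }
    return summary


# Illustrative records; the field names are made up for the example.
records = [
    {"region": "EU", "amount": 100.0},
    {"region": "US", "amount": 300.0},
    {"region": "EU", "amount": 200.0},
]
print(summarize(records))
```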

🏗️ Architecture

Architecture Diagram

Built on Modern Stack:

  • 🏗️ AWS Bedrock AgentCore Runtime - Serverless, auto-scaling

  • ⚡ FastMCP Framework - High-performance MCP server

  • 📦 UV Package Manager - Ultra-fast Python dependency management

  • 🔧 boto3 + pandas + pyarrow - Efficient data processing

  • 🛡️ AWS SigV4 + IAM - Enterprise-grade security

🔗 Integration Examples

Kiro IDE

{
  "mcpServers": {
    "s3-data-lake": {
      "command": "python",
      "args": ["kiro_s3_mcp_wrapper.py"],
      "env": {
        "AWS_REGION": "eu-west-1",
        "AWS_PROFILE": "default"
      }
    }
  }
}

Strands Agents

from strands import Agent
from strands.tools.mcp import MCPClient

# Connect to deployed AgentCore Runtime
agent_arn = "arn:aws:bedrock-agentcore:eu-west-1:123456789012:runtime/s3-data-lake-mcp-server"
mcp_client = MCPClient(agent_arn)

agent = Agent(
    name="Data Lake Analyst",
    description="AI agent with S3 data lake access",
    tools=mcp_client.list_tools_sync()
)

# Natural language data analysis
response = agent("Analyze customer data and find high-value segments")
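
The agent ARN carries the region and account that a client needs when it connects to the deployed runtime. The sketch below splits an ARN into its standard colon-separated fields and percent-encodes it for use in a URL. The helper names are hypothetical. The URL pattern in `invocation_url` is an assumption about how AgentCore Runtime endpoints are addressed, so check it against the AWS documentation before relying on it.

```python
from urllib.parse import quote


def parse_arn(arn):
    """Split an ARN into its six colon-separated fields."""
    # Unpacking raises ValueError when there are fewer than six fields.
    prefix, partition, service, region, account_id, resource = arn.split(":", 5)
    if prefix != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return {
        "partition": partition,
        "service": service,
        "region": region,
        "account_id": account_id,
        "resource": resource,
    }


def invocation_url(arn):
    """Build an invocation URL. ASSUMPTION: the URL pattern is not from this README."""
    region = parse_arn(arn)["region"]
    encoded = quote(arn, safe="")  # ':' becomes %3A and '/' becomes %2F
    return (
        f"https://bedrock-agentcore.{region}.amazonaws.com"
        f"/runtimes/{encoded}/invocations?qualifier=DEFAULT"
    )


arn = "arn:aws:bedrock-agentcore:eu-west-1:123456789012:runtime/s3-data-lake-mcp-server"
print(parse_arn(arn)["region"])
print(invocation_url(arn))
```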

📊 Demo Environment

The repository includes a complete demo environment with:

  • 66.7MB of realistic mock data across 3 formats

  • Customer Analytics (CSV, 50K records) - Business intelligence data

  • Sales Transactions (JSON, 75K records) - Financial analysis data

  • IoT Sensor Data (Parquet, 100K records) - Time-series analytics data

Perfect for presentations, testing, and showcasing capabilities without exposing real data.
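
For readers who want a feel for how mock data of this kind can be produced, here is a minimal standard-library sketch that writes a small, reproducible customer CSV. It is not the repository's `generate_mock_data.py`. The column names, the industry values, and the `make_customer_csv` helper are all hypothetical.

```python
import csv
import io
import random

# Hypothetical category values for the illustrative "industry" column.
INDUSTRIES = ["Technology", "Retail", "Healthcare", "Finance"]


def make_customer_csv(n_rows, seed=42):
    """Return CSV text with a header and `n_rows` pseudo-random customer rows."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["customer_id", "industry", "annual_revenue"])
    for i in range(1, n_rows + 1):
        writer.writerow([
            f"CUST-{i:05d}",
            rng.choice(INDUSTRIES),
            round(rng.uniform(1_000, 500_000), 2),
        ])
    return buffer.getvalue()


print(make_customer_csv(5))
```

Seeding the generator keeps demo output stable between runs, which helps when the same data backs tests or presentations.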

🧪 Testing & Quality

# Run comprehensive test suite
uv run pytest tests/ -v --cov=src

# Test specific functionality
uv run pytest tests/test_s3_mcp_server.py::test_read_csv_from_s3 -v

# Test deployed MCP server
uv run python test_deployed_mcp.py

Quality Assurance:

  • ✅ 95%+ test coverage

  • ✅ Type safety with mypy

  • ✅ Production error handling

  • ✅ Performance benchmarking

  • ✅ Security validation

📚 Documentation

| Document | Description |
|----------|-------------|
| 🚀 Deployment Guide | Complete deployment instructions |
| 🏗️ Architecture | System design and components |
| 🔗 Integration Guide | Kiro and Strands integration |
| 📋 API Reference | Full tool documentation |

🛡️ Security & Compliance

  • 🔐 AWS SigV4 Authentication - Industry-standard request signing

  • 🎯 IAM Role-Based Access - Least privilege principle

  • 🔒 No Hardcoded Credentials - Secure credential management

  • 📊 Comprehensive Logging - Full audit trail

  • 🛡️ Error Sanitization - No sensitive data in logs
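
Error sanitization usually means scrubbing credentials out of a message before it is logged or returned to the caller. The sketch below shows one way to do that with regular expressions. The `sanitize` helper and the list of secret-like field names are assumptions, not the server's actual implementation.

```python
import re

# AWS access key IDs are 20 characters and start with a prefix such as AKIA or ASIA.
ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Assumption: secrets show up as name=value or name: value pairs whose name
# contains "secret", "token", or "password".
SECRET_PAIR_RE = re.compile(
    r"\b([\w-]*(?:secret|token|password)[\w-]*)(\s*[=:]\s*)\S+",
    re.IGNORECASE,
)


def sanitize(message):
    """Redact access key IDs and secret-looking values from a log message."""
    message = ACCESS_KEY_RE.sub("[REDACTED_ACCESS_KEY]", message)
    # Keep the field name and separator, replace only the value.
    return SECRET_PAIR_RE.sub(r"\1\2[REDACTED]", message)


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
raw = "AccessDenied for AKIAIOSFODNN7EXAMPLE with aws_secret_access_key=wJalrXUtnFEMI/K7MDENG"
print(sanitize(raw))
```

Pattern-based redaction is a safety net, not a guarantee; the primary defense is still to avoid placing credentials in exception messages at all.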

📈 Monitoring & Observability

  • 📊 CloudWatch Integration - Centralized logging and metrics

  • 🎯 GenAI Observability - Specialized AI/ML monitoring

  • ⚡ Performance Tracking - Request latency and throughput

  • 🚨 Error Alerting - Proactive issue detection
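
Latency tracking for tool calls can be sketched as a decorator that times each invocation, records the result in memory, and emits a structured log line. This is only an illustration of the technique. The logger name, the `track_latency` decorator, and the stand-in tool body are hypothetical, and publishing to CloudWatch is not shown.

```python
import functools
import logging
import time
from collections import defaultdict

logger = logging.getLogger("s3_mcp.metrics")  # hypothetical logger name
latencies_ms = defaultdict(list)  # tool name -> observed latencies in milliseconds


def track_latency(func):
    """Record wall-clock latency and outcome for every call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "error"
        try:
            result = func(*args, **kwargs)
            status = "ok"
            return result
        finally:
            # Runs on success and on exceptions, so failures are timed too.
            elapsed_ms = (time.perf_counter() - start) * 1000
            latencies_ms[func.__name__].append(elapsed_ms)
            logger.info(
                "tool=%s status=%s latency_ms=%.2f",
                func.__name__, status, elapsed_ms,
            )
    return wrapper


@track_latency
def get_file_metadata(key):
    """Stand-in for a real tool; returns a fixed payload."""
    return {"key": key, "size": 0}


get_file_metadata("demo/customers.csv")
print(f"{latencies_ms['get_file_metadata'][0]:.3f} ms")
```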

🚀 What's Next?

Planned Enhancements:

  • 🌍 Multi-region deployment support

  • 🔍 Advanced query capabilities (SQL-like syntax)

  • 📡 Real-time streaming data support

  • 🚀 Enhanced caching layer (Redis/ElastiCache)

  • 🤖 ML model integration for data insights

  • 🔌 Plugin architecture for custom tools

🤝 Contributing

Built by the community, for the community:

  1. 🍴 Fork the repository

  2. 🌟 Create a feature branch

  3. ✨ Add your improvements

  4. 🧪 Add comprehensive tests

  5. 📝 Update documentation

  6. 🚀 Submit a pull request

📄 License

This project is licensed under a custom license allowing non-commercial use. See LICENSE for details.

👨‍💻 About the Author

Built by Tony Esposito

Turning complex data infrastructure into simple, agent-accessible APIs.

🙋‍♂️ Support & Community

  • 📖 Documentation: Comprehensive guides in /docs

  • 🐛 Issues: GitHub Issues for bugs and feature requests

  • 💬 Discussions: GitHub Discussions for questions

  • 📧 Contact: tony@mydataclub.com

⭐ Star This Repository

If this MCP server helps your AI agents access S3 data lakes, please star the repository! It helps others discover this tool and motivates continued development.


🚀 Ready to give your AI agents superpowers with S3 data lake access? Deploy in minutes and start querying with natural language today!
