S3 Data Lake MCP Server
Transform your S3 data lakes into AI-accessible knowledge bases with natural language queries
A production-ready Model Context Protocol (MCP) server that gives AI agents seamless access to S3 data lakes. Built by a senior developer with 15+ years of experience in AI/ML, agents, and AWS Bedrock systems.
Why This Exists
I was building ETL systems for AI agents and kept hitting the same wall: How do you give agents seamless access to data lakes without building custom APIs for every single use case?
Then AWS Bedrock AgentCore added MCP support, and everything clicked. This MCP server bridges that gap, turning your S3 data lakes into agent-accessible knowledge bases that can be queried in natural language.
Key Features
8 Powerful Tools - Complete S3 data lake operations
Multi-Format Support - CSV, JSON, Parquet with intelligent processing
FastMCP Framework - Modern, high-performance MCP server
AgentCore Runtime - Serverless, auto-scaling deployment
Production-Grade - Comprehensive error handling, monitoring, security
Type-Safe - Full Python type hints and validation
Deploy in Minutes - UV package management, one-command deployment
Available Tools
| Tool | Description | Use Case |
| | List accessible S3 buckets | Data discovery |
| | Browse bucket contents with filtering | Dataset exploration |
| | Parse CSV files with metadata | Tabular data analysis |
| | Process JSON objects and arrays | Complex data structures |
| | Columnar data with full type info | High-performance analytics |
| | Filter and query with smart typing | Data querying |
| | Statistical analysis and profiling | Data understanding |
| | Comprehensive file information | Metadata exploration |
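To make the table rows more concrete, here is a minimal sketch of what "parse CSV with metadata", "filter and query with smart typing", and "statistical analysis" could look like. It uses only the standard library and is not the server's actual implementation: the function names `coerce`, `read_csv`, `query`, and `summarize` are hypothetical, and the sample bytes stand in for an object that has already been downloaded from S3.

```python
import csv
import io
import statistics
from typing import Any, Callable


def coerce(value: str) -> Any:
    """Turn a CSV string into bool, int, or float when it parses cleanly."""
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value  # leave anything else as a string


def read_csv(data: bytes) -> dict[str, Any]:
    """Parse CSV bytes into typed rows plus basic metadata."""
    reader = csv.DictReader(io.StringIO(data.decode("utf-8")))
    rows = [{key: coerce(val) for key, val in row.items()} for row in reader]
    return {"columns": list(reader.fieldnames or []), "row_count": len(rows), "rows": rows}


def query(rows: list[dict[str, Any]], column: str, predicate: Callable[[Any], bool]) -> list[dict[str, Any]]:
    """Keep the rows whose value in `column` satisfies `predicate`."""
    return [row for row in rows if predicate(row[column])]


def summarize(values: list[float]) -> dict[str, float]:
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
    }


# Assumption: stand-in for the bytes of an object fetched from S3.
sample = b"id,industry,amount\nt1,Technology,75000.50\nt2,Retail,1200\nt3,Technology,50001\n"
table = read_csv(sample)
large = query(table["rows"], "amount", lambda amount: amount > 50_000)
stats = summarize([row["amount"] for row in table["rows"]])
print(table["columns"], [row["id"] for row in large], stats["median"])
```

Coercing values once at read time keeps the filter predicate simple, at the cost of raising a `TypeError` if a column mixes numbers with text that could not be converted.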
Quick Start
Prerequisites
Python 3.12+
UV package manager
AWS CLI configured
AWS Bedrock AgentCore access
1. Install & Setup
# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and install dependencies
git clone https://github.com/anespo/s3-data-lake-mcp-server.git
cd s3-data-lake-mcp-server
uv sync
2. Local Development
# Run the MCP server locally
uv run python run_local.py
# Test in another terminal
uv run pytest tests/ -v
3. Deploy to AWS AgentCore Runtime
# One-command deployment
uv run python deploy_uv.py
# Your Agent ARN will be displayed for integration
4. Generate Demo Data (Optional)
# Create 66.7MB of demo datasets
uv run python generate_mock_data.py
Natural Language Queries
Once integrated with your AI agents, you can ask questions like:
Data Discovery:
"What S3 buckets do I have access to?"
"Show me all datasets in my analytics bucket"
"List CSV files larger than 10MB"
Data Analysis:
"Read the customer analytics data and show me the first 10 rows"
"Find all sales transactions over $50,000"
"What columns are available in the IoT sensor data?"
"Show me customers in the Technology industry"
Metadata & Insights:
"What's the total size of data in my bucket?"
"How many records are in each dataset?"
"Give me a statistical summary of the sales data"
Architecture

Built on Modern Stack:
AWS Bedrock AgentCore Runtime - Serverless, auto-scaling
FastMCP Framework - High-performance MCP server
UV Package Manager - Ultra-fast Python dependency management
boto3 + pandas + pyarrow - Efficient data processing
AWS SigV4 + IAM - Enterprise-grade security
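As an illustration of the data-processing layer, the sketch below shows how a query such as "List CSV files larger than 10MB" might be answered once a bucket listing is available. It assumes entries shaped like the `Contents` items returned by boto3's `list_objects_v2` (each with a `Key` and a `Size` in bytes). The helper `filter_objects` is hypothetical, the listing is hard-coded rather than fetched, and 10MB is treated here as 10 * 1024 * 1024 bytes.

```python
from typing import Iterable

MB = 1024 * 1024  # assumption: binary megabytes


def filter_objects(
    objects: Iterable[dict],
    suffix: str = ".csv",
    min_size_bytes: int = 10 * MB,
) -> list[str]:
    """Return keys that end with `suffix` and are larger than `min_size_bytes`."""
    return [
        obj["Key"]
        for obj in objects
        if obj["Key"].lower().endswith(suffix) and obj["Size"] > min_size_bytes
    ]


# Hard-coded stand-in for a bucket listing.
listing = [
    {"Key": "analytics/customers.csv", "Size": 25 * MB},
    {"Key": "analytics/notes.csv", "Size": 2 * MB},
    {"Key": "analytics/sensors.parquet", "Size": 40 * MB},
]
large_csv_keys = filter_objects(listing)
print(large_csv_keys)
```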
Integration Examples
Kiro IDE
{
  "mcpServers": {
    "s3-data-lake": {
      "command": "python",
      "args": ["kiro_s3_mcp_wrapper.py"],
      "env": {
        "AWS_REGION": "eu-west-1",
        "AWS_PROFILE": "default"
      }
    }
  }
}
Strands Agents
from strands import Agent
from strands.tools.mcp import MCPClient
# Connect to deployed AgentCore Runtime
agent_arn = "arn:aws:bedrock-agentcore:eu-west-1:123456789012:runtime/s3-data-lake-mcp-server"
mcp_client = MCPClient(agent_arn)
agent = Agent(
    name="Data Lake Analyst",
    description="AI agent with S3 data lake access",
    tools=mcp_client.list_tools_sync()
)
# Natural language data analysis
response = agent("Analyze customer data and find high-value segments")
Demo Environment
The repository includes a complete demo environment with:
66.7MB of realistic mock data across 3 formats
Customer Analytics (CSV, 50K records) - Business intelligence data
Sales Transactions (JSON, 75K records) - Financial analysis data
IoT Sensor Data (Parquet, 100K records) - Time-series analytics data
Perfect for presentations, testing, and showcasing capabilities without exposing real data.
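For readers who want a feel for how such mock data can be produced, here is a small standard-library sketch that writes a seeded, reproducible customer CSV. It is not the repository's `generate_mock_data.py`: the column names, the industry list, and the function `generate_customers_csv` are all assumptions made for illustration.

```python
import csv
import io
import random

# Assumption: illustrative categories, not the repository's actual values.
INDUSTRIES = ["Technology", "Retail", "Healthcare", "Finance"]


def generate_customers_csv(n_rows: int, seed: int = 42) -> str:
    """Build a CSV string with `n_rows` synthetic customer records."""
    rng = random.Random(seed)  # seeded so repeated runs give identical data
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["customer_id", "industry", "annual_revenue"])
    for customer_id in range(1, n_rows + 1):
        revenue = round(rng.uniform(1_000, 100_000), 2)
        writer.writerow([customer_id, rng.choice(INDUSTRIES), revenue])
    return buffer.getvalue()


csv_text = generate_customers_csv(100)
print(len(csv_text.splitlines()))
```

Seeding the generator keeps demos repeatable, so the same queries return the same answers across presentations.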
Testing & Quality
# Run comprehensive test suite
uv run pytest tests/ -v --cov=src
# Test specific functionality
uv run pytest tests/test_s3_mcp_server.py::test_read_csv_from_s3 -v
# Test deployed MCP server
uv run python test_deployed_mcp.py
Quality Assurance:
95%+ test coverage
Type safety with mypy
Production error handling
Performance benchmarking
Security validation
Documentation
| Document | Description |
| | Complete deployment instructions |
| | System design and components |
| | Kiro and Strands integration |
| | Full tool documentation |
Security & Compliance
AWS SigV4 Authentication - Industry-standard request signing
IAM Role-Based Access - Least privilege principle
No Hardcoded Credentials - Secure credential management
Comprehensive Logging - Full audit trail
Error Sanitization - No sensitive data in logs
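One way error sanitization can work is to redact credential-like substrings before a message reaches the logs. The sketch below is an assumption about the general approach, not the server's code. It masks AWS access key IDs (which start with `AKIA` or `ASIA` followed by 16 uppercase alphanumeric characters) and any `name=value` pair whose name contains "secret", "token", or "password". The function name `sanitize_error` is hypothetical.

```python
import re

# AWS access key IDs: a 4-letter prefix plus 16 uppercase alphanumerics.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Assumption: treat any key whose name mentions secret/token/password as sensitive.
SENSITIVE_PAIR = re.compile(
    r"\b([\w-]*(?:secret|token|password)[\w-]*)(\s*[=:]\s*)\S+",
    re.IGNORECASE,
)


def sanitize_error(message: str) -> str:
    """Redact credential-like substrings from an error message."""
    message = ACCESS_KEY_ID.sub("[REDACTED_ACCESS_KEY]", message)
    return SENSITIVE_PAIR.sub(r"\1\2[REDACTED]", message)


raw = "AccessDenied for AKIAIOSFODNN7EXAMPLE: aws_secret_access_key=abc123"
clean = sanitize_error(raw)
print(clean)
```

Pattern-based redaction is a safety net only: it catches common shapes of secrets but cannot guarantee that every sensitive value is removed.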
Monitoring & Observability
CloudWatch Integration - Centralized logging and metrics
GenAI Observability - Specialized AI/ML monitoring
Performance Tracking - Request latency and throughput
Error Alerting - Proactive issue detection
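Request latency tracking can be sketched with a small decorator that times each call. This example only keeps measurements in memory; publishing them to CloudWatch is left out, and the names `track_latency`, `latencies`, and `list_buckets_stub` are hypothetical rather than taken from the repository.

```python
import functools
import time
from collections import defaultdict
from typing import Callable

# In-memory store: function name -> list of call durations in seconds.
latencies: dict[str, list[float]] = defaultdict(list)


def track_latency(func: Callable) -> Callable:
    """Record the wall-clock duration of every call, even when it raises."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            latencies[func.__name__].append(time.perf_counter() - start)

    return wrapper


@track_latency
def list_buckets_stub() -> list[str]:
    """Hypothetical stand-in for a tool handler."""
    return ["analytics", "raw-data"]


list_buckets_stub()
list_buckets_stub()
print(len(latencies["list_buckets_stub"]))
```

Throughput can then be derived from the number of recorded calls over a time window.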
What's Next?
Planned Enhancements:
Multi-region deployment support
Advanced query capabilities (SQL-like syntax)
Real-time streaming data support
Enhanced caching layer (Redis/ElastiCache)
ML model integration for data insights
Plugin architecture for custom tools
Contributing
Built by the community, for the community:
Fork the repository
Create a feature branch
Add your improvements
Add comprehensive tests
Update documentation
Submit a pull request
License
This project is licensed under a custom license allowing non-commercial use. See LICENSE for details.
About the Author
Built by Tony Esposito
Turning complex data infrastructure into simple, agent-accessible APIs.
Support & Community
Documentation: Comprehensive guides in /docs
Issues: GitHub Issues for bugs and feature requests
Discussions: GitHub Discussions for questions
Contact: tony@mydataclub.com
Star This Repository
If this MCP server helps your AI agents access S3 data lakes, please star the repository! It helps others discover this tool and motivates continued development.
Ready to give your AI agents superpowers with S3 data lake access? Deploy in minutes and start querying with natural language today!