emr-mcp-server
Integrates with Spark History Server to provide detailed job analysis, performance diagnostics, and workload-specific configuration recommendations for Spark applications.
Integrates with YARN ResourceManager for real-time application monitoring, resource utilization analysis, and performance bottleneck identification across YARN applications.
EMR MCP Server
A comprehensive Model Context Protocol (MCP) server that provides intelligent guidance for EMR cluster management, configuration recommendations, and monitoring capabilities. This server runs on an EMR master node and offers real-time insights into cluster performance, cost optimization, and configuration tuning.
Features
Cluster Management
Real-time cluster information with detailed instance group analysis
Multi-cluster support with filtering and search capabilities
Cost analysis and estimation with breakdown by instance types
Instance type recommendations based on workload patterns
Auto-scaling policy suggestions for optimal resource utilization
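To make the cost breakdown concrete, here is a minimal sketch of the arithmetic: hourly rate times instance count times runtime, summed per instance type. The hourly rates are placeholders rather than real AWS prices, and `estimate_cluster_cost` is a hypothetical helper, not the server's implementation. It ignores the EMR service fee, storage, and spot discounts.

```python
# Hypothetical hourly rates in USD; placeholders only, not real AWS prices.
HOURLY_RATE_USD = {"m5.xlarge": 0.20, "m5.2xlarge": 0.40, "m5.large": 0.10}


def estimate_cluster_cost(instance_groups, runtime_hours):
    """Return (per-type breakdown, total) for (instance_type, count) pairs."""
    breakdown = {}
    for instance_type, count in instance_groups:
        cost = HOURLY_RATE_USD[instance_type] * count * runtime_hours
        # Accumulate so that several groups of the same type are summed.
        breakdown[instance_type] = round(breakdown.get(instance_type, 0.0) + cost, 2)
    return breakdown, round(sum(breakdown.values()), 2)


# One master, three core, and two task nodes running for 48 hours.
groups = [("m5.xlarge", 1), ("m5.2xlarge", 3), ("m5.large", 2)]
breakdown, total = estimate_cluster_cost(groups, runtime_hours=48.0)
print(breakdown, total)
```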
Resource Monitoring
YARN ResourceManager integration for application monitoring
HDFS NameNode monitoring for storage health and utilization
Real-time resource utilization across all cluster nodes
Application performance analysis with bottleneck identification
Historical trend analysis for capacity planning
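As an illustration of where utilization figures can come from, the YARN ResourceManager REST API exposes cluster-wide counters at `/ws/v1/cluster/metrics`. The sketch below turns such a response into percentages. The sample values are made up, and `utilization_from_metrics` is a hypothetical helper, not necessarily how this server computes its numbers.

```python
def utilization_from_metrics(payload):
    """Summarize a /ws/v1/cluster/metrics response as utilization percentages."""
    metrics = payload["clusterMetrics"]

    def pct(used, total):
        # Guard against an empty cluster reporting zero capacity.
        return round(100.0 * used / total, 1) if total else 0.0

    return {
        "memory_pct": pct(metrics["allocatedMB"], metrics["totalMB"]),
        "vcores_pct": pct(metrics["allocatedVirtualCores"], metrics["totalVirtualCores"]),
        "apps_running": metrics["appsRunning"],
    }


# Made-up sample shaped like the ResourceManager response.
sample = {
    "clusterMetrics": {
        "allocatedMB": 49152,
        "totalMB": 98304,
        "allocatedVirtualCores": 12,
        "totalVirtualCores": 48,
        "appsRunning": 3,
    }
}
summary = utilization_from_metrics(sample)
print(summary)
```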
Analytics & Optimization
Spark History Server integration for detailed job analysis
Configuration recommendations based on workload patterns
Performance diagnostics with actionable insights
Cost optimization suggestions including spot instance usage
Workload-specific tuning for batch, streaming, and ML workloads
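For a sense of what job analysis can build on, the Spark History Server lists applications at `/api/v1/applications`, where each entry carries an `attempts` list with a `duration` in milliseconds. The sketch below ranks completed applications by duration. The sample data is invented, and `slowest_applications` is a hypothetical helper rather than the server's own analysis.

```python
def slowest_applications(apps, top_n=3):
    """Return (id, name, seconds) for the longest completed applications."""
    rows = []
    for app in apps:
        completed = [a for a in app.get("attempts", []) if a.get("completed")]
        if not completed:
            continue  # Skip applications that are still running.
        duration_ms = max(a["duration"] for a in completed)
        rows.append((app["id"], app["name"], duration_ms / 1000.0))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top_n]


# Invented sample shaped like the History Server response.
apps = [
    {"id": "application_1234567890_0002", "name": "adhoc-query",
     "attempts": [{"completed": True, "duration": 120000}]},
    {"id": "application_1234567890_0001", "name": "nightly-etl",
     "attempts": [{"completed": True, "duration": 5400000}]},
    {"id": "application_1234567890_0003", "name": "streaming-ingest",
     "attempts": [{"completed": False, "duration": 0}]},
]
ranked = slowest_applications(apps, top_n=2)
print(ranked)
```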
Security & Authentication
Multiple authentication methods: API keys, JWT tokens, IAM roles
Role-based access control with granular permissions
Secure communication with HTTPS and certificate validation
Request rate limiting to prevent abuse
Audit logging for compliance and monitoring
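As a rough illustration of the API-key method, a key check is usually done with a constant-time comparison so that response timing does not leak how much of a key matched. This is a minimal sketch under that assumption; `is_valid_api_key` is a hypothetical helper, and the server's actual check may be implemented differently.

```python
import hmac


def is_valid_api_key(presented, allowed_keys):
    """Return True if the presented key matches any configured key."""
    if not presented:
        return False
    presented_bytes = presented.encode("utf-8")
    valid = False
    for key in allowed_keys:
        # compare_digest runs in time independent of where the inputs differ.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            valid = True
    return valid


allowed = ["emr-mcp-default-key"]
print(is_valid_api_key("emr-mcp-default-key", allowed))
print(is_valid_api_key("wrong-key", allowed))
```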
Quick Start
Prerequisites
EMR cluster running version 6.0+
Python 3.8+
Access to YARN ResourceManager (port 8088)
Access to Spark History Server (port 18080)
Access to HDFS NameNode (port 9870)
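One quick way to confirm the three service ports are reachable from the node that will run the server is a plain TCP connection test. The sketch below assumes the default ports listed above; `port_is_open` and `check_prerequisites` are hypothetical helpers, not part of the project. A successful connection only shows that something is listening, not that the service is healthy.

```python
import socket

# Default ports from the prerequisites list.
REQUIRED_PORTS = {
    "YARN ResourceManager": 8088,
    "Spark History Server": 18080,
    "HDFS NameNode": 9870,
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_prerequisites(host="localhost"):
    """Map each required service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in REQUIRED_PORTS.items()}


if __name__ == "__main__":
    for name, reachable in check_prerequisites().items():
        print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```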
Installation
# Clone the repository
git clone https://github.com/your-org/emr-mcp-server.git
cd emr-mcp-server
# Install dependencies
pip install -r requirements.txt
# Configure the server
cp config/server_config.yaml.example config/server_config.yaml
# Edit the configuration file with your EMR cluster details
Configuration
Edit config/server_config.yaml:
server:
  host: "0.0.0.0"
  port: 3000
  debug: false
  workers: 4
emr:
  region: "us-east-1"
  cluster_id: "j-XXXXXXXXX"  # Optional: specific cluster ID
yarn:
  resource_manager_url: "http://localhost:8088"
  timeout: 30
spark:
  history_server_url: "http://localhost:18080"
  timeout: 30
hdfs:
  namenode_url: "http://localhost:9870"
  timeout: 30
auth:
  method: "api_key"  # Options: api_key, jwt, iam
  api_keys:
    - "emr-mcp-default-key"
  jwt_secret: "your-jwt-secret"
logging:
  level: "INFO"
  format: "console"  # Options: console, json
Running the Server
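Before starting the server, it can help to sanity-check the configuration. The sketch below assumes the YAML file has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and that `yarn`, `spark`, and `hdfs` are top-level sections. `validate_config` is a hypothetical helper, and the server's own validation may differ.

```python
from urllib.parse import urlparse

# (section, key) pairs that should hold an http(s) URL; layout is assumed.
REQUIRED_URLS = [
    ("yarn", "resource_manager_url"),
    ("spark", "history_server_url"),
    ("hdfs", "namenode_url"),
]
AUTH_METHODS = {"api_key", "jwt", "iam"}


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    port = cfg.get("server", {}).get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"server.port must be an integer in 1-65535, got {port!r}")
    for section, key in REQUIRED_URLS:
        url = cfg.get(section, {}).get(key) or ""
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"{section}.{key} is not a valid http(s) URL: {url!r}")
    auth = cfg.get("auth", {})
    method = auth.get("method")
    if method not in AUTH_METHODS:
        problems.append(f"auth.method must be one of {sorted(AUTH_METHODS)}, got {method!r}")
    elif method == "api_key" and not auth.get("api_keys"):
        problems.append("auth.api_keys must list at least one key")
    return problems


example = {
    "server": {"port": 3000},
    "yarn": {"resource_manager_url": "http://localhost:8088"},
    "spark": {"history_server_url": "http://localhost:18080"},
    "hdfs": {"namenode_url": "http://localhost:9870"},
    "auth": {"method": "api_key", "api_keys": ["emr-mcp-default-key"]},
}
print(validate_config(example))
```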
# Start the server directly
python -m src.server
# Or use the startup script
./scripts/start_server.sh
# Check server status
curl http://localhost:3000/health
MCP Tools
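Every tool in this section is invoked with a JSON object that has a `name` field and an `arguments` field. The `//` annotations in the examples are documentation only: JSON has no comment syntax, so they must be left out of real requests. A minimal sketch for building such a payload in Python follows; `build_tool_call` is a hypothetical helper, not part of the server.

```python
import json


def build_tool_call(name, arguments=None):
    """Serialize a tool call, dropping optional arguments that are None."""
    cleaned = {k: v for k, v in (arguments or {}).items() if v is not None}
    return json.dumps({"name": name, "arguments": cleaned})


payload = build_tool_call(
    "estimate_cost",
    {"runtime_hours": 48.0, "cluster_id": None},  # cluster_id left unset
)
print(payload)
```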
Cluster Management Tools
get_cluster_info
Retrieve comprehensive EMR cluster information including configuration, instance groups, and cost analysis.
{
  "name": "get_cluster_info",
  "arguments": {
    "cluster_id": "j-XXXXXXXXX"  // Optional
  }
}
list_clusters
List all EMR clusters with optional state filtering.
{
  "name": "list_clusters",
  "arguments": {
    "states": ["RUNNING", "WAITING"]  // Optional
  }
}
estimate_cost
Calculate current and projected costs with detailed breakdown.
{
  "name": "estimate_cost",
  "arguments": {
    "runtime_hours": 48.0,  // Optional
    "cluster_id": "j-XXXXXXXXX"  // Optional
  }
}
suggest_instance_types
Get AI-powered instance type recommendations based on workload characteristics.
{
  "name": "suggest_instance_types",
  "arguments": {
    "workload_type": "memory_intensive",  // Options: general, compute_intensive, memory_intensive, storage_intensive
    "data_size_gb": 1000,  // Optional
    "concurrent_jobs": 10  // Optional
  }
}
Monitoring Tools
monitor_resources
Get real-time resource utilization across YARN, HDFS, and cluster nodes.
{
  "name": "monitor_resources",
  "arguments": {}
}
analyze_yarn_applications
Analyze YARN applications with performance metrics and resource usage.
{
  "name": "analyze_yarn_applications",
  "arguments": {
    "states": ["RUNNING", "FINISHED"],  // Optional
    "application_types": ["SPARK"],  // Optional
    "limit": 50  // Optional, default: 50
  }
}
diagnose_performance
Identify performance bottlenecks and get optimization recommendations.
{
  "name": "diagnose_performance",
  "arguments": {
    "app_id": "application_1234567890_0001",  // Optional
    "time_range_hours": 24  // Optional, default: 24
  }
}
Analytics Tools
get_spark_logs
Fetch and analyze Spark application logs for debugging and optimization.
{
  "name": "get_spark_logs",
  "arguments": {
    "app_id": "application_1234567890_0001",  // Required
    "executor_id": "1"  // Optional
  }
}
recommend_configuration
Get workload-specific configuration recommendations for Spark and YARN.
{
  "name": "recommend_configuration",
  "arguments": {
    "workload_type": "batch",  // Options: batch, streaming, ml, interactive
    "app_id": "application_1234567890_0001"  // Optional
  }
}
Deployment Options
1. EMR Bootstrap Script (Recommended)
Deploy automatically when creating an EMR cluster:
# Upload bootstrap script to S3
aws s3 cp scripts/bootstrap-emr-mcp.sh s3://your-bucket/
# Create EMR cluster with MCP server
aws emr create-cluster \
  --name "EMR-MCP-Cluster" \
  --release-label emr-6.4.0 \
  --applications Name=Spark Name=Hadoop Name=Hive Name=Zeppelin \
  --instance-groups \
    InstanceGroupType=MASTER,InstanceType=m5.xlarge,InstanceCount=1 \
    InstanceGroupType=CORE,InstanceType=m5.2xlarge,InstanceCount=3 \
    InstanceGroupType=TASK,InstanceType=m5.large,InstanceCount=2,BidPrice=0.05 \
  --bootstrap-actions Path=s3://your-bucket/bootstrap-emr-mcp.sh \
  --ec2-attributes KeyName=your-key-pair \
  --log-uri s3://your-bucket/emr-logs/
2. Docker Deployment
# Build the image
docker build -t emr-mcp-server .
# Run with docker-compose
docker-compose up -d
# Check logs
docker-compose logs -f emr-mcp-server
3. Systemd Service
# Copy service file
sudo cp scripts/emr-mcp-server.service /etc/systemd/system/
# Enable and start
sudo systemctl enable emr-mcp-server
sudo systemctl start emr-mcp-server
sudo systemctl status emr-mcp-server
Usage Examples
Python Client
import asyncio
from examples.client_example import EMRMCPClient

async def main():
    async with EMRMCPClient("http://localhost:3000", "emr-mcp-default-key") as client:
        # Get cluster information
        cluster_info = await client.call_tool("get_cluster_info")
        print("Cluster Info:", cluster_info["content"][0]["text"])
        # Monitor resources
        resources = await client.call_tool("monitor_resources")
        print("Resources:", resources["content"][0]["text"])
        # Get configuration recommendations
        config_rec = await client.call_tool("recommend_configuration", {
            "workload_type": "batch"
        })
        print("Config Recommendations:", config_rec["content"][0]["text"])

asyncio.run(main())
cURL Examples
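The same HTTP calls can be made from Python without extra dependencies. The sketch below uses only the standard library and assumes the `/tools/call` endpoint and `X-API-Key` header shown in this section. `build_tool_request` and `call_tool` are illustrative names, not part of the project.

```python
import json
import urllib.request


def build_tool_request(base_url, api_key, name, arguments=None):
    """Build (but do not send) a POST request for a tool call."""
    body = json.dumps({"name": name, "arguments": arguments or {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tools/call",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def call_tool(base_url, api_key, name, arguments=None, timeout=30):
    """Send the request and decode the JSON response (needs a running server)."""
    request = build_tool_request(base_url, api_key, name, arguments)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_tool_request(
    "http://localhost:3000", "emr-mcp-default-key", "monitor_resources"
)
print(request.get_method(), request.full_url)
```

Separating request construction from sending keeps the first function easy to inspect and test without a live server.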
# Health check
curl http://localhost:3000/health
# List available tools
curl -X GET http://localhost:3000/tools \
  -H "X-API-Key: emr-mcp-default-key"
# Get cluster information
curl -X POST http://localhost:3000/tools/call \
  -H "Content-Type: application/json" \
  -H "X-API-Key: emr-mcp-default-key" \
  -d '{
    "name": "get_cluster_info",
    "arguments": {}
  }'
# Monitor resources
curl -X POST http://localhost:3000/tools/call \
  -H "Content-Type: application/json" \
  -H "X-API-Key: emr-mcp-default-key" \
  -d '{
    "name": "monitor_resources",
    "arguments": {}
  }'
Development
Running Tests
# Install development dependencies
pip install -r requirements.txt
# Run all tests
pytest
# Run specific test file
pytest tests/test_cluster.py -v
# Run with coverage
pytest --cov=src tests/ --cov-report=html
# Run demo with mock data
python demo.py
# Test server creation
python test_server.py
Code Quality
# Format code
black src/ tests/ examples/
# Sort imports
isort src/ tests/ examples/
# Type checking
mypy src/
# Linting
flake8 src/ tests/ examples/
Architecture
emr-mcp-server/
├── src/
│   ├── server.py              # Main MCP server implementation
│   ├── tools/                 # MCP tool implementations
│   │   ├── cluster.py         # Cluster management tools
│   │   ├── monitoring.py      # Resource monitoring tools
│   │   └── analytics.py       # Analytics and optimization tools
│   ├── connectors/            # Service connectors
│   │   ├── emr.py             # EMR API connector
│   │   ├── yarn.py            # YARN ResourceManager connector
│   │   ├── spark.py           # Spark History Server connector
│   │   └── hdfs.py            # HDFS NameNode connector
│   └── utils/                 # Utilities
│       ├── config.py          # Configuration management
│       └── auth.py            # Authentication utilities
├── config/
│   └── server_config.yaml     # Server configuration
├── tests/                     # Comprehensive test suite
├── examples/                  # Usage examples
├── scripts/                   # Deployment scripts
├── Dockerfile                 # Docker configuration
├── docker-compose.yml         # Docker Compose setup
├── demo.py                    # Demo with mock data
└── test_server.py             # Server creation test
Key Features Demonstrated
Completed Implementation
Complete Project Structure
Organized codebase with clear separation of concerns
Proper Python package structure with imports
Configuration management with YAML and environment variables
MCP Server Implementation
Full MCP protocol compliance with tool registration
Async/await architecture for high performance
Structured logging with configurable formats
Graceful shutdown with proper cleanup
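To illustrate what tool registration on an async architecture can look like, here is a minimal sketch of a registry that maps tool names to coroutine handlers. The decorator, the `TOOLS` dictionary, and the placeholder result are all hypothetical; the actual `src/server.py` may be organized quite differently. The response shape mirrors the `content[0]["text"]` access used by the Python client example.

```python
import asyncio

# Hypothetical registry mapping tool names to async handlers.
TOOLS = {}


def tool(name):
    """Register an async function under the given tool name."""
    def decorator(func):
        TOOLS[name] = func
        return func
    return decorator


@tool("monitor_resources")
async def monitor_resources():
    # Placeholder result; a real handler would query YARN and HDFS.
    return {"content": [{"type": "text", "text": "resources: ok"}]}


async def call_tool(name, arguments=None):
    """Look up a registered tool and await it with the given arguments."""
    handler = TOOLS.get(name)
    if handler is None:
        raise KeyError(f"unknown tool: {name}")
    return await handler(**(arguments or {}))


result = asyncio.run(call_tool("monitor_resources"))
print(result["content"][0]["text"])
```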
Service Connectors
EMR API integration for cluster management
YARN ResourceManager connector for application monitoring
Spark History Server connector for job analysis
HDFS NameNode connector for storage monitoring
Connection pooling and retry logic
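Retry logic for connectors is commonly built as exponential backoff around a network call. The following is a minimal sketch of that pattern, not the project's actual implementation; `with_retries` and its parameters are hypothetical. It retries on `OSError`, which covers `ConnectionError` and `urllib.error.URLError`.

```python
import time


def with_retries(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on OSError with exponentially growing delays."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except OSError:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** (attempt - 1))


# Demonstration with a call that fails twice before succeeding.
calls = {"count": 0}
delays = []


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


# Record the delays instead of actually sleeping.
outcome = with_retries(flaky_fetch, sleep=delays.append)
print(outcome, delays)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.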
MCP Tools
Cluster Management: get_cluster_info, list_clusters, estimate_cost, suggest_instance_types
Monitoring: monitor_resources, analyze_yarn_applications, diagnose_performance
Analytics: get_spark_logs, recommend_configuration
All tools return structured markdown with actionable insights
Security & Authentication
Multi-method authentication (API keys, JWT, IAM roles)
Input validation and sanitization
Secure configuration management
Deployment Ready
Docker containerization with multi-stage builds
EMR bootstrap script for automatic deployment
Systemd service configuration
Docker Compose for development
Testing & Quality
Comprehensive test suite with mocking
Demo script with realistic mock data
Code quality tools (black, isort, mypy, flake8)
Type hints throughout codebase
Documentation & Examples
Detailed README with usage examples
Python client example with async patterns
cURL examples for API testing
Configuration examples and deployment guides
Demo Results
The demo successfully shows:
EMR MCP Server Demo
================================================================================
EMR Cluster Management Demo
Getting Cluster Information...
Cost Estimation...
Instance Type Suggestions...
Resource Monitoring Demo
Resource Monitoring...
YARN Applications Analysis...
Analytics & Configuration Demo
Configuration Recommendations for Batch Workload...
Configuration Recommendations for ML Workload...
Demo completed successfully!
Production Ready Features
Error Handling: Comprehensive error handling with meaningful messages
Logging: Structured logging with multiple output formats
Configuration: Environment-based configuration with validation
Monitoring: Health checks and metrics endpoints
Security: Authentication, authorization, and input validation
Performance: Async operations, connection pooling, caching
Deployment: Multiple deployment options with automation
Contributing
We welcome contributions! Please see our development workflow:
Fork the repository
Create a feature branch
Make your changes with tests
Run the test suite and quality checks
Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
AWS EMR Team for the excellent big data platform
MCP Community for the protocol specification
Apache Spark and Hadoop communities
Made with ❤️ for the EMR community
Ready for production deployment on EMR clusters!