Spark EventLog MCP Server


A comprehensive Spark event log analysis MCP server built on FastMCP 2.0 and FastAPI, providing in-depth performance analysis, resource monitoring, and optimization recommendations.

Features

🌐 FastMCP & FastAPI Integration: MCP protocol support via FastMCP and analysis report APIs via FastAPI
📊 Performance Analysis: Shuffle analysis, resource utilization monitoring, task execution analysis
📈 Visual Reports: Auto-generated interactive HTML reports with direct browser access
☁️ Cloud Data Sources: Support for S3 buckets and HTTP URLs with automatic path detection
💡 Intelligent Optimization: Automated optimization recommendations based on analysis results
🔧 Modular Architecture: Clean separation of concerns with specialized modules for tools, middleware, and configuration
📝 Enhanced Logging: Comprehensive request/response logging with detailed debugging information

Quick Start

MCP Client Integration

uvx Mode (Recommended - Direct from GitHub)

{
  "mcpServers": {
    "spark-eventlog": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/yhyyz/spark-eventlog-mcp",
        "spark-eventlog-mcp"
      ],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}
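If you maintain the client configuration by hand, a small script can add the entry above without overwriting other servers. This is a minimal sketch that assumes your client reads a JSON file with a top-level "mcpServers" object. The helper names add_spark_eventlog_server and update_config_file are hypothetical, and where the config file lives depends on your client.

```python
import copy
import json
from pathlib import Path

# Entry copied from the uvx configuration above.
SPARK_EVENTLOG_ENTRY = {
    "type": "stdio",
    "command": "uvx",
    "args": [
        "--from",
        "git+https://github.com/yhyyz/spark-eventlog-mcp",
        "spark-eventlog-mcp",
    ],
    "env": {"MCP_TRANSPORT": "stdio"},
}


def add_spark_eventlog_server(config: dict, name: str = "spark-eventlog") -> dict:
    """Return a copy of `config` with the server entry added; the input is not modified."""
    merged = copy.deepcopy(config)
    merged.setdefault("mcpServers", {})[name] = copy.deepcopy(SPARK_EVENTLOG_ENTRY)
    return merged


def update_config_file(path: Path) -> None:
    """Merge the entry into a JSON config file, creating the file if it is missing."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_spark_eventlog_server(config), indent=2) + "\n")
```

Existing entries under "mcpServers" are preserved, and an older "spark-eventlog" entry is replaced.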

stdio Mode (Local Development)

{
  "mcpServers": {
    "spark-eventlog": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/spark-eventlog-mcp", "python", "start.py"],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}

HTTP Mode

1. Start HTTP Server:

export MCP_TRANSPORT=streamable-http
export MCP_HOST=localhost
export MCP_PORT=7799

uv run python start.py

2. Configure Remote MCP:

{
  "mcpServers": {
    "spark-eventlog": {
      "url": "http://localhost:7799/mcp",
      "type": "http"
    }
  }
}

3. Access Services:

API Documentation: http://localhost:7799/docs
Health Check: http://localhost:7799/health
Reports List: http://localhost:7799/api/reports
MCP Endpoint: http://localhost:7799/mcp
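When scripting HTTP mode (for example in CI), you may want to wait until the health check responds before sending MCP requests. This is a minimal standard-library sketch. It assumes only that /health returns HTTP 200 once the server is ready. The helper name wait_for_health is hypothetical.

```python
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:7799", timeout: float = 30.0,
                    interval: float = 0.5) -> bool:
    """Poll GET {base_url}/health until it returns 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/health"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and timeouts are all OSError
            # subclasses: the server is not ready yet, so retry after a short pause.
            pass
        time.sleep(interval)
    return False
```

For example, call `if not wait_for_health(): raise SystemExit("server did not start")` right after launching `uv run python start.py` in the background.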

Analysis Examples

Screenshots of example reports (images not included): emr-serverless-small-job, emr-eks-big-job-1, emr-eks-big-job-2, emr-eks-big-job-sub-01, emr-eks-big-job-sub-02

Project Structure

spark-eventlog-mcp/
├── src/spark_eventlog_mcp/
│   ├── server.py              # Main FastAPI + MCP integrated server (refactored)
│   ├── core/
│   │   └── mature_data_loader.py    # Data loader (S3/URL)
│   ├── tools/
│   │   ├── mcp_tools.py      # MCP tool implementations (NEW)
│   │   ├── mature_analyzer.py       # Event log analyzer
│   │   └── mature_report_generator.py  # HTML report generator
│   ├── models/
│   │   ├── schemas.py        # Pydantic data models
│   │   └── mature_models.py  # Analysis result models
│   └── utils/
│       ├── helpers.py         # Utility functions and logging config
│       ├── middleware.py      # FastAPI request logging middleware (NEW)
│       └── uvicorn_config.py  # Uvicorn logging configuration (NEW)
├── report_data/               # Generated reports storage
├── start.py                   # Launch script
├── README.md                 # This file (English)
└── README_zh.md              # Chinese version

MCP Tools

Tool Name	Description	Parameters
`generate_report`	End-to-end report generation - Auto-detects S3/URL, analyzes data, generates HTML reports	`path: str` (S3 or HTTP URL)
`get_analysis_status`	Query current analysis session status and metrics	None
`clear_session`	Clear session cache and reset server state	None

Simplified Tool Usage

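The refactored MCP tools favor simplicity and automation: generating a report takes only a path. A raw JSON-RPC request to the generate_report tool looks like this: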

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "generate_report",
    "arguments": {
      "path": "s3://my-bucket/spark-logs/"
    }
  },
  "id": 1
}
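The same request can be sent from a script in HTTP mode. This is a standard-library sketch. It assumes the server follows the MCP streamable HTTP transport: the client calls initialize first, sends notifications/initialized, echoes any Mcp-Session-Id header on later requests, and sends the "application/json, text/event-stream" Accept header mentioned under HTTP Transport Fixes. The protocol version string and the helper names (rpc, parse_messages, post, generate_report) are assumptions, not part of this project's API.

```python
import json
import urllib.request
from typing import Optional

MCP_URL = "http://localhost:7799/mcp"  # HTTP-mode endpoint from the setup above
BASE_HEADERS = {
    "Content-Type": "application/json",
    # Streamable HTTP clients accept both plain JSON and SSE responses.
    "Accept": "application/json, text/event-stream",
}


def rpc(method: str, params: Optional[dict] = None, msg_id: Optional[int] = None) -> dict:
    """Build a JSON-RPC 2.0 message; leaving out msg_id makes it a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if params is not None:
        msg["params"] = params
    if msg_id is not None:
        msg["id"] = msg_id
    return msg


def parse_messages(content_type: str, body: str) -> list:
    """Return every JSON-RPC message in a JSON or SSE (text/event-stream) body."""
    if not body.strip():
        return []  # e.g. 202 Accepted for notifications
    if content_type.startswith("text/event-stream"):
        return [json.loads(line[len("data:"):].strip())
                for line in body.splitlines() if line.startswith("data:")]
    payload = json.loads(body)
    return payload if isinstance(payload, list) else [payload]


def post(msg: dict, session_id: Optional[str] = None) -> tuple:
    """POST one message; return (messages received, session id to reuse)."""
    headers = dict(BASE_HEADERS)
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    req = urllib.request.Request(MCP_URL, data=json.dumps(msg).encode("utf-8"),
                                 headers=headers, method="POST")
    # Report generation can take a while on large logs, hence the long timeout.
    with urllib.request.urlopen(req, timeout=600) as resp:
        body = resp.read().decode("utf-8")
        messages = parse_messages(resp.headers.get("Content-Type", ""), body)
        return messages, resp.headers.get("Mcp-Session-Id", session_id)


def generate_report(path: str) -> Optional[dict]:
    """initialize, then notifications/initialized, then tools/call generate_report."""
    init = rpc("initialize", {
        "protocolVersion": "2025-03-26",  # assumption: a version the server accepts
        "capabilities": {},
        "clientInfo": {"name": "readme-sketch", "version": "0.1"},
    }, msg_id=0)
    _, session = post(init)
    post(rpc("notifications/initialized"), session)
    call = rpc("tools/call", {"name": "generate_report", "arguments": {"path": path}}, msg_id=1)
    messages, _ = post(call, session)
    # An SSE stream may carry log or progress notifications before the reply.
    return next((m for m in messages if m.get("id") == 1), None)
```

With the server running in HTTP mode, `print(generate_report("s3://my-bucket/spark-logs/"))` prints the tool result. In practice an MCP client library or the MCP Inspector handles this handshake for you.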

RESTful API Endpoints

Basic Endpoints

GET / - Service information
GET /health - Health check
GET /docs - API documentation (Swagger UI)

Report Management

GET /api/reports - List all reports
GET /api/reports/{filename} - View HTML report
GET /reports/{filename} - Direct access to report files
DELETE /api/reports/{filename} - Delete report
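The report management routes can also be driven from the standard library. This sketch assumes only the routes listed above. The helper names are hypothetical. Because the response schema of GET /api/reports is not documented here, list_reports returns the decoded JSON unchanged. Filenames are percent-encoded so that names containing spaces or special characters still form valid paths.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:7799"


def report_url(filename: str, base_url: str = BASE_URL, api: bool = True) -> str:
    """URL for one report: /api/reports/{filename} (api=True) or /reports/{filename}."""
    prefix = "/api/reports/" if api else "/reports/"
    return base_url.rstrip("/") + prefix + urllib.parse.quote(filename, safe="")


def list_reports(base_url: str = BASE_URL):
    """GET /api/reports and return the decoded JSON body as-is."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/api/reports", timeout=30) as resp:
        return json.load(resp)


def delete_report(filename: str, base_url: str = BASE_URL) -> int:
    """DELETE /api/reports/{filename}; return the HTTP status code."""
    req = urllib.request.Request(report_url(filename, base_url), method="DELETE")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status
```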

MCP Tool Calls

POST /mcp - MCP protocol endpoint

Configuration

Environment Variables

# Server Configuration
MCP_TRANSPORT=streamable-http  # stdio or streamable-http
MCP_HOST=0.0.0.0           # HTTP mode listen address
MCP_PORT=7799              # HTTP mode port
LOG_LEVEL=INFO             # Log level

# AWS S3 Configuration (Optional)
# Not needed if the AWS CLI is configured or when running on EC2 with an appropriate IAM role
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_DEFAULT_REGION=us-east-1

# Cache Configuration
CACHE_ENABLED=true
CACHE_TTL=300

# Default Data Source
DEFAULT_SOURCE_TYPE=s3  # s3, url, or local
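As an illustration, these variables could be read into typed, validated settings with the example values above as fallbacks. This is only a sketch. The Settings dataclass and load_settings function are hypothetical and are not the server's actual configuration code. The AWS variables are left out because boto3 reads them through its own credential chain.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_TRANSPORTS = {"stdio", "streamable-http"}
VALID_SOURCES = {"s3", "url", "local"}


@dataclass(frozen=True)
class Settings:
    transport: str
    host: str
    port: int
    log_level: str
    cache_enabled: bool
    cache_ttl: int  # seconds
    default_source_type: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables above, falling back to the example values, and validate them."""
    settings = Settings(
        transport=env.get("MCP_TRANSPORT", "streamable-http"),
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "7799")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        cache_enabled=env.get("CACHE_ENABLED", "true").strip().lower() in {"1", "true", "yes"},
        cache_ttl=int(env.get("CACHE_TTL", "300")),
        default_source_type=env.get("DEFAULT_SOURCE_TYPE", "s3"),
    )
    if settings.transport not in VALID_TRANSPORTS:
        raise ValueError(f"MCP_TRANSPORT must be one of {sorted(VALID_TRANSPORTS)}")
    if settings.default_source_type not in VALID_SOURCES:
        raise ValueError(f"DEFAULT_SOURCE_TYPE must be one of {sorted(VALID_SOURCES)}")
    return settings
```

Passing a plain dict instead of os.environ makes the function easy to test.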

Enhanced Logging Features

The refactored architecture provides comprehensive request/response logging:

FastAPI Request Logging:

2025-12-18 10:30:45 - INFO - Request started - POST /mcp
2025-12-18 10:30:45 - INFO - Client: 192.168.1.100 | User-Agent: Java SDK MCP Client/1.0.0
2025-12-18 10:30:45 - INFO - Content-Type: application/json | Accept: application/json, text/event-stream
2025-12-18 10:30:45 - INFO - Request body: {"jsonrpc":"2.0","method":"tools/call",...}
2025-12-18 10:30:45 - INFO - Request completed - Status: 200 | Duration: 2.156s

Application Logging:

2025-12-18 10:30:45 - INFO - [mcp_tools.py:243:generate_report_tool] - spark-eventlog-mcp - Starting end-to-end report generation

Format: Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
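With the standard logging module, a formatter along these lines produces the application log format above. This sketch reconstructs the format from the sample line. The project's own setup lives in utils/helpers.py and may differ, and configure_logging is a hypothetical helper name.

```python
import logging

# Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
LOG_FORMAT = ("%(asctime)s - %(levelname)s - [%(filename)s:%(lineno)d:%(funcName)s]"
              " - %(name)s - %(message)s")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def configure_logging(level: str = "INFO") -> logging.Logger:
    """Attach a stream handler using the format above to the spark-eventlog-mcp logger."""
    logger = logging.getLogger("spark-eventlog-mcp")
    logger.setLevel(level.upper())
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    return logger
```

For example, `configure_logging("DEBUG").info("Starting end-to-end report generation")` emits a line in the format shown above.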

Data Source Support

S3

{
    "source_type": "s3",
    "path": "s3://bucket-name/path/to/eventlogs/"
}

HTTP URL

{
    "source_type": "url",
    "path": "https://example.com/eventlog.zip"
}

Local File

{
    "source_type": "local",
    "path": "/path/to/local/eventlog.zip"
}
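The generate_report tool detects the path type automatically. The function below is a hypothetical sketch of one such rule and not the server's actual code: an s3:// scheme maps to S3, http:// or https:// maps to URL, and anything else is treated as a local path. For S3 paths it also splits out the bucket and key prefix.

```python
from urllib.parse import urlparse


def detect_source(path: str) -> dict:
    """Classify `path` as s3, url, or local (hypothetical rule, for illustration)."""
    parsed = urlparse(path)
    scheme = parsed.scheme.lower()
    if scheme == "s3":
        if not parsed.netloc:
            raise ValueError(f"S3 path is missing a bucket name: {path!r}")
        return {"source_type": "s3", "path": path,
                "bucket": parsed.netloc, "prefix": parsed.path.lstrip("/")}
    if scheme in ("http", "https"):
        return {"source_type": "url", "path": path}
    return {"source_type": "local", "path": path}
```

The returned dictionaries mirror the source_type and path fields in the examples above.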

Report Features

Generated HTML reports include:

📊 Application Overview (task counts, success rate, duration)
💻 Executor Resource Usage Distribution
🔄 Shuffle Performance Analysis
⚖️ Data Skew Detection
💡 Intelligent Optimization Recommendations
📈 Interactive Visualizations
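The overview numbers can be derived from the event log itself. Spark event logs are JSON lines, and each finished task emits a SparkListenerTaskEnd event carrying "Stage ID", "Task End Reason", and "Task Info" (with "Launch Time" and "Finish Time" in milliseconds). The sketch below computes task counts, success rate, and a simple per-stage skew ratio (longest task duration divided by the median). It is a hypothetical illustration assuming an uncompressed, single-file event log. The project's analyzer (tools/mature_analyzer.py) may use different metrics and thresholds.

```python
import json
from collections import defaultdict
from statistics import median
from typing import Iterable


def summarize_tasks(lines: Iterable[str]) -> dict:
    """Task counts, success rate, and per-stage skew ratio from event-log JSON lines."""
    total = succeeded = 0
    durations = defaultdict(list)  # stage id -> task durations in ms
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        total += 1
        if event.get("Task End Reason", {}).get("Reason") == "Success":
            succeeded += 1
        info = event.get("Task Info", {})
        if "Launch Time" in info and "Finish Time" in info:
            durations[event["Stage ID"]].append(info["Finish Time"] - info["Launch Time"])
    # A ratio well above 1 means one task ran much longer than a typical task in its stage.
    skew = {stage: max(ds) / median(ds) for stage, ds in durations.items() if median(ds) > 0}
    return {
        "tasks": total,
        "succeeded": succeeded,
        "success_rate": succeeded / total if total else 0.0,
        "skew_ratio_by_stage": skew,
    }
```

Usage: `with open("eventlog") as f: print(summarize_tasks(f))`.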

Troubleshooting

Port Already in Use

# Change port
MCP_PORT=9090 uv run python start.py
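To check whether a port is taken before starting the server, or to let the OS pick a free one, a quick standard-library check works. This is a sketch, and port_is_free and find_free_port are hypothetical helper names. A port reported as free can still be claimed by another process before the server binds it.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if (host, port) can be bound right now, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def find_free_port(host: str = "127.0.0.1") -> int:
    """Bind to port 0 so the OS assigns an unused port, then report which one it chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

Pass the result as MCP_PORT when starting the server.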

Missing Dependencies

# Reinstall dependencies
uv pip install -e .

AWS Credentials Issues

# Check AWS configuration
aws configure list

# Or configure in .env
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx

Debug Logging

# Enable DEBUG logs
LOG_LEVEL=DEBUG uv run python start.py

Recent Improvements (2025-12-18)

Major Code Refactoring

🎯 Simplified MCP Tools: generate_report now requires only a single string parameter (S3 or URL path)
📦 Modular Architecture: Extracted MCP tool implementations from main server to dedicated modules
📝 Enhanced Logging: Added comprehensive request/response logging with client info, headers, and request body
🔧 Centralized Configuration: Moved uvicorn and middleware configuration to separate utility modules
📉 Reduced Complexity: Main server.py reduced from ~1150 to ~370 lines (70% reduction)

Architecture Changes

New Module: tools/mcp_tools.py - Contains all MCP tool implementations
New Module: utils/middleware.py - FastAPI request logging middleware
New Module: utils/uvicorn_config.py - Centralized uvicorn logging configuration
Auto-Detection: Automatic path type detection (S3 vs URL) in generate_report tool
Simplified Interface: Single-parameter MCP tools with internal logic handling complexity

HTTP Transport Fixes

MCP Protocol Compatibility: Fixed HTTP 406 errors by ensuring proper Accept headers
Request Tracing: Added detailed request/response logging for better debugging
Error Handling: Improved error messages and status code handling

Tech Stack

FastMCP 2.0: MCP protocol support
FastAPI: RESTful API framework
Pydantic: Data validation and serialization
Plotly: Interactive charts
boto3: AWS S3 integration
aiofiles: Async file operations

Development

# Clone repository
git clone https://github.com/yhyyz/spark-eventlog-mcp
cd spark-eventlog-mcp

# Install the project in editable mode
uv pip install -e .

# MCP Inspector - stdio mode
MCP_TRANSPORT="stdio" npx @modelcontextprotocol/inspector uv run python start.py

# MCP Inspector - HTTP mode
MCP_TRANSPORT="streamable-http" uv run python start.py
npx @modelcontextprotocol/inspector --cli http://localhost:7799/mcp --transport http --method tools/list

Support

Documentation: Check /docs API documentation
Issues: Submit GitHub Issues
Reference: FastMCP Documentation
