Spark EventLog MCP Server

by yhyyz

Chinese version | English

A comprehensive Spark event log analysis MCP server built on FastMCP 2.0 and FastAPI, providing in-depth performance analysis, resource monitoring, and optimization recommendations.

Features

  • 🌐 FastMCP & FastAPI Integration: MCP protocol support and analysis report APIs powered by FastAPI & FastMCP

  • 📊 Performance Analysis: Shuffle analysis, resource utilization monitoring, task execution analysis

  • 📈 Visual Reports: Auto-generated interactive HTML reports with direct browser access

  • ☁️ Cloud Data Sources: Support for S3 buckets and HTTP URLs with automatic path detection

  • 💡 Intelligent Optimization: Automated optimization recommendations based on analysis results

  • 🔧 Modular Architecture: Clean separation of concerns with specialized modules for tools, middleware, and configuration

  • 📝 Enhanced Logging: Comprehensive request/response logging with detailed debugging information

Quick Start

MCP Client Integration

{
  "mcpServers": {
    "spark-eventlog": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/yhyyz/spark-eventlog-mcp",
        "spark-eventlog-mcp"
      ],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}

stdio Mode (Local Development)

{
  "mcpServers": {
    "spark-eventlog": {
      "command": "uv",
      "args": ["run", "python", "/path/to/spark-eventlog-mcp/start.py"],
      "env": {
        "MCP_TRANSPORT": "stdio"
      }
    }
  }
}

HTTP Mode

1. Start HTTP Server:

export MCP_TRANSPORT=streamable-http
export MCP_HOST=localhost
export MCP_PORT=7799
uv run python start.py

2. Configure Remote MCP:

{
  "mcpServers": {
    "spark-eventlog": {
      "url": "http://localhost:7799/mcp",
      "type": "http"
    }
  }
}

3. Access Services:

Analysis Examples

emr-serverless-small-job

emr-eks-big-job-1

emr-eks-big-job-2

emr-eks-big-job-sub-01

emr-eks-big-job-sub-02

Project Structure

spark-eventlog-mcp/
├── src/spark_eventlog_mcp/
│   ├── server.py                      # Main FastAPI + MCP integrated server (refactored)
│   ├── core/
│   │   └── mature_data_loader.py      # Data loader (S3/URL)
│   ├── tools/
│   │   ├── mcp_tools.py               # MCP tool implementations (NEW)
│   │   ├── mature_analyzer.py         # Event log analyzer
│   │   └── mature_report_generator.py # HTML report generator
│   ├── models/
│   │   ├── schemas.py                 # Pydantic data models
│   │   └── mature_models.py           # Analysis result models
│   └── utils/
│       ├── helpers.py                 # Utility functions and logging config
│       ├── middleware.py              # FastAPI request logging middleware (NEW)
│       └── uvicorn_config.py          # Uvicorn logging configuration (NEW)
├── report_data/                       # Generated reports storage
├── start.py                           # Launch script
├── README.md                          # This file (English)
└── README_zh.md                       # Chinese version

MCP Tools

| Tool Name | Description | Parameters |
|---|---|---|
| generate_report | End-to-end report generation - Auto-detects S3/URL, analyzes data, generates HTML reports | path: str (S3 or HTTP URL) |
| get_analysis_status | Query current analysis session status and metrics | None |
| clear_session | Clear session cache and reset server state | None |

Simplified Tool Usage

The refactored MCP tools focus on simplicity and automation:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "generate_report",
    "arguments": {
      "path": "s3://my-bucket/spark-logs/"
    }
  },
  "id": 1
}
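For clients that talk to the HTTP transport directly, the call above can be assembled with the Python standard library. This is a minimal sketch, assuming the server from the HTTP Mode section is listening on `http://localhost:7799/mcp`; the helper names `build_tool_call` and `build_request` are hypothetical and not part of the project. The `Accept` header lists both media types that appear in the request log in this README.

```python
import json
import urllib.request

# Assumption: the server from the "HTTP Mode" section is listening here.
MCP_URL = "http://localhost:7799/mcp"


def build_tool_call(path: str, request_id: int = 1) -> dict:
    """Build the JSON-RPC 2.0 body for a generate_report call."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": "generate_report", "arguments": {"path": path}},
        "id": request_id,
    }


def build_request(path: str, url: str = MCP_URL) -> urllib.request.Request:
    """Wrap the body in an HTTP POST; nothing is sent until urlopen() is called."""
    body = json.dumps(build_tool_call(path)).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Both media types appear in the request log shown in this README.
        "Accept": "application/json, text/event-stream",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_request("s3://my-bucket/spark-logs/")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen(req)` would send it. An MCP session over HTTP normally begins with an `initialize` exchange, which this sketch omits, so a dedicated MCP client library is likely the more practical choice for real use.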

RESTful API Endpoints

Basic Endpoints

  • GET / - Service information

  • GET /health - Health check

  • GET /docs - API documentation (Swagger UI)

Report Management

  • GET /api/reports - List all reports

  • GET /api/reports/{filename} - View HTML report

  • GET /reports/{filename} - Direct access to report files

  • DELETE /api/reports/{filename} - Delete report
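The report endpoints listed above can be addressed from a script. The following is a sketch only: the base URL assumes the HTTP Mode defaults, the function names are hypothetical, and because the response format of `/api/reports` is not documented here, the code builds requests without parsing any response.

```python
import urllib.request
from urllib.parse import quote

# Assumption: host and port taken from the HTTP Mode example.
BASE_URL = "http://localhost:7799"


def list_reports_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /api/reports"""
    return urllib.request.Request(f"{base_url}/api/reports", method="GET")


def view_report_request(filename: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /api/reports/{filename}; the filename is percent-encoded."""
    url = f"{base_url}/api/reports/{quote(filename, safe='')}"
    return urllib.request.Request(url, method="GET")


def delete_report_request(filename: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """DELETE /api/reports/{filename}; the filename is percent-encoded."""
    url = f"{base_url}/api/reports/{quote(filename, safe='')}"
    return urllib.request.Request(url, method="DELETE")


if __name__ == "__main__":
    for req in (list_reports_request(), delete_report_request("my report.html")):
        print(req.get_method(), req.full_url)
```

Each request object can be handed to `urllib.request.urlopen` once the server is running.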

MCP Tool Calls

  • POST /mcp - MCP protocol endpoint

Configuration

Environment Variables

# Server Configuration
MCP_TRANSPORT=streamable-http  # stdio or streamable-http
MCP_HOST=0.0.0.0               # HTTP mode listen address
MCP_PORT=7799                  # HTTP mode port
LOG_LEVEL=INFO                 # Log level

# AWS S3 Configuration (Optional)
# Not needed if AWS CLI is configured or running on EC2 with appropriate IAM role
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx
AWS_DEFAULT_REGION=us-east-1

# Cache Configuration
CACHE_ENABLED=true
CACHE_TTL=300

# Default Data Source
DEFAULT_SOURCE_TYPE=s3         # s3, url, or local
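To illustrate how a launcher might consume these variables, here is a minimal sketch that reads them into a typed settings object. `ServerSettings` and `load_settings` are hypothetical names, and the fallback values are assumptions copied from the example above rather than defaults confirmed from the project source.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ServerSettings:
    transport: str
    host: str
    port: int
    log_level: str
    cache_enabled: bool
    cache_ttl: int
    default_source_type: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read settings from `env` (defaults to os.environ).

    Fallback values are assumptions taken from the README example,
    not defaults confirmed from the project source.
    """
    env = os.environ if env is None else env
    return ServerSettings(
        transport=env.get("MCP_TRANSPORT", "stdio"),
        host=env.get("MCP_HOST", "0.0.0.0"),
        port=int(env.get("MCP_PORT", "7799")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        cache_enabled=env.get("CACHE_ENABLED", "true").strip().lower() in _TRUE_VALUES,
        cache_ttl=int(env.get("CACHE_TTL", "300")),
        default_source_type=env.get("DEFAULT_SOURCE_TYPE", "s3"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Accepting a mapping instead of always reading `os.environ` keeps the function easy to exercise in tests without mutating the process environment.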

Enhanced Logging Features

The refactored architecture provides comprehensive request/response logging:

FastAPI Request Logging:

2025-12-18 10:30:45 - INFO - Request started - POST /mcp
2025-12-18 10:30:45 - INFO - Client: 192.168.1.100 | User-Agent: Java SDK MCP Client/1.0.0
2025-12-18 10:30:45 - INFO - Content-Type: application/json | Accept: application/json, text/event-stream
2025-12-18 10:30:45 - INFO - Request body: {"jsonrpc":"2.0","method":"tools/call",...}
2025-12-18 10:30:45 - INFO - Request completed - Status: 200 | Duration: 2.156s

Application Logging:

2025-12-18 10:30:45 - INFO - [mcp_tools.py:243:generate_report_tool] - spark-eventlog-mcp - Starting end-to-end report generation

Format: Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
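The application log format described above maps directly onto Python's standard `logging` attributes. The sketch below shows one way to produce lines of that shape; the `make_logger` helper is hypothetical and the project's own configuration in `utils/helpers.py` may differ.

```python
import logging

# Timestamp - Level - [Filename:Line:Function] - Logger Name - Message
LOG_FORMAT = (
    "%(asctime)s - %(levelname)s - "
    "[%(filename)s:%(lineno)d:%(funcName)s] - %(name)s - %(message)s"
)
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_logger(name: str = "spark-eventlog-mcp", level: str = "INFO") -> logging.Logger:
    """Return a logger that writes to stderr in the format described above."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    make_logger().info("Starting end-to-end report generation")
```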

Data Source Support

S3

{ "source_type": "s3", "path": "s3://bucket-name/path/to/eventlogs/" }

HTTP URL

{ "source_type": "url", "path": "https://example.com/eventlog.zip" }

Local File

{ "source_type": "local", "path": "/path/to/local/eventlog.zip" }
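Since `generate_report` accepts a bare path and detects its type automatically, the mapping from path to `source_type` can be pictured with a small scheme check. This is an illustrative sketch, not the project's detection logic; `detect_source_type` and `to_source_config` are hypothetical names.

```python
from urllib.parse import urlparse


def detect_source_type(path: str) -> str:
    """Classify a path as 's3', 'url', or 'local' from its URI scheme.

    Hypothetical helper for illustration; not taken from the project source.
    """
    scheme = urlparse(path).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "url"
    # No recognised scheme: treat it as a filesystem path.
    return "local"


def to_source_config(path: str) -> dict:
    """Build a config dict in the shape shown in the examples above."""
    return {"source_type": detect_source_type(path), "path": path}


if __name__ == "__main__":
    for p in (
        "s3://bucket-name/path/to/eventlogs/",
        "https://example.com/eventlog.zip",
        "/path/to/local/eventlog.zip",
    ):
        print(to_source_config(p))
```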

Report Features

Generated HTML reports include:

  • 📊 Application Overview (task counts, success rate, duration)

  • 💻 Executor Resource Usage Distribution

  • 🔄 Shuffle Performance Analysis

  • ⚖️ Data Skew Detection

  • 💡 Intelligent Optimization Recommendations

  • 📈 Interactive Visualizations
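As background on what a data skew check can look like, one common heuristic compares the slowest task in a stage with the median task. The sketch below is a generic illustration under that assumption; it is not the analyzer's actual method, and both the function names and the threshold of 3.0 are arbitrary choices made for the example.

```python
from statistics import median
from typing import Sequence


def skew_ratio(task_durations_ms: Sequence[float]) -> float:
    """Return max/median task duration; 1.0 means perfectly even tasks."""
    if not task_durations_ms:
        raise ValueError("need at least one task duration")
    mid = median(task_durations_ms)
    longest = max(task_durations_ms)
    if mid == 0:
        # Avoid dividing by zero when at least half the tasks took no time.
        return float("inf") if longest > 0 else 1.0
    return longest / mid


def is_skewed(task_durations_ms: Sequence[float], threshold: float = 3.0) -> bool:
    """Flag a stage whose slowest task exceeds `threshold` times the median."""
    return skew_ratio(task_durations_ms) > threshold


if __name__ == "__main__":
    print(is_skewed([100, 110, 90, 1000]))
    print(is_skewed([100, 100, 100, 100]))
```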

Troubleshooting

Port Already in Use

# Change port
MCP_PORT=9090 python start.py

Missing Dependencies

# Reinstall dependencies
uv pip install -e .

AWS Credentials Issues

# Check AWS configuration
aws configure list

# Or configure in .env
AWS_ACCESS_KEY_ID=xxx
AWS_SECRET_ACCESS_KEY=xxx

Debug Logging

# Enable DEBUG logs
LOG_LEVEL=DEBUG uv run python start.py

Recent Improvements (2025-12-18)

Major Code Refactoring

  • 🎯 Simplified MCP Tools: generate_report now requires only a single string parameter (S3 or URL path)

  • 📦 Modular Architecture: Extracted MCP tool implementations from main server to dedicated modules

  • 📝 Enhanced Logging: Added comprehensive request/response logging with client info, headers, and request body

  • 🔧 Centralized Configuration: Moved uvicorn and middleware configuration to separate utility modules

  • 📉 Reduced Complexity: Main server.py reduced from ~1150 to ~370 lines (roughly a 68% reduction)

Architecture Changes

  • New Module: tools/mcp_tools.py - Contains all MCP tool implementations

  • New Module: utils/middleware.py - FastAPI request logging middleware

  • New Module: utils/uvicorn_config.py - Centralized uvicorn logging configuration

  • Auto-Detection: Automatic path type detection (S3 vs URL) in generate_report tool

  • Simplified Interface: Single-parameter MCP tools with internal logic handling complexity

HTTP Transport Fixes

  • MCP Protocol Compatibility: Fixed HTTP 406 errors by ensuring proper Accept headers

  • Request Tracing: Added detailed request/response logging for better debugging

  • Error Handling: Improved error messages and status code handling

Tech Stack

  • FastMCP 2.0: MCP protocol support

  • FastAPI: RESTful API framework

  • Pydantic: Data validation and serialization

  • Plotly: Interactive charts

  • boto3: AWS S3 integration

  • aiofiles: Async file operations

Development

# Clone repository
git clone <repository-url>
cd spark-eventlog-mcp

# Install development dependencies
uv pip install -e .

# MCP Inspector - stdio mode
MCP_TRANSPORT="stdio" npx @modelcontextprotocol/inspector uv run python start.py

# MCP Inspector - HTTP mode
MCP_TRANSPORT="streamable-http" uv run python start.py
npx @modelcontextprotocol/inspector --cli http://localhost:7799 --transport http --method tools/list
