Apache Spark History Server MCP
Provides optional AWS-specific troubleshooting tools for analyzing Spark workloads on Amazon EMR, including root cause analysis and code recommendations.
Connects to Apache Spark History Server to query and analyze Spark applications, jobs, stages, executors, SQL queries, and more, enabling AI agents to investigate performance, failures, and bottlenecks.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Apache Spark History Server MCP Show me failed jobs from my latest Spark app".
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Kubeflow Spark AI Toolkit
Connect AI agents and engineers to Apache Spark History Server for intelligent job analysis, performance monitoring, and investigation
NEW: Spark History Server CLI is now available
A standalone Go binary that queries Spark History Server directly from your terminal, with no MCP, no AI framework, and no daemon process. Inspect jobs, compare runs, investigate failures, and script against the Spark REST API.
This project provides two interfaces to your Spark History Server data:
Interface | CLI (shs) | MCP Server |
For | Engineers, shell scripts, CI/CD, coding agents | AI agents and MCP-compatible clients |
Mental model | "I know the command I want to run" | "Agent, investigate this Spark app" |
Install | Single static binary, no dependencies | Python 3.12+, uv |
Get started | SHS CLI (shs) section | MCP Server section |
Architecture
graph TB
    subgraph Clients
        A[AI Agent / LLM]
        B[Engineer / Script / CI]
        C[Coding Agent - Claude Code / Kiro]
    end
    subgraph "Kubeflow Spark AI Toolkit"
        D[MCP Server]
        E[CLI - shs]
    end
    subgraph "Spark History Servers"
        F[Production]
        G[Staging / Dev]
    end
    A -->|MCP Protocol| D
    B -->|Terminal commands| E
    C -->|shs skill file| E
    D -->|REST API| F
    D -->|REST API| G
    E -->|REST API| F
    E -->|REST API| G
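Every arrow into the history servers in the architecture above is a REST call. As a rough sketch of what such a call looks like, the Python below builds a request against Spark's `/api/v1/applications` endpoint (with its `status` and `limit` query parameters) and condenses the JSON reply. The helper names `applications_url`, `summarize`, and `fetch_applications` are illustrative and not part of this project, and nothing here claims to mirror how the MCP server or `shs` is implemented.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def applications_url(base_url, status=None, limit=None):
    """Build the URL for the Spark History Server application list."""
    params = {}
    if status is not None:
        params["status"] = status  # "completed" or "running"
    if limit is not None:
        params["limit"] = limit
    query = f"?{urlencode(params)}" if params else ""
    return f"{base_url.rstrip('/')}/api/v1/applications{query}"


def summarize(apps):
    """Reduce the JSON application list to (id, name, completed) tuples.

    An application counts as completed only if it has at least one
    attempt and every attempt reports completed=true.
    """
    rows = []
    for app in apps:
        attempts = app.get("attempts", [])
        completed = bool(attempts) and all(a.get("completed", False) for a in attempts)
        rows.append((app["id"], app.get("name", ""), completed))
    return rows


def fetch_applications(base_url, status=None, limit=None):
    """Fetch and decode the application list (performs a network request)."""
    with urlopen(applications_url(base_url, status, limit), timeout=10) as resp:
        return json.load(resp)
```

For example, `summarize(fetch_applications("http://localhost:18080", status="completed", limit=5))` would list up to five completed applications, assuming a history server is listening on that address.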
SHS CLI (shs): For Engineers & Scripts
A standalone Go binary. Query your Spark History Server directly from the terminal, shell scripts, or CI/CD pipelines. Also works as a skill for coding agents like Claude Code and Kiro.
Install
# Auto-detect latest version, OS, and architecture
VERSION=$(curl -s https://api.github.com/repos/kubeflow/mcp-apache-spark-history-server/releases | grep -m1 '"tag_name": "cli/' | cut -d'"' -f4 | sed 's|cli/||')
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
ARCH=$(uname -m)
[ "$ARCH" = "x86_64" ] && ARCH="amd64"
[ "$ARCH" = "aarch64" ] && ARCH="arm64"
curl -sSL "https://github.com/kubeflow/mcp-apache-spark-history-server/releases/download/cli%2F${VERSION}/shs-${VERSION}-${OS}-${ARCH}.tar.gz" | tar xz
sudo mv shs /usr/local/bin/
Quick Start
# Generate a config file
shs setup config > config.yaml # then set your Spark History Server URL
# Explore applications
shs apps
shs jobs -a APP_ID --status failed
shs stages -a APP_ID --sort duration
shs compare apps --app-a APP1 --app-b APP2
# Use as a skill with Claude Code or Kiro
shs setup skill > ~/.claude/skills/spark-history.md
See the CLI documentation for full usage, or check out a real-world example of Claude Code comparing two TPC-DS 3TB benchmark runs.
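Because the CLI is meant for scripts and CI, it can be convenient to drive it from Python. The following is a minimal sketch that only assembles the commands shown in the Quick Start (`-a`, `--status`, `--sort`, `--app-a`, `--app-b`); `shs_args` and `run_shs` are hypothetical helper names, and the sketch assumes `shs` is on `PATH` and finds its configuration the same way it does in an interactive shell.

```python
import subprocess


def shs_args(subcommand, app_id=None, **flags):
    """Build an argv list for an shs subcommand such as "jobs" or "compare apps"."""
    args = ["shs", *subcommand.split()]
    if app_id is not None:
        args += ["-a", app_id]
    for name, value in flags.items():
        # Python keyword app_a becomes the CLI flag --app-a
        args += [f"--{name.replace('_', '-')}", str(value)]
    return args


def run_shs(subcommand, app_id=None, **flags):
    """Run shs and return its stdout as text; raises if shs exits non-zero."""
    result = subprocess.run(
        shs_args(subcommand, app_id, **flags),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With this, `run_shs("jobs", "APP_ID", status="failed")` runs the same command as `shs jobs -a APP_ID --status failed`, and a CI step can fail the build when the call raises.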
MCP Server: For AI Agents
An MCP (Model Context Protocol) server that exposes Spark History Server data as tools for AI agents. Agents query your Spark infrastructure using natural language; the server handles tool selection, multi-server routing, and structured data retrieval.
Use the MCP server when you want an AI agent to conduct multi-step investigations, synthesize findings across tools, or answer natural-language questions about your Spark applications.
Install
# Run directly with uvx (no install needed)
uvx --from mcp-apache-spark-history-server spark-mcp
# Or install it as a persistent tool with uv
uv tool install mcp-apache-spark-history-server
spark-mcp
The package is published to PyPI.
Configure
Create a file named config.yaml with a basic configuration:
servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth: # optional
      username: "user"
      password: "pass"
    include_plan_description: false # set to true to include SQL plans by default (default: false)
mcp:
  transports:
    - streamable-http # or: stdio
  port: "18888"
  debug: false
Configuration values can be overridden with environment variables:
SHS_MCP_PORT Port for MCP server (default: 18888)
SHS_MCP_TRANSPORT Transport mode: streamable-http or stdio
SHS_MCP_DEBUG Enable debug mode (default: false)
SHS_MCP_ADDRESS Bind address (default: localhost)
SHS_SERVERS_*_URL URL for a specific server
SHS_SERVERS_*_AUTH_USERNAME
SHS_SERVERS_*_AUTH_PASSWORD
SHS_SERVERS_*_AUTH_TOKEN
SHS_SERVERS_*_VERIFY_SSL
SHS_SERVERS_*_TIMEOUT
SHS_SERVERS_*_EMR_CLUSTER_ARN
SHS_SERVERS_*_INCLUDE_PLAN_DESCRIPTION
Multi-Server Setup
Configure multiple Spark History Servers and route queries to specific ones:
servers:
  production:
    default: true
    url: "http://prod-spark-history:18080"
    auth:
      username: "user"
      password: "pass"
  staging:
    url: "http://staging-spark-history:18080"
Agents can target a specific server per query:
"Get application <app_id> from the production server"
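To make the relationship between the multi-server configuration and the `SHS_SERVERS_*_...` override variables concrete, here is a small sketch. It rests on two assumptions that the text above does not confirm: that `*` stands for the upper-cased server name, and that a server is the default only when it sets `default: true`. `SERVERS`, `default_server`, and `override_var` are illustrative names, not the project's API.

```python
# Mirrors the multi-server YAML above as a plain dict.
SERVERS = {
    "production": {"default": True, "url": "http://prod-spark-history:18080"},
    "staging": {"url": "http://staging-spark-history:18080"},
}

# Per-server fields taken from the documented SHS_SERVERS_*_... variables.
SERVER_FIELDS = (
    "url",
    "auth_username",
    "auth_password",
    "auth_token",
    "verify_ssl",
    "timeout",
    "emr_cluster_arn",
    "include_plan_description",
)


def default_server(servers):
    """Return the name of the server marked default: true, or None."""
    for name, settings in servers.items():
        if settings.get("default"):
            return name
    return None


def override_var(server_name, field):
    """Environment variable name for one field of one server.

    Assumption: the '*' in SHS_SERVERS_*_URL is the upper-cased server name.
    """
    if field not in SERVER_FIELDS:
        raise ValueError(f"unknown server field: {field}")
    return f"SHS_SERVERS_{server_name.upper()}_{field.upper()}"
```

Under those assumptions, `override_var("staging", "url")` yields `SHS_SERVERS_STAGING_URL`, the variable one would export to repoint the staging entry without editing config.yaml.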
Connect an AI Agent
Agent | Transport | Guide |
Claude Desktop | stdio | |
Claude Code | stdio or streamable-http | |
Kiro | streamable-http | |
LangGraph | streamable-http | |
Strands Agents | streamable-http | |
Local / Inspector | streamable-http |
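For the stdio rows in the table, an MCP client typically launches the server as a subprocess. The sketch below generates a Claude Desktop style `mcpServers` entry that reuses the `uvx` command from the Install section and the `SHS_MCP_TRANSPORT` variable from the Configure section. The server key `spark-history` and the function names are made up for illustration, and the location of config.yaml is left out because the text above does not say how the server finds it.

```python
import json


def stdio_client_entry(transport="stdio"):
    """One client entry that starts the MCP server via uvx over stdio."""
    return {
        "command": "uvx",
        "args": ["--from", "mcp-apache-spark-history-server", "spark-mcp"],
        "env": {"SHS_MCP_TRANSPORT": transport},
    }


def claude_desktop_config(server_key="spark-history"):
    """Wrap the entry in the mcpServers object that Claude Desktop reads."""
    return {"mcpServers": {server_key: stdio_client_entry()}}


# JSON text that could be merged into claude_desktop_config.json
CONFIG_JSON = json.dumps(claude_desktop_config(), indent=2)
```

Printing `CONFIG_JSON` gives a fragment to merge into an existing client configuration rather than a complete file.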
Available Tools (21)
Application Information
Tool | Description |
| List applications with optional status, date, and limit filters |
| Get application detail: status, resources, duration, attempts |
Job Analysis
Tool | Description |
| List jobs with status filtering |
| Top N slowest jobs |
Stage Analysis
Tool | Description |
| List stages with status filtering |
| Top N slowest stages |
| Stage detail with attempt and summary metrics |
| Task metric distributions (execution time, memory, I/O, spill) |
Executor & Resource Analysis
Tool | Description |
| List executors (active and optionally inactive) |
| Executor detail: resources, task stats, performance |
| Aggregate metrics across all executors |
| Chronological executor add/remove with resource totals |
Configuration & Environment
Tool | Description |
| Spark config, JVM info, system properties, classpath |
SQL & Query Analysis
Tool | Description |
| Top N slowest SQL executions with metrics |
| SQL execution detail with optional plan and node metrics |
| Compare SQL plans and metrics between two jobs |
Performance & Bottleneck Analysis
Tool | Description |
| Identify bottlenecks across stages, tasks, and executors |
Comparative Analysis
Tool | Description |
| Diff Spark configs between two applications |
| Diff performance metrics between two applications |
AWS Spark Troubleshooting (opt-in)
Tool | Description |
| One-shot root cause analysis of failed/slow Spark workloads |
| Code fix recommendations for identified Spark issues |
Automatically available when AWS credentials and region are configured. See IAM setup guide.
Example Agent Queries
"Why is my ETL job running slower than yesterday?" → get_job_bottlenecks + list_slowest_stages + compare_job_performance
"What caused job 42 to fail?" → list_jobs + get_stage + get_stage_task_summary
"Compare today's batch with yesterday's run" → compare_job_performance + compare_job_environments
"Find my slowest SQL queries and explain why" → list_slowest_sql_queries + get_sql_execution + compare_sql_execution_plans
Screenshots
Get Spark Application
[Screenshot: Get Application]
Job Performance Comparison
[Screenshot: Job Comparison]
Kubernetes Deployment
Deploy the MCP server using Helm:
helm install spark-history-mcp ./deploy/kubernetes/helm/mcp-apache-spark-history-server/
# Production configuration
helm install spark-history-mcp ./deploy/kubernetes/helm/mcp-apache-spark-history-server/ \
  --set replicaCount=3 \
  --set autoscaling.enabled=true
See deploy/kubernetes/helm/ for full configuration options.
When deployed in Kubernetes, forward the service port to your machine, then connect Claude Desktop via mcp-remote:
kubectl port-forward svc/mcp-apache-spark-history-server 18888:18888
AWS Integration
AWS Glue: Connect to Glue Spark History Server
Amazon EMR: Use EMR Persistent UI for Spark analysis
AWS Spark Troubleshooting: One-shot root cause analysis and code fix recommendations for failed Spark workloads (EMR EC2, EMR Serverless). Automatically available when AWS credentials and region are configured. See IAM setup guide for required permissions.
Development Setup
git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
cd mcp-apache-spark-history-server
# Install Task runner
brew install go-task # macOS; see https://taskfile.dev/installation/ for others
# MCP Server
task install # install Python dependencies
task start-spark-bg # start Spark History Server with sample data
task start-mcp-bg # start MCP server
task start-inspector-bg # open MCP Inspector at http://localhost:6274
task stop-all
# CLI
cd skills/cli
task build # build ./bin/shs
task test # unit tests
task test-e2e # e2e tests (starts/stops Docker SHS automatically)
task start-shs # start SHS with CLI e2e sample data
Adopters
Using this project? Add your organization to ADOPTERS.md and help grow the community.
Contributing
See CONTRIBUTING.md for guidelines.
License
Apache License 2.0; see LICENSE.
Trademark Notice
Built for use with Apache Spark™ History Server. Not affiliated with or endorsed by the Apache Software Foundation.
Connect your Spark infrastructure to AI agents and engineers
Built by the community, for the community