# Spark MCP Optimizer Walkthrough
I have designed and implemented a **Spark MCP Optimizer** that integrates with your local Spark History Server to analyze jobs and provide tuning recommendations.
## Implemented Components
1. **MCP Server**: A Python-based server exposing tools to fetch metrics and run analysis (a registration sketch follows this list).
2. **LLM Context Tool** (`get_full_job_context`): Aggregates metrics, anomalies, and source code into a single Markdown prompt for Gemini/LLM consumption.
3. **Analysis Engine**: Heuristic engine (Skew, Spill, Resource, Partitioning) that generates "signals" for the LLM.
4. **Examples**: PySpark scripts (`ex_skew_join.py`, etc.) to demonstrate specific issues.
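To make the server side concrete, here is a minimal registration sketch assuming the official `mcp` Python SDK's `FastMCP` interface; the server name and the tool body are placeholders, not the project's exact code.

```python
# Minimal server sketch using the `mcp` SDK's FastMCP interface.
# The name "spark-optimizer" and the tool body are placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("spark-optimizer")

@mcp.tool()
def get_full_job_context(app_id: str) -> str:
    """Aggregate metrics, anomalies, and source code for one Spark app."""
    # The real tool queries the History Server REST API, runs the heuristic
    # engine, and renders everything as a single Markdown prompt.
    return f"# Job Context: {app_id}\n(aggregated context goes here)"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```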
## Workflow: LLM-Driven Optimization
Instead of relying solely on hardcoded rules, the MCP server now feeds rich context to an LLM:
1. **User (Gemini/Claude)** calls `get_full_job_context(app_id)`.
2. **MCP Server** returns (assembled into one Markdown block, as sketched after this list):
   - **Anomalies**: e.g. "Skew Detected in Stage 2 (Ratio 5.0x)"
   - **Code**: The job's actual PySpark/Scala source.
   - **Metrics**: Runtime, GC, and serialization stats.
3. **User (Gemini/Claude)** analyzes the context and generates specific tuning advice.
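As a sketch of how step 2's payload might be assembled (section layout and sample values are assumptions; inputs are passed in so the snippet stands alone):

```python
# Sketch of how get_full_job_context might stitch anomalies, source code,
# and metrics into one Markdown prompt. The section layout is an assumption.
FENCE = "`" * 3  # avoids embedding literal triple backticks in this snippet

def build_job_context(app_id: str, anomalies: list[str],
                      source: str, metrics: dict[str, str]) -> str:
    parts = [
        f"# Job Context: {app_id}",
        "## Anomalies",
        "\n".join(f"- {a}" for a in anomalies),
        "## Source Code",
        f"{FENCE}python\n{source}\n{FENCE}",
        "## Metrics",
        "\n".join(f"- {k}: {v}" for k, v in metrics.items()),
    ]
    return "\n\n".join(parts)

# Illustrative values only; real numbers come from the History Server.
print(build_job_context(
    "application_123",
    ["Skew Detected in Stage 2 (Ratio 5.0x)"],
    "df_large.join(df_small, 'key').count()",
    {"Runtime": "412s", "GC Time": "38s"},
))
```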
## Verification Results
### 1. Context Generation Test
I ran `test_llm_context.py` against `ex_partitioning` (App 0007).
**Result**: Generated a clean Markdown summary that correctly identified **Suboptimal Partitioning** and included the source code for the LLM to review.
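A hypothetical shape of that check, for reference; the import path and app id below are placeholders, not the project's exact values:

```python
# Hypothetical sketch of what test_llm_context.py asserts; adjust the
# import path and app id to match the actual project layout and run.
from src.main import get_full_job_context  # assumed module path

context = get_full_job_context("app-0007")   # placeholder app id
assert "Suboptimal Partitioning" in context  # anomaly was identified
assert "Source Code" in context              # code is embedded for the LLM
print(context[:500])                         # eyeball the Markdown header
```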
### 2. Heuristic Analysis Verification
I ran `verify_mcp.py` against four example scenarios (a sketch of the skew heuristic follows the table):
| Script | Analysis Category | Result |
|--------|-------------------|--------|
| `ex_skew_join.py` | Skew & Joins | Verified `SkewAnalysis` flags task imbalance. |
| `ex_spill_memory.py` | Spill & Memory | Verified `SpillAnalysis` flags disk spill. |
| `ex_resource_overhead.py` | GC & Resources | Verified `ResourceAnalysis` flags high GC/serialization overhead. |
| `ex_partitioning.py` | Partitioning | Verified `PartitioningAnalysis` flags small partitions. |
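To give a flavor of these heuristics, here is a sketch of a skew check in the spirit of `SkewAnalysis`; the 5.0x default mirrors the anomaly example above, but the threshold and wording are assumptions, not the project's exact code.

```python
# Sketch of a skew heuristic: flag a stage whose slowest task dwarfs the
# median task. Threshold and message wording are assumptions.
from statistics import median

def detect_skew(task_durations_ms: list[float],
                ratio_threshold: float = 5.0) -> str | None:
    if len(task_durations_ms) < 2:
        return None
    ratio = max(task_durations_ms) / max(median(task_durations_ms), 1.0)
    if ratio >= ratio_threshold:
        return f"Skew Detected (Ratio {ratio:.1f}x)"
    return None

# One straggler among uniform peers trips the check.
print(detect_skew([100, 110, 95, 105, 900]))  # -> Skew Detected (Ratio 8.6x)
```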
## How to Use
1. Start your Spark cluster with the History Server enabled.
2. Run the MCP Server:
   ```bash
   # Ensure mcp is installed: pip install mcp
   python3 src/main.py
   ```
3. **In your LLM Client**:
   - Ask: "Analyze application_123 for optimization opportunities."
   - The LLM calls `get_full_job_context`.
   - The LLM reads the metrics and code, then suggests concrete changes (a programmatic version of this flow is sketched below).
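Most MCP-capable clients handle this wiring for you, but for testing you can drive the tool directly; this sketch assumes the official `mcp` SDK's stdio client and the entry point above.

```python
# Programmatic equivalent of the client flow: spawn the server over stdio
# and call the context tool directly via the `mcp` SDK.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(command="python3", args=["src/main.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_full_job_context", {"app_id": "application_123"}
            )
            print(result.content)  # the aggregated Markdown context

asyncio.run(main())
```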