# Spark MCP Optimizer Walkthrough
I have designed and implemented a **Spark MCP Optimizer** that integrates with your local Spark History Server to analyze jobs and provide tuning recommendations.
## Implemented Components
1. **MCP Server**: A Python-based server exposing tools to fetch metrics and run analysis (a registration sketch follows this list).
2. **LLM Context Tool** (`get_full_job_context`): Aggregates metrics, anomalies, and source code into a single Markdown prompt for Gemini/LLM consumption.
3. **Analysis Engine**: Heuristic engine (Skew, Spill, Resource, Partitioning) that generates "signals" for the LLM.
4. **Examples**: PySpark scripts (`ex_skew_join.py`, etc.) to demonstrate specific issues.
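To make the server side concrete, here is a minimal registration sketch assuming the official `mcp` Python SDK's `FastMCP` interface; the server name and the tool body are placeholders, not the project's exact code.

```python
# Minimal server sketch using the `mcp` SDK's FastMCP interface.
# The name "spark-optimizer" and the tool body are placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("spark-optimizer")

@mcp.tool()
def get_full_job_context(app_id: str) -> str:
    """Aggregate metrics, anomalies, and source code for one Spark app."""
    # The real tool queries the History Server REST API, runs the heuristic
    # engine, and renders everything as a single Markdown prompt.
    return f"# Job Context: {app_id}\n(aggregated context goes here)"

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```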
## Workflow: LLM-Driven Optimization
Instead of relying solely on hardcoded rules, the MCP server now feeds rich context to an LLM:
1. **User (Gemini/Claude)** calls `get_full_job_context(app_id)`.
2. **MCP Server** returns (assembled into one Markdown block, as sketched after this list):
   - **Anomalies**: e.g. "Skew Detected in Stage 2 (Ratio 5.0x)"
   - **Code**: The job's actual PySpark/Scala source.
   - **Metrics**: Runtime, GC, and serialization stats.
3. **User (Gemini/Claude)** analyzes the context and generates specific tuning advice.
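As a sketch of how step 2's payload might be assembled (section layout and sample values are assumptions; inputs are passed in so the snippet stands alone):

```python
# Sketch of how get_full_job_context might stitch anomalies, source code,
# and metrics into one Markdown prompt. The section layout is an assumption.
FENCE = "`" * 3  # avoids embedding literal triple backticks in this snippet

def build_job_context(app_id: str, anomalies: list[str],
                      source: str, metrics: dict[str, str]) -> str:
    parts = [
        f"# Job Context: {app_id}",
        "## Anomalies",
        "\n".join(f"- {a}" for a in anomalies),
        "## Source Code",
        f"{FENCE}python\n{source}\n{FENCE}",
        "## Metrics",
        "\n".join(f"- {k}: {v}" for k, v in metrics.items()),
    ]
    return "\n\n".join(parts)

# Illustrative values only; real numbers come from the History Server.
print(build_job_context(
    "application_123",
    ["Skew Detected in Stage 2 (Ratio 5.0x)"],
    "df_large.join(df_small, 'key').count()",
    {"Runtime": "412s", "GC Time": "38s"},
))
```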
## Verification Results
### 1. Context Generation Test
I ran `test_llm_context.py` against `ex_partitioning` (App 0007).
**Result**: Generated a clean Markdown summary that correctly identified **Suboptimal Partitioning** and included the source code for the LLM to review.
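A hypothetical shape of that check, for reference; the import path and app id below are placeholders, not the project's exact values:

```python
# Hypothetical sketch of what test_llm_context.py asserts; adjust the
# import path and app id to match the actual project layout and run.
from src.main import get_full_job_context  # assumed module path

context = get_full_job_context("app-0007")   # placeholder app id
assert "Suboptimal Partitioning" in context  # anomaly was identified
assert "Source Code" in context              # code is embedded for the LLM
print(context[:500])                         # eyeball the Markdown header
```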
### 2. Heuristic Analysis Verification
I ran `verify_mcp.py` against four example scenarios (a sketch of the skew heuristic follows the table):
| Script | Analysis Category | Result |
|--------|-------------------|--------|
| `ex_skew_join.py` | Skew & Joins | Verified `SkewAnalysis` flags task imbalance. |
| `ex_spill_memory.py` | Spill & Memory | Verified `SpillAnalysis` flags disk spill. |
| `ex_resource_overhead.py` | GC & Resources | Verified `ResourceAnalysis` flags high GC/serialization overhead. |
| `ex_partitioning.py` | Partitioning | Verified `PartitioningAnalysis` flags small partitions. |
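To give a flavor of these heuristics, here is a sketch of a skew check in the spirit of `SkewAnalysis`; the 5.0x default mirrors the anomaly example above, but the threshold and wording are assumptions, not the project's exact code.

```python
# Sketch of a skew heuristic: flag a stage whose slowest task dwarfs the
# median task. Threshold and message wording are assumptions.
from statistics import median

def detect_skew(task_durations_ms: list[float],
                ratio_threshold: float = 5.0) -> str | None:
    if len(task_durations_ms) < 2:
        return None
    ratio = max(task_durations_ms) / max(median(task_durations_ms), 1.0)
    if ratio >= ratio_threshold:
        return f"Skew Detected (Ratio {ratio:.1f}x)"
    return None

# One straggler among uniform peers trips the check.
print(detect_skew([100, 110, 95, 105, 900]))  # -> Skew Detected (Ratio 8.6x)
```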
## How to Use
1. Start your Spark cluster with the History Server enabled.
2. Run the MCP Server:
   ```bash
   # Ensure mcp is installed: pip install mcp
   python3 src/main.py
   ```
3. **In your LLM Client**:
   - Ask: "Analyze application_123 for optimization opportunities."
   - The LLM calls `get_full_job_context`.
   - The LLM reads the metrics and code, then suggests concrete changes (a programmatic version of this flow is sketched below).
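Most MCP-capable clients handle this wiring for you, but for testing you can drive the tool directly; this sketch assumes the official `mcp` SDK's stdio client and the entry point above.

```python
# Programmatic equivalent of the client flow: spawn the server over stdio
# and call the context tool directly via the `mcp` SDK.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(command="python3", args=["src/main.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_full_job_context", {"app_id": "application_123"}
            )
            print(result.content)  # the aggregated Markdown context

asyncio.run(main())
```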