# Spark MCP Optimizer
A Model Context Protocol (MCP) server that acts as an intelligent Spark tuning assistant. It analyzes Spark application metrics and source code to provide actionable recommendations for performance optimization.
## Features
- **LLM Integration**: New `get_full_job_context` tool aggregates metrics, anomalies, and code into a prompt-ready context for Gemini/Claude.
- **Data Ingestion**: Fetches metrics from Spark History Server (Jobs, Stages, SQL).
- **Hybrid Analysis**: Combines runtime metrics (Skew, Spill) with Static Code Analysis (AST/Regex).
- **Recommendations**: Suggests configuration changes (`spark.sql.*`) and code refactoring (e.g., replace `groupByKey`).
## Architecture
- **`src/main.py`**: MCP Server entrypoint using `fastmcp`.
- **`src/context.py`**: Aggregates Context for LLM consumption.
- **`src/client.py`**: Spark History Server client (Dependency-free).
- **`src/analyzer.py`**: Logic for detecting Skew, Spill, and Resource waste.
- **`src/recommender.py`**: Rule-based engine (Legacy/Fallback).
## Installation
1. Clone this repository.
2. Install dependencies (optional, core modules are standard lib):
```bash
pip install mcp
```
## Usage
### 1. LLM-Driven Optimization (Recommended)
Use the `get_full_job_context` tool to feed your LLM:
```bash
# Example usage in python
python3 test_llm_context.py <app_id> [code_path]
```
This produces a markdown report you can paste into an LLM prompt.
### 2. Standard MCP Server
```bash
# Run server (requires pip install mcp)
python3 src/main.py
```
### Tools Provided
- `get_full_job_context(app_id, code_path)` (**NEW**)
- `get_apps()`
- `get_job_summary(app_id)`
- `analyze_spark_job(app_id, code_content)` (Legacy)
## Documentation
- [Walkthrough](docs/walkthrough.md): detailed implementation and verification guide.
- [Implementation Plan](docs/implementation_plan.md): architectural decisions.
- [Tasks](docs/task.md): Project roadmap.
## Verification
Run the verification script to test against a local Spark History Server:
```bash
python3 verify_mcp.py <app_id> [code_path]
```