Spark History MCP Server

QUICK_REFERENCE.md•2.38 KiB

# Quick Reference: System Flow

## 🎯 One-Minute Overview

```
User → Engine → [Spark Client + LLM Agents] → Report
```

## 📊 7 Phases

1. **User Input** - CLI or MCP Server
2. **Context Gathering** - Fetch metrics from Spark History Server
3. **Agent Orchestration** - Run 6 specialized agents
4. **Agent Analysis** - Each agent analyzes specific aspect
5. **LLM Interaction** - Agents call Gemini API for insights
6. **Report Building** - Consolidate all agent outputs
7. **Output** - Save JSON report

## 🤖 6 Specialized Agents

| Agent | Detects | Output |
|-------|---------|--------|
| **Execution** | Long stages, imbalance, GC | Bottlenecks |
| **Shuffle/Spill** | Excessive shuffle, spill | Spill issues |
| **Skew** | Data skew (quantiles) | Skew ratio |
| **SQL Plan** | Inefficient joins | Join issues |
| **Config** | Suboptimal settings | Config changes |
| **Code** | Anti-patterns | Code fixes |

## 🔄 Agent Analysis Flow

```
Context → Build Prompt → LLM API → Parse JSON → Return Insights
```

## 📁 Key Files

- `src/main.py` - CLI entry point
- `src/server.py` - MCP server
- `src/optimizer/engine.py` - Orchestration
- `src/optimizer/agents.py` - 6 agents
- `src/client.py` - Spark History client
- `src/llm_client.py` - Gemini API client

## 🚀 Usage

```bash
# Analyze a job
python3 spark_optimize.py \
  --appId application_123 \
  --historyUrl http://localhost:18080 \
  --jobCode job.py \
  --output report.json
```

## 📈 Example Output

```json
{
  "app_id": "application_123",
  "skew_analysis": [
    {
      "is_skewed": true,
      "skew_ratio": 10.0,
      "stage_id": 0
    }
  ],
  "recommendations": [
    {
      "category": "Configuration",
      "suggestion": "Enable AQE",
      "impact_level": "High"
    },
    {
      "category": "Code",
      "suggestion": "Use dynamic partitioning",
      "impact_level": "Medium"
    }
  ]
}
```

## 🛡️ Error Handling

- Network errors → Log + return empty
- JSON parse errors → Use raw text
- Rate limits → Exponential backoff retry
- Missing data → Default values

## 📚 Full Documentation

- [FLOW_EXPLANATION.md](file:///Users/user/Documents/spark_mcp_optimizer/FLOW_EXPLANATION.md) - Detailed technical flow
- [README.md](file:///Users/user/Documents/spark_mcp_optimizer/README.md) - Complete system overview
- [DEPLOYMENT.md](file:///Users/user/Documents/spark_mcp_optimizer/docs/DEPLOYMENT.md) - Production deployment
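
Two of the Error Handling rules listed earlier (exponential backoff on rate limits, and falling back to raw text when the LLM reply is not valid JSON) can be illustrated with a minimal sketch. This is not the project's actual code: `RateLimitError`, `call_with_backoff`, `parse_insights`, and the `raw_text` key are hypothetical names chosen for illustration, and the real `src/llm_client.py` may structure this differently.

```python
import json
import time
from typing import Any, Callable


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the LLM client raises on a rate limit."""


def call_with_backoff(
    call: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry `call` on RateLimitError, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("max_attempts must be at least 1")


def parse_insights(text: str) -> dict[str, Any]:
    """Parse the LLM reply as a JSON object; keep the raw text if that fails."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return {"raw_text": text}
    # A reply that is valid JSON but not an object is also treated as raw text.
    return parsed if isinstance(parsed, dict) else {"raw_text": text}
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets a test record the delays instead of actually waiting.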



