# Quick Reference: System Flow
## 🎯 One-Minute Overview
```
User → Engine → [Spark Client + LLM Agents] → Report
```
## 📊 7 Phases
1. **User Input** - CLI or MCP Server
2. **Context Gathering** - Fetch application metrics from the Spark History Server
3. **Agent Orchestration** - Run the 6 specialized agents
4. **Agent Analysis** - Each agent analyzes one specific aspect of the job
5. **LLM Interaction** - Agents call the Gemini API for insights
6. **Report Building** - Consolidate all agent outputs into one report
7. **Output** - Save the report as JSON
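The seven phases can be sketched as two small functions. This is a minimal illustration, not the code in `src/optimizer/engine.py`: `build_report`, `save_report`, and the callable-based agent interface are hypothetical, and the demo passes stand-in callables instead of the real Spark client and Gemini-backed agents.

```python
import json
from typing import Callable, Dict

# Hypothetical interface: an agent maps the gathered context to its insights.
Agent = Callable[[dict], object]


def build_report(app_id: str,
                 fetch_context: Callable[[str], dict],
                 agents: Dict[str, Agent]) -> dict:
    """Phases 2-6: gather context, run every agent, consolidate the outputs."""
    context = fetch_context(app_id)        # Phase 2: metrics for this application
    report = {"app_id": app_id}
    for name, agent in agents.items():     # Phases 3-5: each agent analyzes one aspect
        report[name] = agent(context)
    return report                          # Phase 6: one consolidated dict


def save_report(report: dict, path: str) -> None:
    """Phase 7: write the consolidated report as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2)


if __name__ == "__main__":
    # Demo with stand-in callables (no Spark or LLM access needed).
    demo = build_report(
        "application_123",
        fetch_context=lambda app_id: {"stages": [{"stage_id": 0}]},
        agents={"skew_analysis": lambda ctx: [{"stage_id": 0, "is_skewed": True}]},
    )
    print(json.dumps(demo, indent=2))
```

Keeping agents as plain callables makes each one easy to test in isolation with a fake context.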
## 🤖 6 Specialized Agents
| Agent | Detects | Output |
|-------|---------|--------|
| **Execution** | Long stages, imbalance, GC | Bottlenecks |
| **Shuffle/Spill** | Excessive shuffle, spill | Spill issues |
| **Skew** | Data skew (quantiles) | Skew ratio |
| **SQL Plan** | Inefficient joins | Join issues |
| **Config** | Suboptimal settings | Config changes |
| **Code** | Anti-patterns | Code fixes |
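One common way to quantify skew is to compare the slowest task in a stage with the median task. The sketch below assumes that definition (skew ratio = max task duration / median task duration) and a hypothetical threshold of 5.0; the actual Skew agent may use other quantiles or cut-offs. The returned keys mirror the `skew_analysis` entries in the example report.

```python
import statistics
from typing import Sequence

SKEW_THRESHOLD = 5.0  # hypothetical cut-off, not taken from the project


def skew_analysis(stage_id: int, task_durations_ms: Sequence[float],
                  threshold: float = SKEW_THRESHOLD) -> dict:
    """Compare a stage's slowest task with its median task (one common skew heuristic)."""
    if not task_durations_ms:
        raise ValueError("stage has no task durations")
    median = statistics.median(task_durations_ms)
    slowest = max(task_durations_ms)
    if median > 0:
        ratio = slowest / median
    else:
        # Median of zero: unskewed only if no task took any time at all.
        ratio = 1.0 if slowest == 0 else float("inf")
    return {
        "stage_id": stage_id,
        "skew_ratio": round(ratio, 2),
        "is_skewed": ratio >= threshold,
    }
```

With durations `[100, 100, 100, 1000]` the median is 100 and the slowest task is 1000, giving a skew ratio of 10.0.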
## 🔄 Agent Analysis Flow
```
Context → Build Prompt → LLM API → Parse JSON → Return Insights
```
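The per-agent flow maps onto three small steps. In this sketch `call_llm` is a hypothetical stand-in for the Gemini client (any function from prompt text to response text), and the prompt wording is illustrative only. Responses that are not valid JSON fall back to the raw text, in line with the error-handling policy for JSON parse errors.

```python
import json
from typing import Any, Callable

# Hypothetical stand-in for the Gemini client: takes a prompt, returns response text.
LLMCall = Callable[[str], str]


def build_prompt(agent_focus: str, context: dict) -> str:
    """Context -> Build Prompt: embed the metrics this agent cares about."""
    return (
        f"You are a Spark performance expert focusing on {agent_focus}.\n"
        "Reply with a JSON list of findings.\n"
        f"Metrics:\n{json.dumps(context, indent=2)}"
    )


def parse_insights(raw: str) -> Any:
    """Parse JSON -> Return Insights, falling back to the raw text on parse errors."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"raw_text": raw}


def analyze(agent_focus: str, context: dict, call_llm: LLMCall) -> Any:
    prompt = build_prompt(agent_focus, context)
    raw = call_llm(prompt)  # LLM API call
    return parse_insights(raw)
```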
## 📁 Key Files
- `src/main.py` - CLI entry point
- `src/server.py` - MCP server
- `src/optimizer/engine.py` - Orchestration
- `src/optimizer/agents.py` - 6 agents
- `src/client.py` - Spark History Server client
- `src/llm_client.py` - Gemini API client
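For orientation, the Spark History Server exposes application metrics through its REST API under `/api/v1`. This sketch builds URLs for two documented endpoints (the stage list and the per-stage `taskSummary`, which accepts a `quantiles` parameter) and fetches JSON with the standard library. The helper names are hypothetical and `src/client.py` may be organized differently.

```python
import json
import urllib.request
from typing import Sequence


def stages_url(history_url: str, app_id: str) -> str:
    """REST endpoint listing all stages of an application."""
    return f"{history_url.rstrip('/')}/api/v1/applications/{app_id}/stages"


def task_summary_url(history_url: str, app_id: str, stage_id: int, attempt_id: int = 0,
                     quantiles: Sequence[float] = (0.05, 0.25, 0.5, 0.75, 0.95)) -> str:
    """REST endpoint summarizing task metrics of one stage attempt at given quantiles."""
    q = ",".join(str(x) for x in quantiles)
    return (f"{history_url.rstrip('/')}/api/v1/applications/{app_id}"
            f"/stages/{stage_id}/{attempt_id}/taskSummary?quantiles={q}")


def get_json(url: str, timeout: float = 10.0):
    """Fetch and decode a JSON response from the History Server."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```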
## 🚀 Usage
```bash
# Analyze a job
python3 spark_optimize.py \
  --appId application_123 \
  --historyUrl http://localhost:18080 \
  --jobCode job.py \
  --output report.json
```
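A minimal `argparse` sketch of the flags used above. Which flags are required, and the defaults (`http://localhost:18080`, the History Server's default port, and `report.json`), are assumptions for illustration rather than what `spark_optimize.py` necessarily does.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Analyze a Spark application")
    parser.add_argument("--appId", required=True, help="Spark application ID")
    # Assumed default: local History Server on its default port.
    parser.add_argument("--historyUrl", default="http://localhost:18080",
                        help="Spark History Server base URL")
    parser.add_argument("--jobCode", help="path to the job source analyzed by the Code agent")
    # Assumed default output path.
    parser.add_argument("--output", default="report.json",
                        help="where to write the JSON report")
    return parser.parse_args(argv)
```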
## 📈 Example Output
```json
{
  "app_id": "application_123",
  "skew_analysis": [
    {
      "is_skewed": true,
      "skew_ratio": 10.0,
      "stage_id": 0
    }
  ],
  "recommendations": [
    {
      "category": "Configuration",
      "suggestion": "Enable AQE",
      "impact_level": "High"
    },
    {
      "category": "Code",
      "suggestion": "Use dynamic partitioning",
      "impact_level": "Medium"
    }
  ]
}
```
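Downstream tooling can consume this report directly. The sketch below assumes the key names shown in the example (`skew_analysis`, `recommendations`, `impact_level`) and a High > Medium > Low ordering; `Low` does not appear in the example and is an assumption.

```python
from typing import List

# "Low" is assumed; unknown levels sort last.
IMPACT_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def prioritized(report: dict) -> List[str]:
    """Recommendations as one-line strings, highest impact first."""
    recs = sorted(report.get("recommendations", []),
                  key=lambda r: IMPACT_ORDER.get(r.get("impact_level"), len(IMPACT_ORDER)))
    return [f"[{r.get('impact_level', '?')}] {r.get('category', '?')}: {r.get('suggestion', '')}"
            for r in recs]


def skewed_stages(report: dict) -> List[int]:
    """IDs of stages the Skew agent flagged."""
    return [s["stage_id"] for s in report.get("skew_analysis", []) if s.get("is_skewed")]
```

To use it on a saved report, load the file with `json.load` and pass the resulting dict to either function.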
## 🛡️ Error Handling
- Network errors → Log the error and return an empty result
- JSON parse errors → Fall back to the raw LLM response text
- Rate limits → Retry with exponential backoff
- Missing data → Fall back to default values
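The rate-limit policy can be sketched as a generic retry wrapper. `RateLimitError` is a hypothetical exception standing in for whatever the LLM client raises on HTTP 429; the retry count, base delay, cap, and jitter range are illustrative values. The `sleep` parameter is injectable so the policy can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical: raised by the LLM client when the API rate-limits a request."""


def with_backoff(fn: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with capped exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Delays grow 1s, 2s, 4s, ... up to max_delay, scaled by random jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

The jitter spreads out retries from agents running concurrently so they do not hit the API again in lockstep.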
## 📚 Full Documentation
- [FLOW_EXPLANATION.md](file:///Users/user/Documents/spark_mcp_optimizer/FLOW_EXPLANATION.md) - Detailed technical flow
- [README.md](file:///Users/user/Documents/spark_mcp_optimizer/README.md) - Complete system overview
- [DEPLOYMENT.md](file:///Users/user/Documents/spark_mcp_optimizer/docs/DEPLOYMENT.md) - Production deployment