# Production Deployment Guide
## Prerequisites
- Python 3.9+
- Access to Spark History Server
- Gemini API Key
## Installation
### 1. Clone and Setup
```bash
cd /path/to/deployment
git clone <repository-url> spark_optimizer
cd spark_optimizer
```
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
### 3. Configure Environment
```bash
cp .env.example .env
# Edit .env with your settings
nano .env
```
Required settings (a minimal example follows the list):
- `GEMINI_API_KEY`: Your Gemini API key
- `SPARK_OPT_HISTORY_URL`: Spark History Server URL
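A minimal `.env` with these settings might look like the following; both values are placeholders:
```bash
GEMINI_API_KEY=your-gemini-api-key
SPARK_OPT_HISTORY_URL=http://your-history-server:18080
```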
## Running the Optimizer
### CLI Mode
```bash
set -a; source .env; set +a  # export every variable defined in .env
python3 spark_optimize.py \
  --appId application_1234567890_0001 \
  --historyUrl http://your-history-server:18080 \
  --jobCode path/to/job.py \
  --output reports/analysis.json
```
### As a Service (MCP Server)
```bash
set -a; source .env; set +a  # export every variable defined in .env
python3 -m src.server
```
The MCP server starts on its default port and exposes tools such as the following (a client sketch appears after the list):
- `get_application_summary`
- `get_jobs`
- `get_stages`
- `get_executors`
- `get_sql_executions`
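To exercise the running server from a client, the official MCP Python SDK can be used. The sketch below is an assumption-laden starting point: it presumes the server speaks MCP over stdio and that `get_application_summary` accepts an `app_id` argument; adjust both to match the actual server.
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Spawn the server as a subprocess and talk MCP over stdio.
    server = StdioServerParameters(command="python3", args=["-m", "src.server"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("available tools:", [tool.name for tool in tools.tools])
            # Hypothetical call; the argument name is an assumption.
            summary = await session.call_tool(
                "get_application_summary",
                {"app_id": "application_1234567890_0001"},
            )
            print(summary)

asyncio.run(main())
```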
## Production Considerations
### 1. Resource Requirements
- **Memory**: 2GB minimum, 4GB recommended
- **CPU**: 2 cores minimum
- **Network**: Low latency to Spark History Server
### 2. Rate Limiting
The system implements exponential backoff for Gemini API rate limits:
- Initial retry delay: 5 seconds
- Max retries: 5
- Exponential backoff multiplier: 2x
Configure via:
```bash
SPARK_OPT_MAX_RETRIES=5
SPARK_OPT_RETRY_DELAY=5.0
```
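In sketch form, the retry loop looks roughly like this (a minimal illustration with hypothetical names, not the project's actual code):
```python
import os
import random
import time

MAX_RETRIES = int(os.environ.get("SPARK_OPT_MAX_RETRIES", "5"))
RETRY_DELAY = float(os.environ.get("SPARK_OPT_RETRY_DELAY", "5.0"))

def call_with_backoff(fn, *args, **kwargs):
    """Retry fn on rate-limit errors, doubling the delay after each failure."""
    delay = RETRY_DELAY
    for attempt in range(MAX_RETRIES + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:  # in practice, catch only the API's rate-limit error
            if attempt == MAX_RETRIES:
                raise
            time.sleep(delay + random.uniform(0, 1))  # jitter spreads out retries
            delay *= 2  # 2x multiplier, matching the settings above
```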
### 3. Logging
Set log level via environment:
```bash
SPARK_OPT_LOG_LEVEL=INFO # DEBUG, INFO, WARNING, ERROR
```
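Under the hood this presumably maps onto Python's standard `logging` configuration; a minimal sketch (the format string is an assumption):
```python
import logging
import os

level_name = os.environ.get("SPARK_OPT_LOG_LEVEL", "INFO").upper()
logging.basicConfig(
    level=getattr(logging, level_name, logging.INFO),
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
```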
Logs include:
- API request/response details
- Agent analysis steps
- Error traces
### 4. Monitoring
Monitor these metrics (a minimal capture sketch follows the list):
- API call success rate
- Analysis completion time
- LLM token usage
- Error rates by type
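A lightweight in-process way to capture the timing and error-rate metrics is sketched below; all names are hypothetical, and token usage would instead come from the LLM client's response metadata:
```python
import time
from collections import Counter

metrics = Counter()

def timed_analysis(run_analysis, app_id):
    """Wrap one analysis call, recording duration and success/error counts."""
    start = time.monotonic()
    try:
        report = run_analysis(app_id)
        metrics["analyses_succeeded"] += 1
        return report
    except Exception as exc:
        metrics[f"errors_{type(exc).__name__}"] += 1  # error rate by type
        raise
    finally:
        metrics["analysis_seconds_total"] += time.monotonic() - start
```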
### 5. Security
- **API Keys**: Store in environment variables, never commit
- **Network**: Use HTTPS for Spark History Server if possible
- **Access Control**: Restrict who can run analyses
## Troubleshooting
### Issue: "Connection refused" to Spark History Server
**Solution**: Verify that the Spark History Server is running and reachable:
```bash
curl http://localhost:18080/api/v1/applications
```
### Issue: "Quota exceeded" from Gemini API
**Solution**: The system retries automatically with exponential backoff. If the errors persist:
1. Check API quota limits
2. Increase `SPARK_OPT_RETRY_DELAY`
3. Reduce analysis frequency
### Issue: Empty or incomplete reports
**Solution**:
1. Check that the Spark History Server has complete data for the run
2. Verify the application ID is correct (a quick REST check appears below the list)
3. Enable DEBUG logging to see agent responses
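For the ID check, query the History Server's REST API directly; a 404 response means the application is unknown to the server (the ID below is the earlier example):
```bash
curl http://your-history-server:18080/api/v1/applications/application_1234567890_0001
```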
### Issue: High memory usage
**Solution**:
1. Reduce `SPARK_OPT_MAX_STAGES` (default: 5)
2. Disable code analysis: `SPARK_OPT_CODE_ANALYSIS=false`
3. Process smaller applications first
## Performance Tuning
### For Large Clusters
```bash
SPARK_OPT_MAX_STAGES=10 # Analyze more stages
SPARK_OPT_TIMEOUT=60 # Longer timeout
```
### For Rate-Limited Environments
```bash
SPARK_OPT_MAX_RETRIES=10
SPARK_OPT_RETRY_DELAY=10.0
```
## Scaling
### Horizontal Scaling
Deploy multiple instances, one per Spark History Server, optionally behind a load balancer:
```bash
# Instance 1
SPARK_OPT_HISTORY_URL=http://cluster1:18080 python3 -m src.server
# Instance 2
SPARK_OPT_HISTORY_URL=http://cluster2:18080 python3 -m src.server
```
### Batch Processing
Process multiple applications:
```bash
while read -r app_id; do
  python3 spark_optimize.py --appId "$app_id" --output "reports/${app_id}.json"
done < app_ids.txt
```
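For larger batches the loop can be parallelized; a sketch using GNU xargs (tune `-P` against your Gemini API quota):
```bash
# Run up to two analyses at a time, one application ID per input line.
xargs -P 2 -I {} python3 spark_optimize.py --appId {} --output reports/{}.json < app_ids.txt
```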
## Health Checks
### Liveness Check
```bash
curl http://localhost:8080/health
```
### Readiness Check
```bash
python3 -c "from src.client import SparkHistoryClient; c = SparkHistoryClient(); print('OK' if c.get_applications() is not None else 'FAIL')"
```