actions_fixed.json
{
  "app_id": "application_1768524288842_0017",
  "skew_analysis": [
    {
      "is_skewed": true,
      "skew_ratio": null,
      "max_duration": 634.0,
      "median_duration": 634.0,
      "stage_id": 0
    },
    {
      "is_skewed": true,
      "skew_ratio": null,
      "max_duration": 407.0,
      "median_duration": 407.0,
      "stage_id": 3
    },
    {
      "is_skewed": true,
      "skew_ratio": null,
      "max_duration": 469.0,
      "median_duration": 469.0,
      "stage_id": 9
    },
    {
      "is_skewed": true,
      "skew_ratio": null,
      "max_duration": 105.0,
      "median_duration": 105.0,
      "stage_id": 1
    },
    {
      "is_skewed": true,
      "skew_ratio": null,
      "max_duration": 100.0,
      "median_duration": 100.0,
      "stage_id": 4
    }
  ],
  "spill_analysis": [],
  "resource_analysis": [],
  "partitioning_analysis": [],
  "join_analysis": [],
  "recommendations": [
    {
      "category": "Configuration",
      "issue": "The driver is running in client mode, and may benefit from increased memory to handle tasks, especially with high executor run times observed in stages 0 and 9.",
      "suggestion": "Set spark.driver.memory to 2g",
      "evidence": "Current: default",
      "impact_level": "High"
    },
    {
      "category": "Configuration",
      "issue": "Increase executor memory to reduce GC pressure observed in stages 0, 9, 3, 4, and 1, which have high executor run and cpu times.",
      "suggestion": "Set spark.executor.memory to 2g",
      "evidence": "Current: default",
      "impact_level": "High"
    },
    {
      "category": "Configuration",
      "issue": "Increase executor cores for better parallelism and resource utilization, potentially improving execution time for stages 0, 9, 3, 4, and 1.",
      "suggestion": "Set spark.executor.cores to 2",
      "evidence": "Current: default",
      "impact_level": "High"
    },
    {
      "category": "Configuration",
      "issue": "Increase default parallelism to match the size of the data being processed, to improve the CPU utilization in stages 0, 9, 3, 4, and 1.",
      "suggestion": "Set spark.default.parallelism to 200",
      "evidence": "Current: default",
      "impact_level": "High"
    },
    {
      "category": "Configuration",
      "issue": "Switch to KryoSerializer for potentially faster serialization and reduced memory footprint, impacting all stages. Kryo is generally more efficient than Java serialization.",
      "suggestion": "Set spark.serializer to org.apache.spark.serializer.KryoSerializer",
      "evidence": "Current: org.apache.spark.serializer.JavaSerializer",
      "impact_level": "High"
    },
    {
      "category": "Code",
      "issue": "Action within a loop (count)",
      "suggestion": "While the `df` is cached, the `filtered` DataFrame is recomputed in each iteration of the loop due to the filter operation. You can avoid this by collecting the counts of filtered DataFrames using a map transformation and a single aggregation. For instance, perform all filter operations and count transformations *before* collecting the results into `all_counts`.",
      "evidence": "Line: 22",
      "impact_level": "Medium"
    },
    {
      "category": "Code",
      "issue": "Iterating over cached results.",
      "suggestion": "Consider using `map` transformations to collect counts from each filtered DF concurrently. This might improve execution time instead of iterating through cached values one at a time. Convert `filtered_dfs` to an RDD to operate on it using map partitions.",
      "evidence": "Line: 21",
      "impact_level": "Medium"
    },
    {
      "category": "Code",
      "issue": "Using `count()` to materialize the cache. It's not wrong, but there may be more optimal operations.",
      "suggestion": "Consider using a `noop` action or a lightweight transformation with an action, like `df.foreach(lambda x: None)` which may be faster than a full `count()` depending on the dataset size.",
      "evidence": "Line: 15",
      "impact_level": "Medium"
    }
  ]
}
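The five "Configuration" recommendations above are standard Spark properties and can be applied without touching application code, for example as `--conf` flags on `spark-submit`. A minimal sketch in plain Python (the values are copied verbatim from the recommendations; tune them for your cluster rather than treating them as universal defaults):

```python
# Recommended settings taken from the analysis output above.
RECOMMENDED_CONF = {
    "spark.driver.memory": "2g",
    "spark.executor.memory": "2g",
    "spark.executor.cores": "2",
    "spark.default.parallelism": "200",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}

def build_spark_submit_args(conf):
    """Turn a config dict into `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args

# Example: prepend these args to your usual spark-submit invocation, e.g.
#   spark-submit --conf spark.driver.memory=2g ... my_job.py
print(" ".join(build_spark_submit_args(RECOMMENDED_CONF)))
```

The same settings can equally be set on `SparkSession.builder.config(key, value)` or in `spark-defaults.conf`; `--conf` is just the least invasive option for a one-off run.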
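The "Code" recommendations about the count-in-a-loop (lines 21–22 of the analyzed script, which is not shown on this page) amount to replacing N separate filter-then-`count()` actions with a single grouped aggregation, e.g. `df.groupBy("category").count().collect()` in PySpark. Since the original DataFrame code is unavailable, here is a plain-Python sketch of the same idea; the `category` field and the sample rows are hypothetical stand-ins for whatever column the real job filters on:

```python
from collections import Counter

def counts_per_filter(rows, categories):
    """Anti-pattern mirroring the flagged loop: one full pass over the data
    for every category, like calling df.filter(...).count() N times."""
    return {c: sum(1 for r in rows if r["category"] == c) for c in categories}

def counts_single_pass(rows):
    """Recommended shape: one aggregation over the data, analogous to
    df.groupBy("category").count() followed by a single collect()."""
    return Counter(r["category"] for r in rows)

# Hypothetical sample data standing in for the job's cached DataFrame.
rows = [{"category": "a"}, {"category": "b"}, {"category": "a"}]
print(dict(counts_single_pass(rows)))
```

Both functions return the same counts, but the single-pass version scans the data once instead of once per category; on a real Spark job that removes N-1 full stage executions.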