
sql_Analyze_Cluster_Stats

Analyze pre-computed SQL query cluster statistics to identify optimization opportunities, prioritize high-impact clusters, and detect performance issues like CPU/I/O skew without re-running clustering.

Instructions

ANALYZE SQL QUERY CLUSTER PERFORMANCE STATISTICS

This tool analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline. It is well suited to iterative analysis and to deciding which query clusters to focus optimization effort on.

ANALYSIS CAPABILITIES:

  • Performance Ranking: Sort clusters by any performance metric to identify top resource consumers

  • Resource Impact Assessment: Compare clusters by CPU usage, I/O volume, and execution complexity

  • Skew Problem Detection: Identify clusters with CPU or I/O distribution issues

  • Workload Characterization: Understand query patterns by user, application, and workload type

  • Optimization Prioritization: Focus on clusters with highest impact potential

AVAILABLE SORTING METRICS (a metric-validation sketch follows this list):

  • avg_cpu: Average CPU seconds per cluster (primary optimization target)

  • avg_io: Average logical I/O operations (scan intensity indicator)

  • avg_cpuskw: Average CPU skew (distribution problem indicator)

  • avg_ioskw: Average I/O skew (hot spot indicator)

  • avg_pji: Average Physical-to-Logical I/O ratio (compute intensity)

  • avg_uii: Average Unit I/O Intensity (I/O efficiency)

  • avg_numsteps: Average query plan complexity

  • queries: Number of queries in cluster (frequency indicator)

  • cluster_silhouette_score: Clustering quality measure
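
Because the chosen metric is interpolated into an ORDER BY clause, the handler checks it against a fixed whitelist and silently falls back to avg_cpu. A minimal sketch of that guard; the names VALID_METRICS and safe_sort_metric are illustrative:

    # Whitelist of sortable columns in query_cluster_stats, matching the list above.
    VALID_METRICS = {
        "avg_cpu", "avg_io", "avg_cpuskw", "avg_ioskw", "avg_pji",
        "avg_uii", "avg_numsteps", "queries", "cluster_silhouette_score",
    }

    def safe_sort_metric(requested: str) -> str:
        """Fall back to avg_cpu for unknown metrics, mirroring the handler's behaviour."""
        return requested if requested in VALID_METRICS else "avg_cpu"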

PERFORMANCE CATEGORIZATION: The tool automatically categorizes clusters using configurable thresholds from sql_opt_config.yml (a sketch of the rule follows this list):

  • HIGH_CPU_USAGE: Average CPU > config.performance_thresholds.cpu.high

  • HIGH_IO_USAGE: Average I/O > config.performance_thresholds.io.high

  • HIGH_CPU_SKEW: CPU skew > config.performance_thresholds.skew.high

  • HIGH_IO_SKEW: I/O skew > config.performance_thresholds.skew.high

  • NORMAL: Clusters within configured normal performance ranges
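
A sketch of the categorization rule, assuming the thresholds are loaded from sql_opt_config.yml into a dict keyed as shown above. The numeric fallbacks (100 CPU seconds, 1,000,000 I/Os, skew of 3.0) are the defaults the handler uses when a threshold is missing, and the precedence (skew checks before usage checks) mirrors the CASE expression in the Implementation Reference below:

    def categorize_cluster(row: dict, thresholds: dict) -> str:
        """Categorize one cluster-stats row; skew problems take precedence over raw usage."""
        cpu_high = thresholds.get("cpu", {}).get("high", 100)
        io_high = thresholds.get("io", {}).get("high", 1000000)
        skew_high = thresholds.get("skew", {}).get("high", 3.0)
        if row["avg_cpuskw"] > skew_high:
            return "HIGH_CPU_SKEW"
        if row["avg_ioskw"] > skew_high:
            return "HIGH_IO_SKEW"
        if row["avg_cpu"] > cpu_high:
            return "HIGH_CPU_USAGE"
        if row["avg_io"] > io_high:
            return "HIGH_IO_USAGE"
        return "NORMAL"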

TYPICAL ANALYSIS WORKFLOW (a call-sequence sketch follows the steps):

  1. Sort by 'avg_cpu' or 'avg_io' to find highest resource consumers

  2. Sort by 'avg_cpuskw' or 'avg_ioskw' to find distribution problems

  3. Use limit_results to focus on top problematic clusters
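
The workflow can be driven by repeated calls to the tool's handler. A sketch, assuming conn is an open connection to the system holding query_cluster_stats and that the handler is importable from the server module (module path hypothetical):

    from sql_clustering_server import handle_sql_Analyze_Cluster_Stats  # hypothetical module path

    top_cpu = handle_sql_Analyze_Cluster_Stats(conn, sort_by_metric="avg_cpu", limit_results=10)      # step 1
    top_skew = handle_sql_Analyze_Cluster_Stats(conn, sort_by_metric="avg_cpuskw", limit_results=10)  # step 2
    # Step 3: limit_results=10 keeps both passes focused on the most problematic clusters.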

OPTIMIZATION DECISION FRAMEWORK (see the sketch after this list):

  • High CPU + High Query Count: Maximum impact optimization candidates

  • High Skew + Moderate CPU: Distribution/statistics problems

  • High I/O + Low PJI: Potential indexing opportunities

  • High NumSteps: Complex query rewriting candidates
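
A sketch of how the framework could be applied to the rows the tool returns; the recommend() helper and the numeric cut-offs for query count, PJI, and plan steps are illustrative, not part of the tool:

    def recommend(row: dict, cpu_high: float, skew_high: float, io_high: float) -> str:
        """Map one cluster-stats row to a coarse optimization action (illustrative heuristics)."""
        if row["avg_cpu"] > cpu_high and row["queries"] > 1000:
            return "maximum-impact candidate: tune the shared query pattern"
        if max(row["avg_cpuskw"], row["avg_ioskw"]) > skew_high and row["avg_cpu"] <= cpu_high:
            return "distribution/statistics problem: review data distribution and statistics"
        if row["avg_io"] > io_high and row["avg_pji"] < 1.0:
            return "indexing opportunity: large scans with low compute intensity"
        if row["avg_numsteps"] > 20:
            return "complex plan: candidate for query rewrite"
        return "no immediate action"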

OUTPUT FORMAT: Returns detailed cluster statistics with performance rankings, categories, and metadata for LLM analysis and optimization recommendations.
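
The outer envelope is produced by the server's create_response helper (not shown on this page). The metadata attached to each response, taken from the handler in the Implementation Reference below, has roughly this shape (all values illustrative):

    metadata = {
        "tool_name": "sql_Analyze_Cluster_Stats",
        "analysis_parameters": {
            "sort_by_metric": "avg_cpu",                 # metric actually used after validation
            "limit_results": 10,
            "valid_metrics": ["avg_cpu", "avg_io", "..."],
        },
        "summary_statistics": {"total_clusters": 42, "system_avg_cpu": 17.3},  # plus I/O, query-count, and silhouette figures
        "clusters_analyzed": 10,
        "table_source": "feature_db.query_cluster_stats",  # resolved from sql_opt_config.yml
        "description": "Cluster statistics analysis sorted by avg_cpu - ready for LLM optimization recommendations",
    }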

Input Schema

Name            Required  Description                               Default
sort_by_metric  No        Metric used to rank clusters              avg_cpu
limit_results   No        Maximum number of clusters to return      (none)
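
Both parameters are optional. A sketch of the arguments an MCP client would pass when calling the tool; the surrounding request envelope depends on the client and is omitted here:

    arguments = {
        "sort_by_metric": "avg_io",  # any metric from the list above; defaults to "avg_cpu"
        "limit_results": 5,          # omit to return statistics for every cluster
    }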

Implementation Reference

  • The primary handler function for the 'sql_Analyze_Cluster_Stats' tool. Using the supplied database connection, it queries the query_cluster_stats table, ranks and categorizes clusters according to the configurable thresholds, computes system-wide summary statistics, and returns a structured JSON response with metadata.
    def handle_sql_Analyze_Cluster_Stats(
        conn,
        sort_by_metric: str = "avg_cpu",
        limit_results: int = None,
        *args,
        **kwargs,
    ):
        """Analyze pre-computed SQL query cluster performance statistics.

        See the Instructions section above for the available sorting metrics,
        the performance categorization rules, and the optimization decision framework.
        """
        # SQL_CLUSTERING_CONFIG, logger, rows_to_json, and create_response are
        # module-level helpers defined elsewhere in the server.
        config = SQL_CLUSTERING_CONFIG
        logger.debug(f"handle_sql_Analyze_Cluster_Stats: sort_by={sort_by_metric}, limit={limit_results}")
        feature_db = config['databases']['feature_db']
        stats_table = config['tables']['query_cluster_stats']

        # Validate the sort metric against a whitelist; fall back to avg_cpu.
        valid_metrics = [
            'avg_cpu', 'avg_io', 'avg_cpuskw', 'avg_ioskw', 'avg_pji',
            'avg_uii', 'avg_numsteps', 'queries', 'cluster_silhouette_score'
        ]
        if sort_by_metric not in valid_metrics:
            sort_by_metric = 'avg_cpu'  # Default fallback

        with conn.cursor() as cur:
            # Build the query with an optional TOP clause.
            limit_clause = f"TOP {limit_results}" if limit_results else ""

            # Categorization thresholds from sql_opt_config.yml, with fallbacks.
            thresholds = config.get('performance_thresholds', {})
            cpu_high = thresholds.get('cpu', {}).get('high', 100)
            skew_high = thresholds.get('skew', {}).get('high', 3.0)
            io_high = thresholds.get('io', {}).get('high', 1000000)

            stats_query = f"""
                SELECT {limit_clause}
                    td_clusterid_kmeans,
                    avg_numsteps, var_numsteps,
                    avg_cpu, var_cpu,
                    avg_io, var_io,
                    avg_cpuskw, var_cpuskw,
                    avg_ioskw, var_ioskw,
                    avg_pji, var_pji,
                    avg_uii, var_uii,
                    top_username, top_wdname, top_appid,
                    overall_silhouette_score, cluster_silhouette_score,
                    queries,
                    -- Additional analysis columns with configurable thresholds
                    CASE
                        WHEN avg_cpuskw > {skew_high} THEN 'HIGH_CPU_SKEW'
                        WHEN avg_ioskw > {skew_high} THEN 'HIGH_IO_SKEW'
                        WHEN avg_cpu > {cpu_high} THEN 'HIGH_CPU_USAGE'
                        WHEN avg_io > {io_high} THEN 'HIGH_IO_USAGE'
                        ELSE 'NORMAL'
                    END AS performance_category,
                    RANK() OVER (ORDER BY {sort_by_metric} DESC) AS performance_rank
                FROM {feature_db}.{stats_table}
                ORDER BY {sort_by_metric} DESC
            """
            cur.execute(stats_query)
            data = rows_to_json(cur.description, cur.fetchall())

            # System-wide summary statistics across all clusters.
            cur.execute(f"""
                SELECT
                    COUNT(*) AS total_clusters,
                    AVG(avg_cpu) AS system_avg_cpu,
                    AVG(avg_io) AS system_avg_io,
                    AVG(queries) AS avg_queries_per_cluster,
                    MAX(avg_cpu) AS max_cluster_cpu,
                    MIN(cluster_silhouette_score) AS min_silhouette_score
                FROM {feature_db}.{stats_table}
            """)
            summary_stats = rows_to_json(cur.description, cur.fetchall())[0]

        logger.debug(f"Retrieved {len(data)} cluster statistics")

        # Return results with metadata describing how the analysis was run.
        metadata = {
            "tool_name": "sql_Analyze_Cluster_Stats",
            "analysis_parameters": {
                "sort_by_metric": sort_by_metric,
                "limit_results": limit_results,
                "valid_metrics": valid_metrics
            },
            "summary_statistics": summary_stats,
            "clusters_analyzed": len(data),
            "table_source": f"{feature_db}.{stats_table}",
            "description": f"Cluster statistics analysis sorted by {sort_by_metric} - ready for LLM optimization recommendations"
        }
        return create_response(data, metadata)
