Skip to main content
Glama
blitzstermayank

Teradata MCP Server

sql_Analyze_Cluster_Stats

Analyze pre-computed SQL query cluster statistics to identify optimization opportunities, prioritize high-impact clusters, and detect performance issues like CPU/I/O skew without re-running clustering.

Instructions

ANALYZE SQL QUERY CLUSTER PERFORMANCE STATISTICS

This tool analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline. Perfect for iterative analysis and decision-making on which query clusters to focus optimization efforts.

ANALYSIS CAPABILITIES:

  • Performance Ranking: Sort clusters by any performance metric to identify top resource consumers

  • Resource Impact Assessment: Compare clusters by CPU usage, I/O volume, and execution complexity

  • Skew Problem Detection: Identify clusters with CPU or I/O distribution issues

  • Workload Characterization: Understand query patterns by user, application, and workload type

  • Optimization Prioritization: Focus on clusters with highest impact potential

AVAILABLE SORTING METRICS:

  • avg_cpu: Average CPU seconds per cluster (primary optimization target)

  • avg_io: Average logical I/O operations (scan intensity indicator)

  • avg_cpuskw: Average CPU skew (distribution problem indicator)

  • avg_ioskw: Average I/O skew (hot spot indicator)

  • avg_pji: Average Physical-to-Logical I/O ratio (compute intensity)

  • avg_uii: Average Unit I/O Intensity (I/O efficiency)

  • avg_numsteps: Average query plan complexity

  • queries: Number of queries in cluster (frequency indicator)

  • cluster_silhouette_score: Clustering quality measure

PERFORMANCE CATEGORIZATION: Automatically categorizes clusters using configurable thresholds (from sql_opt_config.yml):

  • HIGH_CPU_USAGE: Average CPU > config.performance_thresholds.cpu.high

  • HIGH_IO_USAGE: Average I/O > config.performance_thresholds.io.high

  • HIGH_CPU_SKEW: CPU skew > config.performance_thresholds.skew.high

  • HIGH_IO_SKEW: I/O skew > config.performance_thresholds.skew.high

  • NORMAL: Clusters within configured normal performance ranges

TYPICAL ANALYSIS WORKFLOW:

  1. Sort by 'avg_cpu' or 'avg_io' to find highest resource consumers

  2. Sort by 'avg_cpuskw' or 'avg_ioskw' to find distribution problems

  3. Use limit_results to focus on top problematic clusters

OPTIMIZATION DECISION FRAMEWORK:

  • High CPU + High Query Count: Maximum impact optimization candidates

  • High Skew + Moderate CPU: Distribution/statistics problems

  • High I/O + Low PJI: Potential indexing opportunities

  • High NumSteps: Complex query rewriting candidates

OUTPUT FORMAT: Returns detailed cluster statistics with performance rankings, categories, and metadata for LLM analysis and optimization recommendations.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
sort_by_metricNoavg_cpu
limit_resultsNo

Implementation Reference

  • The primary handler function for the 'sql_Analyze_Cluster_Stats' tool. It connects to the database, executes a parameterized SQL query to retrieve and analyze cluster performance statistics from the query_cluster_stats table, applies sorting and performance categorization based on configurable thresholds, computes summary statistics, and returns a structured JSON response with metadata.
    def handle_sql_Analyze_Cluster_Stats(
        conn,
        sort_by_metric: str = "avg_cpu",
        limit_results: int = None,
        *args,
        **kwargs
    ):
        """
            **ANALYZE SQL QUERY CLUSTER PERFORMANCE STATISTICS**
    
            This tool analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline. Perfect for iterative analysis and decision-making on which query clusters to focus optimization efforts.
    
            **ANALYSIS CAPABILITIES:**
            - **Performance Ranking**: Sort clusters by any performance metric to identify top resource consumers
            - **Resource Impact Assessment**: Compare clusters by CPU usage, I/O volume, and execution complexity
            - **Skew Problem Detection**: Identify clusters with CPU or I/O distribution issues
            - **Workload Characterization**: Understand query patterns by user, application, and workload type
            - **Optimization Prioritization**: Focus on clusters with highest impact potential
    
            **AVAILABLE SORTING METRICS:**
            - **avg_cpu**: Average CPU seconds per cluster (primary optimization target)
            - **avg_io**: Average logical I/O operations (scan intensity indicator)
            - **avg_cpuskw**: Average CPU skew (distribution problem indicator)
            - **avg_ioskw**: Average I/O skew (hot spot indicator)
            - **avg_pji**: Average Physical-to-Logical I/O ratio (compute intensity)
            - **avg_uii**: Average Unit I/O Intensity (I/O efficiency)
            - **avg_numsteps**: Average query plan complexity
            - **queries**: Number of queries in cluster (frequency indicator)
            - **cluster_silhouette_score**: Clustering quality measure
    
            **PERFORMANCE CATEGORIZATION:**
            Automatically categorizes clusters using configurable thresholds (from sql_opt_config.yml):
            - **HIGH_CPU_USAGE**: Average CPU > config.performance_thresholds.cpu.high
            - **HIGH_IO_USAGE**: Average I/O > config.performance_thresholds.io.high
            - **HIGH_CPU_SKEW**: CPU skew > config.performance_thresholds.skew.high
            - **HIGH_IO_SKEW**: I/O skew > config.performance_thresholds.skew.high
            - **NORMAL**: Clusters within configured normal performance ranges
    
            **TYPICAL ANALYSIS WORKFLOW:**
            1. Sort by 'avg_cpu' or 'avg_io' to find highest resource consumers
            2. Sort by 'avg_cpuskw' or 'avg_ioskw' to find distribution problems
            4. Use limit_results to focus on top problematic clusters
    
            **OPTIMIZATION DECISION FRAMEWORK:**
            - **High CPU + High Query Count**: Maximum impact optimization candidates
            - **High Skew + Moderate CPU**: Distribution/statistics problems
            - **High I/O + Low PJI**: Potential indexing opportunities
            - **High NumSteps**: Complex query rewriting candidates
    
            **OUTPUT FORMAT:**
            Returns detailed cluster statistics with performance rankings, categories, and metadata for LLM analysis and optimization recommendations.
            """
        
        config = SQL_CLUSTERING_CONFIG
        
        logger.debug(f"handle_sql_Analyze_Cluster_Stats: sort_by={sort_by_metric}, limit={limit_results}")
        
        feature_db = config['databases']['feature_db']
        stats_table = config['tables']['query_cluster_stats']
        
        # Validate sort metric
        valid_metrics = [
            'avg_cpu', 'avg_io', 'avg_cpuskw', 'avg_ioskw', 'avg_pji', 'avg_uii',
            'avg_numsteps', 'queries', 'cluster_silhouette_score'
        ]
        
        if sort_by_metric not in valid_metrics:
            sort_by_metric = 'avg_cpu'  # Default fallback
    
        with conn.cursor() as cur:
            
            # Build the query with optional limit
            limit_clause = f"TOP {limit_results}" if limit_results else ""
            
            # Get thresholds from config
            thresholds = config.get('performance_thresholds', {})
            cpu_high = thresholds.get('cpu', {}).get('high', 100)
            skew_high = thresholds.get('skew', {}).get('high', 3.0)
            io_high = thresholds.get('io', {}).get('high', 1000000)
    
            stats_query = f"""
            SELECT {limit_clause}
                td_clusterid_kmeans,
                avg_numsteps, 
                var_numsteps,
                avg_cpu, 
                var_cpu,
                avg_io, 
                var_io,
                avg_cpuskw, 
                var_cpuskw,
                avg_ioskw, 
                var_ioskw,
                avg_pji, 
                var_pji,
                avg_uii, 
                var_uii,
                top_username,
                top_wdname,
                top_appid,
                overall_silhouette_score,
                cluster_silhouette_score,
                queries,
                -- Additional analysis columns with configurable thresholds
                CASE 
                    WHEN avg_cpuskw > {skew_high} THEN 'HIGH_CPU_SKEW'
                    WHEN avg_ioskw > {skew_high} THEN 'HIGH_IO_SKEW'
                    WHEN avg_cpu > {cpu_high} THEN 'HIGH_CPU_USAGE'
                    WHEN avg_io > {io_high} THEN 'HIGH_IO_USAGE'
                    ELSE 'NORMAL'
                END AS performance_category,
                RANK() OVER (ORDER BY {sort_by_metric} DESC) AS performance_rank
            FROM {feature_db}.{stats_table}
            ORDER BY {sort_by_metric} DESC
            """
            
            cur.execute(stats_query)
            data = rows_to_json(cur.description, cur.fetchall())
            
            # Get summary statistics
            cur.execute(f"""
            SELECT 
                COUNT(*) AS total_clusters,
                AVG(avg_cpu) AS system_avg_cpu,
                AVG(avg_io) AS system_avg_io,
                AVG(queries) AS avg_queries_per_cluster,
                MAX(avg_cpu) AS max_cluster_cpu,
                MIN(cluster_silhouette_score) AS min_silhouette_score
            FROM {feature_db}.{stats_table}
            """)
            
            summary_stats = rows_to_json(cur.description, cur.fetchall())[0]
            
            logger.debug(f"Retrieved {len(data)} cluster statistics")
    
        # Return results with metadata
        metadata = {
            "tool_name": "sql_Analyze_Cluster_Stats",
            "analysis_parameters": {
                "sort_by_metric": sort_by_metric,
                "limit_results": limit_results,
                "valid_metrics": valid_metrics
            },
            "summary_statistics": summary_stats,
            "clusters_analyzed": len(data),
            "table_source": f"{feature_db}.{stats_table}",
            "description": f"Cluster statistics analysis sorted by {sort_by_metric} - ready for LLM optimization recommendations"
        }
    
        return create_response(data, metadata)

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/blitzstermayank/MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server