
Teradata MCP Server

Official
by Teradata

sql_Analyze_Cluster_Stats

Analyze pre-computed SQL query cluster statistics to identify performance optimization opportunities by ranking clusters, assessing resource impact, detecting skew problems, and prioritizing optimization efforts.

Instructions

ANALYZE SQL QUERY CLUSTER PERFORMANCE STATISTICS

This tool analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline. It is well suited to iterative analysis and to deciding which query clusters deserve optimization effort.

ANALYSIS CAPABILITIES:

  • Performance Ranking: Sort clusters by any performance metric to identify top resource consumers

  • Resource Impact Assessment: Compare clusters by CPU usage, I/O volume, and execution complexity

  • Skew Problem Detection: Identify clusters with CPU or I/O distribution issues

  • Workload Characterization: Understand query patterns by user, application, and workload type

  • Optimization Prioritization: Focus on clusters with highest impact potential

AVAILABLE SORTING METRICS:

  • avg_cpu: Average CPU seconds per cluster (primary optimization target)

  • avg_io: Average logical I/O operations (scan intensity indicator)

  • avg_cpuskw: Average CPU skew (distribution problem indicator)

  • avg_ioskw: Average I/O skew (hot spot indicator)

  • avg_pji: Average Physical-to-Logical I/O ratio (compute intensity)

  • avg_uii: Average Unit I/O Intensity (I/O efficiency)

  • avg_numsteps: Average query plan complexity

  • queries: Number of queries in cluster (frequency indicator)

  • cluster_silhouette_score: Clustering quality measure

PERFORMANCE CATEGORIZATION: Automatically categorizes clusters using configurable thresholds (from sql_opt_config.yml):

  • HIGH_CPU_USAGE: Average CPU > config.performance_thresholds.cpu.high

  • HIGH_IO_USAGE: Average I/O > config.performance_thresholds.io.high

  • HIGH_CPU_SKEW: CPU skew > config.performance_thresholds.skew.high

  • HIGH_IO_SKEW: I/O skew > config.performance_thresholds.skew.high

  • NORMAL: Clusters within configured normal performance ranges
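The implementation below applies these checks in a SQL CASE expression, testing skew before raw usage. A standalone Python sketch of that ordering, using the implementation's fallback thresholds (CPU 100, I/O 1,000,000, skew 3.0) rather than values actually loaded from sql_opt_config.yml:

```python
# Sketch of the tool's categorization CASE expression; the threshold
# defaults here are the implementation's fallbacks, not config values.
def categorize(avg_cpu, avg_io, avg_cpuskw, avg_ioskw,
               cpu_high=100, io_high=1_000_000, skew_high=3.0):
    # Skew checks come first, mirroring the SQL CASE ordering.
    if avg_cpuskw > skew_high:
        return "HIGH_CPU_SKEW"
    if avg_ioskw > skew_high:
        return "HIGH_IO_SKEW"
    if avg_cpu > cpu_high:
        return "HIGH_CPU_USAGE"
    if avg_io > io_high:
        return "HIGH_IO_USAGE"
    return "NORMAL"
```

Because the CASE stops at the first match, a cluster that is both CPU-skewed and CPU-heavy is reported as HIGH_CPU_SKEW.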

TYPICAL ANALYSIS WORKFLOW:

  1. Sort by 'avg_cpu' or 'avg_io' to find highest resource consumers

  2. Sort by 'avg_cpuskw' or 'avg_ioskw' to find distribution problems

  3. Use limit_results to focus on top problematic clusters
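The effect of the two parameters can be illustrated in plain Python; the tool itself performs the equivalent ordering and limiting in SQL (ORDER BY ... DESC plus TOP). The cluster rows below are invented sample data:

```python
# Invented sample rows standing in for query_cluster_stats output.
clusters = [
    {"td_clusterid_kmeans": 1, "avg_cpu": 40.0, "avg_cpuskw": 1.2},
    {"td_clusterid_kmeans": 2, "avg_cpu": 310.5, "avg_cpuskw": 0.8},
    {"td_clusterid_kmeans": 3, "avg_cpu": 95.1, "avg_cpuskw": 4.6},
]

def analyze(rows, sort_by_metric="avg_cpu", limit_results=None):
    # Descending sort by the chosen metric, optionally truncated,
    # mirroring ORDER BY {metric} DESC and TOP {limit_results}.
    ranked = sorted(rows, key=lambda r: r[sort_by_metric], reverse=True)
    return ranked[:limit_results] if limit_results else ranked

top_cpu = analyze(clusters, "avg_cpu", 2)    # clusters 2 and 3
skewed = analyze(clusters, "avg_cpuskw", 1)  # cluster 3
```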

OPTIMIZATION DECISION FRAMEWORK:

  • High CPU + High Query Count: Maximum impact optimization candidates

  • High Skew + Moderate CPU: Distribution/statistics problems

  • High I/O + Low PJI: Potential indexing opportunities

  • High NumSteps: Complex query rewriting candidates

OUTPUT FORMAT: Returns detailed cluster statistics with performance rankings, categories, and metadata for LLM analysis and optimization recommendations.
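No output schema is published for this tool, so the following is only a plausible response shape inferred from the implementation reference below; every value is invented, and the actual envelope is produced by a create_response helper that is not shown:

```python
# Hypothetical response shape; field names follow the implementation's
# SELECT list and metadata dict, but all values here are invented.
example_response = {
    "data": [
        {
            "td_clusterid_kmeans": 7,
            "avg_cpu": 312.4,
            "avg_io": 58_000_000,
            "queries": 1842,
            "performance_category": "HIGH_CPU_USAGE",
            "performance_rank": 1,
            # ... remaining avg_/var_ statistics, top_username, etc.
        }
    ],
    "metadata": {
        "tool_name": "sql_Analyze_Cluster_Stats",
        "analysis_parameters": {
            "sort_by_metric": "avg_cpu",
            "limit_results": 10,
        },
        "clusters_analyzed": 1,
    },
}
```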

Input Schema

Name            Required  Description  Default
sort_by_metric  No                     avg_cpu
limit_results   No

Implementation Reference

  • The main execution function for the sql_Analyze_Cluster_Stats tool. Connects to the database, queries cluster performance statistics from the pre-built query_cluster_stats table, sorts and limits results based on parameters, adds performance categorization and ranking, and formats results as JSON with comprehensive metadata.
    def handle_sql_Analyze_Cluster_Stats(
        conn,
        sort_by_metric: str = "avg_cpu",
        limit_results: int | None = None,
        *args,
        **kwargs
    ):
        """
            **ANALYZE SQL QUERY CLUSTER PERFORMANCE STATISTICS**
    
            This tool analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline. It is well suited to iterative analysis and to deciding which query clusters deserve optimization effort.
    
            **ANALYSIS CAPABILITIES:**
            - **Performance Ranking**: Sort clusters by any performance metric to identify top resource consumers
            - **Resource Impact Assessment**: Compare clusters by CPU usage, I/O volume, and execution complexity
            - **Skew Problem Detection**: Identify clusters with CPU or I/O distribution issues
            - **Workload Characterization**: Understand query patterns by user, application, and workload type
            - **Optimization Prioritization**: Focus on clusters with highest impact potential
    
            **AVAILABLE SORTING METRICS:**
            - **avg_cpu**: Average CPU seconds per cluster (primary optimization target)
            - **avg_io**: Average logical I/O operations (scan intensity indicator)
            - **avg_cpuskw**: Average CPU skew (distribution problem indicator)
            - **avg_ioskw**: Average I/O skew (hot spot indicator)
            - **avg_pji**: Average Physical-to-Logical I/O ratio (compute intensity)
            - **avg_uii**: Average Unit I/O Intensity (I/O efficiency)
            - **avg_numsteps**: Average query plan complexity
            - **queries**: Number of queries in cluster (frequency indicator)
            - **cluster_silhouette_score**: Clustering quality measure
    
            **PERFORMANCE CATEGORIZATION:**
            Automatically categorizes clusters using configurable thresholds (from sql_opt_config.yml):
            - **HIGH_CPU_USAGE**: Average CPU > config.performance_thresholds.cpu.high
            - **HIGH_IO_USAGE**: Average I/O > config.performance_thresholds.io.high
            - **HIGH_CPU_SKEW**: CPU skew > config.performance_thresholds.skew.high
            - **HIGH_IO_SKEW**: I/O skew > config.performance_thresholds.skew.high
            - **NORMAL**: Clusters within configured normal performance ranges
    
            **TYPICAL ANALYSIS WORKFLOW:**
            1. Sort by 'avg_cpu' or 'avg_io' to find highest resource consumers
            2. Sort by 'avg_cpuskw' or 'avg_ioskw' to find distribution problems
            3. Use limit_results to focus on top problematic clusters
    
            **OPTIMIZATION DECISION FRAMEWORK:**
            - **High CPU + High Query Count**: Maximum impact optimization candidates
            - **High Skew + Moderate CPU**: Distribution/statistics problems
            - **High I/O + Low PJI**: Potential indexing opportunities
            - **High NumSteps**: Complex query rewriting candidates
    
            **OUTPUT FORMAT:**
            Returns detailed cluster statistics with performance rankings, categories, and metadata for LLM analysis and optimization recommendations.
            """
        
        config = SQL_CLUSTERING_CONFIG
        
        logger.debug(f"handle_sql_Analyze_Cluster_Stats: sort_by={sort_by_metric}, limit={limit_results}")
        
        feature_db = config['databases']['feature_db']
        stats_table = config['tables']['query_cluster_stats']
        
        # Validate sort metric
        valid_metrics = [
            'avg_cpu', 'avg_io', 'avg_cpuskw', 'avg_ioskw', 'avg_pji', 'avg_uii',
            'avg_numsteps', 'queries', 'cluster_silhouette_score'
        ]
        
        if sort_by_metric not in valid_metrics:
            sort_by_metric = 'avg_cpu'  # Default fallback
    
        with conn.cursor() as cur:
            
            # Build the query with optional limit
            limit_clause = f"TOP {limit_results}" if limit_results else ""
            
            # Get thresholds from config
            thresholds = config.get('performance_thresholds', {})
            cpu_high = thresholds.get('cpu', {}).get('high', 100)
            skew_high = thresholds.get('skew', {}).get('high', 3.0)
            io_high = thresholds.get('io', {}).get('high', 1000000)
    
            stats_query = f"""
            SELECT {limit_clause}
                td_clusterid_kmeans,
                avg_numsteps, 
                var_numsteps,
                avg_cpu, 
                var_cpu,
                avg_io, 
                var_io,
                avg_cpuskw, 
                var_cpuskw,
                avg_ioskw, 
                var_ioskw,
                avg_pji, 
                var_pji,
                avg_uii, 
                var_uii,
                top_username,
                top_wdname,
                top_appid,
                overall_silhouette_score,
                cluster_silhouette_score,
                queries,
                -- Additional analysis columns with configurable thresholds
                CASE 
                    WHEN avg_cpuskw > {skew_high} THEN 'HIGH_CPU_SKEW'
                    WHEN avg_ioskw > {skew_high} THEN 'HIGH_IO_SKEW'
                    WHEN avg_cpu > {cpu_high} THEN 'HIGH_CPU_USAGE'
                    WHEN avg_io > {io_high} THEN 'HIGH_IO_USAGE'
                    ELSE 'NORMAL'
                END AS performance_category,
                RANK() OVER (ORDER BY {sort_by_metric} DESC) AS performance_rank
            FROM {feature_db}.{stats_table}
            ORDER BY {sort_by_metric} DESC
            """
            
            cur.execute(stats_query)
            data = rows_to_json(cur.description, cur.fetchall())
            
            # Get summary statistics
            cur.execute(f"""
            SELECT 
                COUNT(*) AS total_clusters,
                AVG(avg_cpu) AS system_avg_cpu,
                AVG(avg_io) AS system_avg_io,
                AVG(queries) AS avg_queries_per_cluster,
                MAX(avg_cpu) AS max_cluster_cpu,
                MIN(cluster_silhouette_score) AS min_silhouette_score
            FROM {feature_db}.{stats_table}
            """)
            
            summary_stats = rows_to_json(cur.description, cur.fetchall())[0]
            
            logger.debug(f"Retrieved {len(data)} cluster statistics")
    
        # Return results with metadata
        metadata = {
            "tool_name": "sql_Analyze_Cluster_Stats",
            "analysis_parameters": {
                "sort_by_metric": sort_by_metric,
                "limit_results": limit_results,
                "valid_metrics": valid_metrics
            },
            "summary_statistics": summary_stats,
            "clusters_analyzed": len(data),
            "table_source": f"{feature_db}.{stats_table}",
            "description": f"Cluster statistics analysis sorted by {sort_by_metric} - ready for LLM optimization recommendations"
        }
    
        return create_response(data, metadata)
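The function above depends on module-level helpers (SQL_CLUSTERING_CONFIG, logger, rows_to_json, create_response) that are not shown on this page. As orientation only, a minimal rows_to_json consistent with how it is called (a DB-API cursor.description plus fetched rows, yielding a list of dicts) might look like:

```python
def rows_to_json(description, rows):
    """Convert DB-API query results to a list of dicts keyed by column.

    `description` is a DB-API cursor.description sequence; the first
    element of each entry is the column name.
    """
    columns = [col[0] for col in description]
    return [dict(zip(columns, row)) for row in rows]
```

This is a sketch of the assumed contract, not the actual helper shipped with the server.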
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does well by detailing analysis capabilities, sorting metrics, performance categorization, and output format. It explains what the tool does (analyzes statistics) and what it returns (detailed cluster statistics with rankings). However, it lacks information on potential limitations, error conditions, or performance characteristics like response time.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Analysis Capabilities, Available Sorting Metrics, etc.), but it is overly verbose for a tool with only 2 parameters. Sections like 'Optimization Decision Framework' and detailed metric explanations could be condensed. While informative, not every sentence earns its place relative to the tool's simplicity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (performance analysis with multiple metrics) and no output schema, the description does a good job of explaining what the output contains ('detailed cluster statistics with performance rankings, categories, and metadata'). However, without annotations or an output schema, it could benefit from more specifics on the exact structure of the returned data or example outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must fully compensate. It excels by providing extensive context for the two parameters: it lists all available sorting metrics (e.g., 'avg_cpu', 'avg_io') that map to 'sort_by_metric', and explains how to use 'limit_results' in the workflow ('Use limit_results to focus on top problematic clusters'). This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline.' It specifies the verb 'analyzes' and resource 'pre-computed cluster statistics,' and distinguishes from sibling tools like sql_Execute_Full_Pipeline (which runs the pipeline) and sql_Retrieve_Cluster_Queries (which retrieves queries).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'Perfect for iterative analysis and decision-making on which query clusters to focus optimization efforts.' It includes a 'TYPICAL ANALYSIS WORKFLOW' section with step-by-step guidance. However, it does not explicitly state when NOT to use it or compare it to all alternatives among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
