Skip to main content
Glama
blitzstermayank

Teradata MCP Server

sql_Analyze_Cluster_Stats

Analyze pre-computed SQL query cluster statistics to identify optimization opportunities, prioritize high-impact clusters, and detect performance issues like CPU/I/O skew without re-running clustering.

Instructions

ANALYZE SQL QUERY CLUSTER PERFORMANCE STATISTICS

This tool analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline. Perfect for iterative analysis and decision-making on which query clusters to focus optimization efforts.

ANALYSIS CAPABILITIES:

  • Performance Ranking: Sort clusters by any performance metric to identify top resource consumers

  • Resource Impact Assessment: Compare clusters by CPU usage, I/O volume, and execution complexity

  • Skew Problem Detection: Identify clusters with CPU or I/O distribution issues

  • Workload Characterization: Understand query patterns by user, application, and workload type

  • Optimization Prioritization: Focus on clusters with highest impact potential

AVAILABLE SORTING METRICS:

  • avg_cpu: Average CPU seconds per cluster (primary optimization target)

  • avg_io: Average logical I/O operations (scan intensity indicator)

  • avg_cpuskw: Average CPU skew (distribution problem indicator)

  • avg_ioskw: Average I/O skew (hot spot indicator)

  • avg_pji: Average Physical-to-Logical I/O ratio (compute intensity)

  • avg_uii: Average Unit I/O Intensity (I/O efficiency)

  • avg_numsteps: Average query plan complexity

  • queries: Number of queries in cluster (frequency indicator)

  • cluster_silhouette_score: Clustering quality measure

PERFORMANCE CATEGORIZATION: Automatically categorizes clusters using configurable thresholds (from sql_opt_config.yml):

  • HIGH_CPU_USAGE: Average CPU > config.performance_thresholds.cpu.high

  • HIGH_IO_USAGE: Average I/O > config.performance_thresholds.io.high

  • HIGH_CPU_SKEW: CPU skew > config.performance_thresholds.skew.high

  • HIGH_IO_SKEW: I/O skew > config.performance_thresholds.skew.high

  • NORMAL: Clusters within configured normal performance ranges

TYPICAL ANALYSIS WORKFLOW:

  1. Sort by 'avg_cpu' or 'avg_io' to find highest resource consumers

  2. Sort by 'avg_cpuskw' or 'avg_ioskw' to find distribution problems

  3. Use limit_results to focus on top problematic clusters

OPTIMIZATION DECISION FRAMEWORK:

  • High CPU + High Query Count: Maximum impact optimization candidates

  • High Skew + Moderate CPU: Distribution/statistics problems

  • High I/O + Low PJI: Potential indexing opportunities

  • High NumSteps: Complex query rewriting candidates

OUTPUT FORMAT: Returns detailed cluster statistics with performance rankings, categories, and metadata for LLM analysis and optimization recommendations.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
sort_by_metricNoavg_cpu
limit_resultsNo

Implementation Reference

  • The primary handler function for the 'sql_Analyze_Cluster_Stats' tool. It connects to the database, executes a parameterized SQL query to retrieve and analyze cluster performance statistics from the query_cluster_stats table, applies sorting and performance categorization based on configurable thresholds, computes summary statistics, and returns a structured JSON response with metadata.
    def handle_sql_Analyze_Cluster_Stats(
        conn,
        sort_by_metric: str = "avg_cpu",
        limit_results: int = None,
        *args,
        **kwargs
    ):
        """
            **ANALYZE SQL QUERY CLUSTER PERFORMANCE STATISTICS**
    
            This tool analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline. Perfect for iterative analysis and decision-making on which query clusters to focus optimization efforts.
    
            **ANALYSIS CAPABILITIES:**
            - **Performance Ranking**: Sort clusters by any performance metric to identify top resource consumers
            - **Resource Impact Assessment**: Compare clusters by CPU usage, I/O volume, and execution complexity
            - **Skew Problem Detection**: Identify clusters with CPU or I/O distribution issues
            - **Workload Characterization**: Understand query patterns by user, application, and workload type
            - **Optimization Prioritization**: Focus on clusters with highest impact potential
    
            **AVAILABLE SORTING METRICS:**
            - **avg_cpu**: Average CPU seconds per cluster (primary optimization target)
            - **avg_io**: Average logical I/O operations (scan intensity indicator)
            - **avg_cpuskw**: Average CPU skew (distribution problem indicator)
            - **avg_ioskw**: Average I/O skew (hot spot indicator)
            - **avg_pji**: Average Physical-to-Logical I/O ratio (compute intensity)
            - **avg_uii**: Average Unit I/O Intensity (I/O efficiency)
            - **avg_numsteps**: Average query plan complexity
            - **queries**: Number of queries in cluster (frequency indicator)
            - **cluster_silhouette_score**: Clustering quality measure
    
            **PERFORMANCE CATEGORIZATION:**
            Automatically categorizes clusters using configurable thresholds (from sql_opt_config.yml):
            - **HIGH_CPU_USAGE**: Average CPU > config.performance_thresholds.cpu.high
            - **HIGH_IO_USAGE**: Average I/O > config.performance_thresholds.io.high
            - **HIGH_CPU_SKEW**: CPU skew > config.performance_thresholds.skew.high
            - **HIGH_IO_SKEW**: I/O skew > config.performance_thresholds.skew.high
            - **NORMAL**: Clusters within configured normal performance ranges
    
            **TYPICAL ANALYSIS WORKFLOW:**
            1. Sort by 'avg_cpu' or 'avg_io' to find highest resource consumers
            2. Sort by 'avg_cpuskw' or 'avg_ioskw' to find distribution problems
            4. Use limit_results to focus on top problematic clusters
    
            **OPTIMIZATION DECISION FRAMEWORK:**
            - **High CPU + High Query Count**: Maximum impact optimization candidates
            - **High Skew + Moderate CPU**: Distribution/statistics problems
            - **High I/O + Low PJI**: Potential indexing opportunities
            - **High NumSteps**: Complex query rewriting candidates
    
            **OUTPUT FORMAT:**
            Returns detailed cluster statistics with performance rankings, categories, and metadata for LLM analysis and optimization recommendations.
            """
        
        config = SQL_CLUSTERING_CONFIG
        
        logger.debug(f"handle_sql_Analyze_Cluster_Stats: sort_by={sort_by_metric}, limit={limit_results}")
        
        feature_db = config['databases']['feature_db']
        stats_table = config['tables']['query_cluster_stats']
        
        # Validate sort metric
        valid_metrics = [
            'avg_cpu', 'avg_io', 'avg_cpuskw', 'avg_ioskw', 'avg_pji', 'avg_uii',
            'avg_numsteps', 'queries', 'cluster_silhouette_score'
        ]
        
        if sort_by_metric not in valid_metrics:
            sort_by_metric = 'avg_cpu'  # Default fallback
    
        with conn.cursor() as cur:
            
            # Build the query with optional limit
            limit_clause = f"TOP {limit_results}" if limit_results else ""
            
            # Get thresholds from config
            thresholds = config.get('performance_thresholds', {})
            cpu_high = thresholds.get('cpu', {}).get('high', 100)
            skew_high = thresholds.get('skew', {}).get('high', 3.0)
            io_high = thresholds.get('io', {}).get('high', 1000000)
    
            stats_query = f"""
            SELECT {limit_clause}
                td_clusterid_kmeans,
                avg_numsteps, 
                var_numsteps,
                avg_cpu, 
                var_cpu,
                avg_io, 
                var_io,
                avg_cpuskw, 
                var_cpuskw,
                avg_ioskw, 
                var_ioskw,
                avg_pji, 
                var_pji,
                avg_uii, 
                var_uii,
                top_username,
                top_wdname,
                top_appid,
                overall_silhouette_score,
                cluster_silhouette_score,
                queries,
                -- Additional analysis columns with configurable thresholds
                CASE 
                    WHEN avg_cpuskw > {skew_high} THEN 'HIGH_CPU_SKEW'
                    WHEN avg_ioskw > {skew_high} THEN 'HIGH_IO_SKEW'
                    WHEN avg_cpu > {cpu_high} THEN 'HIGH_CPU_USAGE'
                    WHEN avg_io > {io_high} THEN 'HIGH_IO_USAGE'
                    ELSE 'NORMAL'
                END AS performance_category,
                RANK() OVER (ORDER BY {sort_by_metric} DESC) AS performance_rank
            FROM {feature_db}.{stats_table}
            ORDER BY {sort_by_metric} DESC
            """
            
            cur.execute(stats_query)
            data = rows_to_json(cur.description, cur.fetchall())
            
            # Get summary statistics
            cur.execute(f"""
            SELECT 
                COUNT(*) AS total_clusters,
                AVG(avg_cpu) AS system_avg_cpu,
                AVG(avg_io) AS system_avg_io,
                AVG(queries) AS avg_queries_per_cluster,
                MAX(avg_cpu) AS max_cluster_cpu,
                MIN(cluster_silhouette_score) AS min_silhouette_score
            FROM {feature_db}.{stats_table}
            """)
            
            summary_stats = rows_to_json(cur.description, cur.fetchall())[0]
            
            logger.debug(f"Retrieved {len(data)} cluster statistics")
    
        # Return results with metadata
        metadata = {
            "tool_name": "sql_Analyze_Cluster_Stats",
            "analysis_parameters": {
                "sort_by_metric": sort_by_metric,
                "limit_results": limit_results,
                "valid_metrics": valid_metrics
            },
            "summary_statistics": summary_stats,
            "clusters_analyzed": len(data),
            "table_source": f"{feature_db}.{stats_table}",
            "description": f"Cluster statistics analysis sorted by {sort_by_metric} - ready for LLM optimization recommendations"
        }
    
        return create_response(data, metadata)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it's a read-only analysis tool (implied by 'analyzes' and no mention of modifications), uses configurable thresholds from sql_opt_config.yml, categorizes clusters automatically, and returns detailed statistics for LLM analysis. However, it doesn't mention potential limitations like data freshness or performance characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (ANALYSIS CAPABILITIES, AVAILABLE SORTING METRICS, etc.), but it's overly verbose at 400+ words. Some content like the 'OPTIMIZATION DECISION FRAMEWORK' could be condensed, and not every sentence earns its place for a tool with only 2 parameters. However, it remains readable and front-loaded with key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of SQL cluster analysis and no output schema, the description provides substantial context: capabilities, metrics, categorization, workflow, and output format. It compensates well for the lack of annotations and schema descriptions. However, without an output schema, it could benefit from more specific details about the return structure (e.g., fields in the statistics).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must fully compensate. It provides extensive parameter semantics: it explains that 'sort_by_metric' accepts specific metrics like 'avg_cpu' or 'avg_io' (listing 9 options with explanations), and 'limit_results' is used to 'focus on top problematic clusters' in the workflow section. This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool 'analyzes pre-computed cluster statistics to identify optimization opportunities without re-running the clustering pipeline.' It specifies the verb 'analyzes' and resource 'pre-computed cluster statistics,' and clearly distinguishes from sibling tools like sql_Execute_Full_Pipeline (which runs the pipeline) and sql_Retrieve_Cluster_Queries (which retrieves queries).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'Perfect for iterative analysis and decision-making on which query clusters to focus optimization efforts.' It includes a 'TYPICAL ANALYSIS WORKFLOW' section with step-by-step instructions and references alternatives like sql_Execute_Full_Pipeline implicitly by stating it works 'without re-running the clustering pipeline.'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/blitzstermayank/MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server