Teradata MCP Server

sql_Execute_Full_Pipeline

Execute a complete SQL query clustering pipeline to identify high CPU usage queries for optimization analysis, extracting logs, calculating performance metrics, generating embeddings, and grouping similar queries.

Instructions

COMPLETE SQL QUERY CLUSTERING PIPELINE FOR HIGH-USAGE QUERY OPTIMIZATION

This tool executes the entire SQL query clustering workflow to identify and analyze high CPU usage queries for optimization opportunities. It's designed for database performance analysts and DBAs who need to systematically identify query optimization candidates.

FULL PIPELINE WORKFLOW:

Query Log Extraction: Extracts SQL queries from DBC.DBQLSqlTbl with comprehensive performance metrics
Performance Metrics Calculation: Computes CPU skew, I/O skew, PJI (Physical to Logical I/O ratio), UII (Unit I/O Intensity)
Query Tokenization: Tokenizes SQL text using {sql_clustering_config.get('model', {}).get('model_id', 'bge-small-en-v1.5')} tokenizer via ivsm.tokenizer_encode
Embedding Generation: Creates semantic embeddings using ivsm.IVSM_score with ONNX models
Vector Store Creation: Converts embeddings to vector columns via ivsm.vector_to_columns
K-Means Clustering: Groups similar queries using TD_KMeans with optimal K from configuration
Silhouette Analysis: Calculates clustering quality scores using TD_Silhouette
Statistics Generation: Creates comprehensive cluster statistics with performance aggregations

PERFORMANCE METRICS EXPLAINED:

AMPCPUTIME: Total CPU seconds across all AMPs (primary optimization target)
CPUSKW/IOSKW: CPU/I/O skew ratios (>2.0 indicates distribution problems)
PJI: Physical-to-Logical I/O ratio (higher = more CPU-intensive)
UII: Unit I/O Intensity (higher = more I/O-intensive relative to CPU)
LogicalIO: Total logical I/O operations (indicates scan intensity)
NumSteps: Query plan complexity (higher = more complex plans)

CONFIGURATION (from sql_opt_config.yml):

Uses top {default_max_queries} queries by CPU time (configurable)
Creates {default_optimal_k} clusters by default (configurable via optimal_k parameter)
Embedding model: {sql_clustering_config.get('model', {}).get('model_id', 'bge-small-en-v1.5')}
Vector dimensions: {sql_clustering_config.get('embedding', {}).get('vector_length', 384)}
All database and table names are configurable

OPTIMIZATION WORKFLOW: After running this tool, use:

sql_Analyze_Cluster_Stats to identify problematic clusters
sql_Retrieve_Cluster_Queries to get actual SQL from target clusters
LLM analysis to identify patterns and propose specific optimizations

USE CASES:

Identify query families consuming the most system resources
Find queries with similar patterns but different performance
Discover optimization opportunities through clustering analysis
Prioritize DBA effort on highest-impact query improvements
Understand workload composition and resource distribution

PREREQUISITES:

DBC.DBQLSqlTbl and DBC.DBQLOgTbl must be accessible
Embedding models and tokenizers must be installed in feature_ext_db
Sufficient space in feature_ext_db for intermediate and final tables

Input Schema

TableJSON Schema

Name	Required	Description	Default
`optimal_k`	No
`max_queries`	No

Implementation Reference

src/teradata_mcp_server/tools/sql_opt/sql_opt_tools.py:114-518 (handler)

The core handler function implementing the full SQL query clustering pipeline: extracts high-CPU queries from DBQL logs, tokenizes SQL, generates embeddings using bge-small-en-v1.5 model, performs K-means clustering, computes silhouette scores, and generates comprehensive cluster statistics for optimization analysis.

def handle_sql_Execute_Full_Pipeline(
    conn,
    optimal_k: int = None,
    max_queries: int = None,
    *args,
    **kwargs
):
    """
        **COMPLETE SQL QUERY CLUSTERING PIPELINE FOR HIGH-USAGE QUERY OPTIMIZATION**

        This tool executes the entire SQL query clustering workflow to identify and analyze high CPU usage queries for optimization opportunities. It's designed for database performance analysts and DBAs who need to systematically identify query optimization candidates.

        **FULL PIPELINE WORKFLOW:**
        1. **Query Log Extraction**: Extracts SQL queries from DBC.DBQLSqlTbl with comprehensive performance metrics
        2. **Performance Metrics Calculation**: Computes CPU skew, I/O skew, PJI (Physical to Logical I/O ratio), UII (Unit I/O Intensity)
        3. **Query Tokenization**: Tokenizes SQL text using {sql_clustering_config.get('model', {}).get('model_id', 'bge-small-en-v1.5')} tokenizer via ivsm.tokenizer_encode
        4. **Embedding Generation**: Creates semantic embeddings using ivsm.IVSM_score with ONNX models
        5. **Vector Store Creation**: Converts embeddings to vector columns via ivsm.vector_to_columns
        6. **K-Means Clustering**: Groups similar queries using TD_KMeans with optimal K from configuration
        7. **Silhouette Analysis**: Calculates clustering quality scores using TD_Silhouette
        8. **Statistics Generation**: Creates comprehensive cluster statistics with performance aggregations

        **PERFORMANCE METRICS EXPLAINED:**
        - **AMPCPUTIME**: Total CPU seconds across all AMPs (primary optimization target)
        - **CPUSKW/IOSKW**: CPU/I/O skew ratios (>2.0 indicates distribution problems)
        - **PJI**: Physical-to-Logical I/O ratio (higher = more CPU-intensive)
        - **UII**: Unit I/O Intensity (higher = more I/O-intensive relative to CPU)
        - **LogicalIO**: Total logical I/O operations (indicates scan intensity)
        - **NumSteps**: Query plan complexity (higher = more complex plans)

        **CONFIGURATION (from sql_opt_config.yml):**
        - Uses top {default_max_queries} queries by CPU time (configurable)
        - Creates {default_optimal_k} clusters by default (configurable via optimal_k parameter)
        - Embedding model: {sql_clustering_config.get('model', {}).get('model_id', 'bge-small-en-v1.5')}
        - Vector dimensions: {sql_clustering_config.get('embedding', {}).get('vector_length', 384)}
        - All database and table names are configurable

        **OPTIMIZATION WORKFLOW:**
        After running this tool, use:
        1. sql_Analyze_Cluster_Stats to identify problematic clusters
        2. sql_Retrieve_Cluster_Queries to get actual SQL from target clusters
        3. LLM analysis to identify patterns and propose specific optimizations

        **USE CASES:**
        - Identify query families consuming the most system resources
        - Find queries with similar patterns but different performance
        - Discover optimization opportunities through clustering analysis
        - Prioritize DBA effort on highest-impact query improvements
        - Understand workload composition and resource distribution

        **PREREQUISITES:**
        - DBC.DBQLSqlTbl and DBC.DBQLOgTbl must be accessible
        - Embedding models and tokenizers must be installed in feature_ext_db
        - Sufficient space in feature_ext_db for intermediate and final tables
        """
    
    config = SQL_CLUSTERING_CONFIG
    
    # Use config defaults if not provided
    if optimal_k is None:
        optimal_k = config['clustering']['optimal_k']
    if max_queries is None:
        max_queries = config['clustering']['max_queries']
    
    logger.debug(f"handle_sql_Execute_Full_Pipeline: optimal_k={optimal_k}, max_queries={max_queries}")
    
    # Extract config values
    feature_db = config['databases']['feature_db']
    model_db = config['databases']['model_db']
    model_id = config['model']['model_id']
    
    tables = config['tables']
    embedding_config = config['embedding']
    clustering_config = config['clustering']

    with conn.cursor() as cur:
        
        # Create main SQL query log table
        logger.debug(f"Step 1: Creating main query log table {feature_db}.{tables['sql_query_log_main']}")
        
        try:
            cur.execute(f"DROP TABLE {feature_db}.{tables['sql_query_log_main']}")
            logger.debug(f"Dropped existing table {feature_db}.{tables['sql_query_log_main']}")
        except Exception as e:
            logger.debug(f"DROP failed or table not found: {e}")

        main_query_sql = f"""
        CREATE TABLE {feature_db}.{tables['sql_query_log_main']} AS (
            SELECT 
                CAST(a.QueryID AS BIGINT) AS id,
                a.SQLTextInfo AS txt,
                b.username,
                b.appid,
                b.numsteps,
                b.ampcputime,
                b.TotalIOCount AS logicalio,
                b.wdname,
                CASE WHEN b.ampcputime < HashAmp()+1 OR (b.ampcputime / (HashAmp()+1)) = 0 
                     THEN 0 ELSE b.maxampcputime/(b.ampcputime / (HashAmp()+1)) END (DEC(8,2)) AS CPUSKW,
                CASE WHEN b.ampcputime < HashAmp()+1 OR (b.TotalIOCount / (HashAmp()+1)) = 0 
                     THEN 0 ELSE b.maxampio/(b.TotalIOCount / (HashAmp()+1)) END (DEC(8,2)) AS IOSKW,
                CASE WHEN b.ampcputime < HashAmp()+1 OR b.TotalIOCount = 0 
                     THEN 0 ELSE (b.ampcputime * 1000)/b.TotalIOCount END AS PJI,
                CASE WHEN b.ampcputime < HashAmp()+1 OR b.ampcputime = 0 
                     THEN 0 ELSE b.TotalIOCount/(b.ampcputime * 1000) END AS UII,
                CAST(EXTRACT(HOUR FROM ((b.FirstRespTime - b.StartTime) HOUR(3) TO SECOND(6))) * 3600
                     + EXTRACT(MINUTE FROM ((b.FirstRespTime - b.StartTime) HOUR(3) TO SECOND(6))) * 60
                     + EXTRACT(SECOND FROM ((b.FirstRespTime - b.StartTime) HOUR(3) TO SECOND(6))) AS DECIMAL(10,2)) AS response_secs,
                (CAST(EXTRACT(HOUR FROM ((b.FirstRespTime - b.StartTime) HOUR(3) TO SECOND(6))) * 3600
                     + EXTRACT(MINUTE FROM ((b.FirstRespTime - b.StartTime) HOUR(3) TO SECOND(6))) * 60
                     + EXTRACT(SECOND FROM ((b.FirstRespTime - b.StartTime) HOUR(3) TO SECOND(6))) AS DECIMAL(10,2)))/60.0 AS response_mins,
                CASE WHEN b.delaytime IS NULL THEN 0.0 ELSE b.delaytime END AS delaytime
            FROM DBC.DBQLSqlTbl a 
            JOIN (
                -- OPTIMIZATION: Filter to top queries by CPU BEFORE joining
                SELECT * FROM DBC.DBQLOgTbl 
                WHERE LOWER(statementtype) IN ('select','create table')
                QUALIFY ROW_NUMBER() OVER (ORDER BY ampcputime DESC) <= {max_queries}
            ) b ON a.queryid = b.queryid AND a.procid = b.procid
            WHERE
                a.SQLTextInfo NOT LIKE '%SET QUERY_BAND%' AND
                a.SQLTextInfo NOT LIKE '%ParamValue%' AND
                a.SQLTextInfo NOT LIKE '%SELECT CURRENT_TIMESTAMP%' AND
                LOWER(a.SQLTextInfo) NOT LIKE '%dbc.%' AND 
                a.SqlRowNo = 1
        ) WITH DATA
        """
        
        cur.execute(main_query_sql)
        logger.debug(f"Created main query log table")

        # Create tokenized table for embeddings
        logger.debug(f"Step 2: Creating tokenized table {feature_db}.{tables['sql_log_tokenized_for_embeddings']}")
        
        try:
            cur.execute(f"DROP TABLE {feature_db}.{tables['sql_log_tokenized_for_embeddings']}")
        except Exception as e:
            logger.debug(f"DROP failed or table not found: {e}")

        tokenize_sql = f"""
        CREATE TABLE {feature_db}.{tables['sql_log_tokenized_for_embeddings']} AS (
            SELECT
                id,
                txt,
                IDS AS input_ids,
                attention_mask
            FROM ivsm.tokenizer_encode(
                ON (SELECT * FROM {feature_db}.{tables['sql_query_log_main']})
                ON (SELECT model AS tokenizer FROM {model_db}.{tables['embedding_tokenizers']} 
                    WHERE model_id = '{model_id}') DIMENSION
                USING
                    ColumnsToPreserve('id', 'txt')
                    OutputFields('IDS', 'ATTENTION_MASK')
                    MaxLength({embedding_config['max_length']})
                    PadToMaxLength('{embedding_config['pad_to_max_length']}')
                    TokenDataType('INT64')
            ) AS dt
        ) WITH DATA
        """
        
        cur.execute(tokenize_sql)
        logger.debug(f"Created tokenized table")

        # Create embeddings table
        logger.debug(f"Step 3: Creating embeddings table {feature_db}.{tables['sql_log_embeddings']}")
        
        try:
            cur.execute(f"DROP TABLE {feature_db}.{tables['sql_log_embeddings']}")
        except Exception as e:
            logger.debug(f"DROP failed or table not found: {e}")

        embeddings_sql = f"""
        CREATE TABLE {feature_db}.{tables['sql_log_embeddings']} AS (
            SELECT *
            FROM ivsm.IVSM_score(
                ON {feature_db}.{tables['sql_log_tokenized_for_embeddings']}
                ON (SELECT * FROM {model_db}.{tables['embedding_models']} 
                    WHERE model_id = '{model_id}') DIMENSION
                USING
                    ColumnsToPreserve('id', 'txt')
                    ModelType('ONNX')
                    BinaryInputFields('input_ids', 'attention_mask')
                    BinaryOutputFields('sentence_embedding')
                    Caching('inquery')
            ) a
        ) WITH DATA
        """
        
        cur.execute(embeddings_sql)
        logger.debug(f"Created embeddings table")

        # Create embeddings store table
        logger.debug(f"Step 4: Creating embeddings store table {feature_db}.{tables['sql_log_embeddings_store']}")
        
        try:
            cur.execute(f"DROP TABLE {feature_db}.{tables['sql_log_embeddings_store']}")
        except Exception as e:
            logger.debug(f"DROP failed or table not found: {e}")

        embeddings_store_sql = f"""
        CREATE TABLE {feature_db}.{tables['sql_log_embeddings_store']} AS (
            SELECT *
            FROM ivsm.vector_to_columns(
                ON {feature_db}.{tables['sql_log_embeddings']}
                USING
                    ColumnsToPreserve('id', 'txt')
                    VectorDataType('FLOAT32')
                    VectorLength({embedding_config['vector_length']})
                    OutputColumnPrefix('emb_')
                    InputColumnName('sentence_embedding')
            ) a
        ) WITH DATA
        """
        
        cur.execute(embeddings_store_sql)
        logger.debug(f"Created embeddings store table")

        # Perform K-means clustering
        logger.debug(f"Step 5: Performing K-means clustering with k={optimal_k}")
        
        try:
            cur.execute(f"DROP TABLE {feature_db}.{tables['sql_query_clusters_temp']}")
        except Exception as e:
            logger.debug(f"DROP failed or table not found: {e}")

        kmeans_sql = f"""
        CREATE TABLE {feature_db}.{tables['sql_query_clusters_temp']} AS (
            SELECT td_clusterid_kmeans, a.*
            FROM TD_KMeans (
                ON {feature_db}.{tables['sql_log_embeddings_store']} AS InputTable
                USING
                    IdColumn('id')
                    TargetColumns('[2:385]')
                    NumClusters({optimal_k})
                    Seed({clustering_config['seed']})
                    StopThreshold({clustering_config['stop_threshold']})
                    OutputClusterAssignment('true')
                    MaxIterNum({clustering_config['max_iterations']})
            ) AS dt
            JOIN {feature_db}.{tables['sql_query_log_main']} a ON a.id = dt.id
        ) WITH DATA
        """
        
        cur.execute(kmeans_sql)
        logger.debug(f"Created temporary clusters table")

        # Create final clusters table with silhouette scores
        logger.debug(f"Step 6: Creating final clusters table with silhouette scores")
        
        try:
            cur.execute(f"DROP TABLE {feature_db}.{tables['sql_query_clusters']}")
        except Exception as e:
            logger.debug(f"DROP failed or table not found: {e}")

        final_clusters_sql = f"""
        CREATE TABLE {feature_db}.{tables['sql_query_clusters']} AS (
            SELECT a.*, b.silhouette_score 
            FROM {feature_db}.{tables['sql_query_clusters_temp']} a
            JOIN (SELECT * FROM TD_Silhouette(
                ON (SELECT td_clusterid_kmeans, b.* 
                    FROM {feature_db}.{tables['sql_query_clusters_temp']} a 
                    JOIN {feature_db}.{tables['sql_log_embeddings_store']} b
                    ON a.id = b.id) AS InputTable
                USING
                    IdColumn('id')
                    ClusterIdColumn('td_clusterid_kmeans')
                    TargetColumns('[4:]')
                    OutputType('SAMPLE_SCORES')
            ) AS dt) AS b
            ON a.id = b.id
        ) WITH DATA PRIMARY INDEX(id)
        """
        
        cur.execute(final_clusters_sql)
        logger.debug(f"Created final clusters table")

        # Create cluster statistics table
        logger.debug(f"Step 7: Creating cluster statistics table")
        
        try:
            cur.execute(f"DROP TABLE {feature_db}.{tables['query_cluster_stats']}")
        except Exception as e:
            logger.debug(f"DROP failed or table not found: {e}")

        cluster_stats_sql = f"""
        CREATE TABLE {feature_db}.{tables['query_cluster_stats']} AS (
            SELECT a.td_clusterid_kmeans,
                AVG(a.numsteps) AS avg_numsteps, 
                VAR_SAMP(a.numsteps) AS var_numsteps,
                AVG(a.ampcputime) AS avg_cpu, 
                VAR_SAMP(a.ampcputime) AS var_cpu,
                AVG(a.logicalio) AS avg_io, 
                VAR_SAMP(a.logicalio) AS var_io,
                AVG(a.cpuskw) AS avg_cpuskw, 
                VAR_SAMP(a.cpuskw) AS var_cpuskw,
                AVG(a.ioskw) AS avg_ioskw, 
                VAR_SAMP(a.ioskw) AS var_ioskw,
                AVG(a.pji) AS avg_pji, 
                VAR_SAMP(a.pji) AS var_pji,
                AVG(a.uii) AS avg_uii, 
                VAR_SAMP(a.uii) AS var_uii,
                MAX(un.top_username) AS top_username,
                MAX(top_wdname) AS top_wdname,
                MAX(top_appid) AS top_appid,
                MAX(s1.silhouette_score) AS overall_silhouette_score,
                MAX(s2.silhouette_score) AS cluster_silhouette_score,
                COUNT(*) AS queries
            FROM {feature_db}.{tables['sql_query_clusters']} a 
            JOIN (
                SELECT td_clusterid_kmeans, 
                       username AS top_UserName
                FROM {feature_db}.{tables['sql_query_clusters']}
                GROUP BY td_clusterid_kmeans, username
                QUALIFY ROW_NUMBER() OVER (PARTITION BY td_clusterid_kmeans ORDER BY COUNT(*) DESC) = 1
            ) un ON a.td_clusterid_kmeans = un.td_clusterid_kmeans
            JOIN (
                SELECT td_clusterid_kmeans, 
                       wdname AS top_wdname
                FROM {feature_db}.{tables['sql_query_clusters']}
                GROUP BY td_clusterid_kmeans, wdname
                QUALIFY ROW_NUMBER() OVER (PARTITION BY td_clusterid_kmeans ORDER BY COUNT(*) DESC) = 1
            ) wd ON un.td_clusterid_kmeans = wd.td_clusterid_kmeans
            JOIN (
                SELECT td_clusterid_kmeans, 
                       appid AS top_AppId
                FROM {feature_db}.{tables['sql_query_clusters']}
                GROUP BY td_clusterid_kmeans, appid
                QUALIFY ROW_NUMBER() OVER (PARTITION BY td_clusterid_kmeans ORDER BY COUNT(*) DESC) = 1
            ) ap ON un.td_clusterid_kmeans = ap.td_clusterid_kmeans
            CROSS JOIN (
                SELECT * FROM TD_Silhouette(
                    ON (SELECT td_clusterid_kmeans, b.* 
                        FROM {feature_db}.{tables['sql_query_clusters']} a 
                        JOIN {feature_db}.{tables['sql_log_embeddings_store']} b
                        ON a.id = b.id) AS InputTable
                    USING
                        IdColumn('id')
                        ClusterIdColumn('td_clusterid_kmeans')
                        TargetColumns('[4:]')
                        OutputType('SCORE')
                ) AS dt
            ) AS s1
            JOIN (
                SELECT * FROM TD_Silhouette(
                    ON (SELECT td_clusterid_kmeans, b.* 
                        FROM {feature_db}.{tables['sql_query_clusters']} a 
                        JOIN {feature_db}.{tables['sql_log_embeddings_store']} b
                        ON a.id = b.id) AS InputTable
                    USING
                        IdColumn('id')
                        ClusterIdColumn('td_clusterid_kmeans')
                        TargetColumns('[4:]')
                        OutputType('CLUSTER_SCORES')
                ) AS dt
            ) s2 ON a.td_clusterid_kmeans = s2.td_clusterid_kmeans
            GROUP BY a.td_clusterid_kmeans
        ) WITH DATA PRIMARY INDEX(td_clusterid_kmeans)
        """
        
        cur.execute(cluster_stats_sql)
        logger.debug(f"Created cluster statistics table")

        # Get final results
        cur.execute(f"SELECT COUNT(*) FROM {feature_db}.{tables['sql_query_clusters']}")
        total_queries = cur.fetchone()[0]
        
        cur.execute(f"SELECT COUNT(DISTINCT td_clusterid_kmeans) FROM {feature_db}.{tables['sql_query_clusters']}")
        total_clusters = cur.fetchone()[0]
        
        cur.execute(f"SELECT AVG(silhouette_score) FROM {feature_db}.{tables['sql_query_clusters']}")
        avg_silhouette = cur.fetchone()[0]

    # Return metadata
    metadata = {
        "tool_name": "sql_Execute_Full_Pipeline",
        "workflow_steps": [
            "query_log_extracted", "queries_tokenized", "embeddings_generated", 
            "embeddings_stored", "kmeans_clustering_completed", "silhouette_scores_calculated", 
            "cluster_statistics_generated"
        ],
        "configuration": {
            "optimal_k": optimal_k,
            "max_queries_processed": max_queries,
            "model_id": model_id,
            "clustering_parameters": clustering_config,
            "embedding_parameters": embedding_config
        },
        "results": {
            "total_queries_clustered": total_queries,
            "total_clusters_created": total_clusters,
            "average_silhouette_score": float(avg_silhouette) if avg_silhouette else None
        },
        "tables_created": [
            f"{feature_db}.{tables['sql_query_log_main']}",
            f"{feature_db}.{tables['sql_log_tokenized_for_embeddings']}",
            f"{feature_db}.{tables['sql_log_embeddings']}",
            f"{feature_db}.{tables['sql_log_embeddings_store']}",
            f"{feature_db}.{tables['sql_query_clusters']}",
            f"{feature_db}.{tables['query_cluster_stats']}"
        ],
        "description": "Complete SQL query clustering pipeline executed: extracted SQL logs → tokenized → embedded → clustered → analyzed"
    }

    return create_response({"status": "success", "pipeline_completed": True}, metadata)

src/teradata_mcp_server/app.py:270-282 (registration)

Dynamic registration code that discovers all 'handle_*' functions from loaded tool modules (including handle_sql_Execute_Full_Pipeline from sql_opt_tools.py), extracts the tool name by removing 'handle_' prefix, wraps the function for MCP compatibility (injects DB connection, sets QueryBand, infers schema from signature/docstring), and registers them as MCP tools.

module_loader = td.initialize_module_loader(config)
if module_loader:
    all_functions = module_loader.get_all_functions()
    for name, func in all_functions.items():
        if not (inspect.isfunction(func) and name.startswith("handle_")):
            continue
        tool_name = name[len("handle_"):]
        if not any(re.match(p, tool_name) for p in config.get('tool', [])):
            continue
        wrapped = make_tool_wrapper(func)
        mcp.tool(name=tool_name, description=wrapped.__doc__)(wrapped)
        logger.info(f"Created tool: {tool_name}")
else:

src/teradata_mcp_server/tools/sql_opt/sql_opt_tools.py:57-74 (helper)

Configuration loader and default settings for the SQL clustering pipeline, including database/table names, embedding model (bge-small-en-v1.5), clustering parameters (optimal_k=14, max_queries=10000), and vector dimensions.

def load_sql_clustering_config():
    """Load SQL clustering configuration from sql_opt_config.yml"""
    try:
        # Get the directory path
        current_dir = Path(__file__).parent
        # Go to config/
        config_path = current_dir.parent.parent / 'config' / 'sql_opt_config.yml'
        
        with open(config_path, 'r') as file:
            logger.info(f"Loading SQL clustering config from: {config_path}")
            return yaml.safe_load(file)
    except FileNotFoundError:
        logger.warning(f"SQL clustering config file not found: {config_path}, using defaults")
        return get_default_sql_clustering_config()
    except Exception as e:
        logger.error(f"Error loading SQL clustering config: {e}")
        return get_default_sql_clustering_config()

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does an excellent job describing the multi-step workflow, performance metrics, configuration details, and prerequisites. However, it doesn't explicitly state whether this is a read-only or destructive operation, nor does it mention execution time, resource requirements, or error handling. For a complex pipeline tool, these omissions prevent a perfect score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

While well-structured with clear sections, the description is quite lengthy with multiple detailed subsections. Some information (like specific metric explanations and configuration details) could potentially be streamlined. The core purpose and workflow are front-loaded, but subsequent sections add substantial bulk that may not all be essential for tool selection.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex 2-parameter tool with no annotations and no output schema, the description provides extensive context about the workflow, metrics, configuration, prerequisites, and follow-up steps. However, it doesn't describe what the tool returns (output format/results) or potential limitations/errors. Given the complexity, a complete description would include information about outputs and failure modes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage for the 2 parameters, the description must compensate. It mentions 'optimal_k parameter' and 'top {default_max_queries} queries by CPU time (configurable)' which provides context for what the parameters control. However, it doesn't explicitly name the parameters or provide value ranges/constraints, leaving some ambiguity about how they affect the pipeline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool 'executes the entire SQL query clustering workflow to identify and analyze high CPU usage queries for optimization opportunities.' It provides a specific verb ('executes'), resource ('SQL query clustering workflow'), and distinguishes from siblings like sql_Analyze_Cluster_Stats and sql_Retrieve_Cluster_Queries by emphasizing it's the 'complete pipeline' versus their more focused analysis roles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit guidance on when to use this tool versus alternatives: 'After running this tool, use: 1. sql_Analyze_Cluster_Stats to identify problematic clusters 2. sql_Retrieve_Cluster_Queries to get actual SQL from target clusters.' It also provides clear prerequisites and use cases, making it easy to determine when this tool is appropriate versus other siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Latest Blog Posts

Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
open source
OpenAI
Tool Definition Quality Score (TDQS)
By punkpeye on April 3, 2026.
mcp
The Hackers Who Tracked My Sleep Cycle
By punkpeye on March 26, 2026.
security

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/blitzstermayank/MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server