Glama
honeycombio
by honeycombio

analyze_columns

Analyze dataset columns to reveal statistical insights, value distributions, and numeric metrics for up to 10 columns simultaneously.

Instructions

Analyzes specific columns in a dataset by running statistical queries and returning computed metrics. This tool allows users to get statistical information about a specific column, including value distribution, top values, and numeric statistics (for numeric columns). Supports analyzing up to 10 columns at once by specifying an array of column names in the 'columns' parameter. When multiple columns are specified, they will be analyzed together as a group, showing the distribution of their combined values. Use this tool before running queries to get a better understanding of the data in your dataset.
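As a rough sketch of what "analyzed together as a group" means, the query below breaks down by every requested column at once and counts events per combination, adding numeric calculations only for numeric columns. It follows the analyzeColumns method listed under Implementation Reference; the type names, the function name buildGroupedQuery, and the sample column names are assumptions made for illustration.

```typescript
// Hypothetical, simplified types. The real AnalysisQuery type is not shown on this page.
type Calculation = {
  op: "COUNT" | "AVG" | "P95" | "MAX" | "MIN";
  column?: string;
};

interface GroupedQuery {
  calculations: Calculation[];
  breakdowns: string[];
  time_range: number;
  limit: number;
}

// Build a single query that groups by all requested columns together.
function buildGroupedQuery(
  columns: string[],
  numericColumns: string[],
  timeRangeSeconds: number,
): GroupedQuery {
  const calculations: Calculation[] = [{ op: "COUNT" }];
  for (const column of numericColumns) {
    // Numeric columns get four extra calculations each.
    calculations.push(
      { op: "AVG", column },
      { op: "P95", column },
      { op: "MAX", column },
      { op: "MIN", column },
    );
  }
  return {
    calculations,
    breakdowns: [...columns],
    time_range: timeRangeSeconds,
    limit: 10,
  };
}

// Sample column names are placeholders, not taken from a real dataset.
const exampleQuery = buildGroupedQuery(
  ["service.name", "duration_ms"],
  ["duration_ms"],
  7200,
);
```

Because the breakdown covers all columns at once, each result row describes one combination of values rather than one column in isolation.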

Input Schema

| Name | Required | Description | Default |
| environment | Yes | The Honeycomb environment containing the dataset | |
| dataset | Yes | The dataset containing the column to analyze | |
| columns | Yes | The names of the columns to analyze | |
| timeRange | No | Time range in seconds to analyze. Default is 2 hours. | |
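A minimal sketch of an arguments object for this tool, with a dependency-free check of the same constraints the schema states (non-empty environment and dataset, 1 to 10 column names, positive timeRange). The names AnalyzeColumnsArgs and validateArgs and all sample values are hypothetical; the server itself validates with a Zod schema, which may differ in edge cases such as whitespace-only strings.

```typescript
// Hypothetical argument type mirroring the input schema table.
interface AnalyzeColumnsArgs {
  environment: string;
  dataset: string;
  columns: string[];
  timeRange?: number; // seconds
}

// Returns a list of problems; an empty list means the arguments look valid.
function validateArgs(args: AnalyzeColumnsArgs): string[] {
  const problems: string[] = [];
  if (args.environment.trim().length === 0) {
    problems.push("environment is required");
  }
  if (args.dataset.trim().length === 0) {
    problems.push("dataset is required");
  }
  if (args.columns.length < 1 || args.columns.length > 10) {
    problems.push("columns must contain between 1 and 10 names");
  }
  if (args.timeRange !== undefined && !(args.timeRange > 0)) {
    problems.push("timeRange must be a positive number of seconds");
  }
  return problems;
}

// Placeholder values for illustration only.
const exampleArgs: AnalyzeColumnsArgs = {
  environment: "production",
  dataset: "api-requests",
  columns: ["http.status_code", "service.name"],
  timeRange: 3600,
};

const exampleProblems = validateArgs(exampleArgs);
```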

Implementation Reference

  • The main handler function for the 'analyze_columns' tool. It validates inputs, calls the API to perform the column analysis, computes top values, numeric statistics, and cardinality from the results, and returns a formatted JSON response.
    handler: async (params: z.infer<typeof ColumnAnalysisSchema>) => {
      try {
        // Validate required parameters
        if (!params.environment) {
          throw new Error("Missing required parameter: environment");
        }
        if (!params.dataset) {
          throw new Error("Missing required parameter: dataset");
        }
        if (!params.columns || params.columns.length === 0) {
          throw new Error("Missing required parameter: columns");
        }
        if (params.columns.length > 10) {
          throw new Error("Too many columns requested. Maximum is 10.");
        }
        
        // Execute the analysis via the API
        const result = await api.analyzeColumns(params.environment, params.dataset, params);
        
        // Initialize the response
        const simplifiedResponse: SimplifiedColumnAnalysis = {
          columns: params.columns,
          count: result.data?.results?.length || 0,
          totalEvents: 0,  // Will be populated below if available
        };
        
        // Add top values if we have results
        if (result.data?.results && result.data.results.length > 0) {
          const results = result.data.results as QueryResultValue[];
          const firstResult = results[0];
          
          try {
            // Calculate total events across all results
            const totalCount = results.reduce((sum, row) => {
              const count = row.COUNT as number | undefined;
              // Only add if it's a number, otherwise use 0
              return sum + (typeof count === 'number' ? count : 0);
            }, 0);
            simplifiedResponse.totalEvents = totalCount;
            
            // Add top values with their counts and percentages
            simplifiedResponse.topValues = results.map(row => {
              // For multi-column analysis, combine values into a descriptive string
              const combinedValue = params.columns
                .map(col => {
                  const colValue = row[col] !== undefined ? row[col] : null;
                  return `${col}: ${colValue}`;
                })
                .join(', ');
              
              const count = typeof row.COUNT === 'number' ? row.COUNT : 0;
              
              return {
                value: combinedValue,
                count,
                percentage: totalCount > 0 ? 
                  ((count / totalCount) * 100).toFixed(2) + '%' : 
                  '0%'
              };
            });
            
            // Initialize stats container for each numeric column
            const numericStats: Record<string, NumericStatsWithInterpretation> = {};
            
            // Process numeric metrics for each column if available
            if (firstResult) {
              params.columns.forEach(column => {
                // Check if we have numeric metrics for this column
                const avgKey = `AVG(${column})`;
                if (avgKey in firstResult) {
                  const stats: NumericStatistics = {};
                  
                  // Extract metrics for this column
                  if (typeof firstResult[avgKey] === 'number') stats.avg = firstResult[avgKey] as number;
                  if (typeof firstResult[`P95(${column})`] === 'number') stats.p95 = firstResult[`P95(${column})`] as number;
                  if (typeof firstResult[`MAX(${column})`] === 'number') stats.max = firstResult[`MAX(${column})`] as number;
                  if (typeof firstResult[`MIN(${column})`] === 'number') stats.min = firstResult[`MIN(${column})`] as number;
                  
                  // Calculate range if we have min and max
                  if (stats.min !== undefined && stats.max !== undefined) {
                    stats.range = stats.max - stats.min;
                  }
                  
                  // Only add if we have at least one stat
                  if (Object.keys(stats).length > 0) {
                    numericStats[column] = {
                      ...stats,
                      interpretation: generateInterpretation(stats, column)
                    } as NumericStatsWithInterpretation;
                  }
                }
              });
            }
            
            // Add stats if we have any
            if (Object.keys(numericStats).length > 0) {
              simplifiedResponse.stats = numericStats;
            }
            
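            // Add cardinality information (unique combinations of values).
            // Only the returned rows are counted; since the query in analyzeColumns sets limit: 10,
            // uniqueCount is presumably capped at 10.
            const uniqueValueCombinations = new Set<string>();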
            
            results.forEach(row => {
              const combinationKey = params.columns
                .map(col => `${col}:${row[col] !== undefined ? row[col] : 'null'}`)
                .join('|');
              uniqueValueCombinations.add(combinationKey);
            });
            
            const uniqueCount = uniqueValueCombinations.size;
            
            simplifiedResponse.cardinality = {
              uniqueCount,
              classification: getCardinalityClassification(uniqueCount)
            };
          } catch (processingError) {
            // Handle errors during result processing, but still return partial results
            console.error("Error processing column analysis results:", processingError);
            simplifiedResponse.processingError = `Error processing results: ${processingError instanceof Error ? processingError.message : String(processingError)}`;
          }
        }
        
        return {
          content: [
            {
              type: "text",
              text: JSON.stringify(simplifiedResponse, null, 2),
            },
          ],
        };
      } catch (error) {
        return handleToolError(error, "analyze_columns");
      }
    }
  • Zod schema defining the input parameters for the analyze_columns tool: environment, dataset, columns (array 1-10), optional timeRange.
    export const ColumnAnalysisSchema = z.object({
      environment: z.string().min(1).trim().describe("The Honeycomb environment containing the dataset"),
      dataset: z.string().min(1).trim().describe("The dataset containing the column to analyze"),
      columns: z.array(z.string()).min(1).max(10).describe("The names of the columns to analyze"),
      timeRange: z.number().positive().optional().describe("Time range in seconds to analyze. Default is 2 hours."),
    });
  • The registerTools function creates the analyze_columns tool, adds it to the tools array, and registers it with the MCP server in the loop, with a tool-specific check that environment, dataset, and columns are present.
    export function registerTools(server: McpServer, api: HoneycombAPI) {
      const tools = [
        // Dataset tools
        createListDatasetsTool(api),
        createListColumnsTool(api),
    
        // Query tools
        createRunQueryTool(api),
        createAnalyzeColumnsTool(api),
    
        // Board tools
        createListBoardsTool(api),
        createGetBoardTool(api),
    
        // Marker tools
        createListMarkersTool(api),
    
        // Recipient tools
        createListRecipientsTool(api),
    
        // SLO tools
        createListSLOsTool(api),
        createGetSLOTool(api),
    
        // Trigger tools
        createListTriggersTool(api),
        createGetTriggerTool(api),
        
        // Trace tools
        createTraceDeepLinkTool(api),
        
        // Instrumentation tools
        createInstrumentationGuidanceTool(api)
      ];
    
      // Register each tool with the server
      for (const tool of tools) {
        // Register the tool with the server using type assertion to bypass TypeScript's strict type checking
        (server as any).tool(
          tool.name,
          tool.description,
          tool.schema, 
          async (args: Record<string, any>, extra: any) => {
            try {
              // Validate and ensure required fields are present before passing to handler
              if (tool.name.includes("analyze_columns") && (!args.environment || !args.dataset || !args.columns)) {
                throw new Error("Missing required fields: environment, dataset, and columns are required");
              } else if (tool.name.includes("run_query") && (!args.environment || !args.dataset)) {
                throw new Error("Missing required fields: environment and dataset are required");
              }
              
              // Use type assertion to satisfy TypeScript's type checking
              const result = await tool.handler(args as any);
              
              // If the result already has the expected format, return it directly
              if (result && typeof result === 'object' && 'content' in result) {
                return result as any;
              }
              
              // Otherwise, format the result as expected by the SDK
              return {
                content: [
                  {
                    type: "text",
                    text: typeof result === 'string' ? result : JSON.stringify(result, null, 2),
                  },
                ],
              } as any;
            } catch (error) {
              // Format errors to match the SDK's expected format
              return {
                content: [
                  {
                    type: "text",
                    text: error instanceof Error ? error.message : String(error),
                  },
                ],
                isError: true,
              } as any;
            }
          }
        );
      }
    }
  • HoneycombAPI method called by the tool handler. It builds an AnalysisQuery with breakdowns and calculations, runs it against the Honeycomb API, and returns the results.
    async analyzeColumns(
      environment: string,
      datasetSlug: string,
      params: z.infer<typeof ColumnAnalysisSchema>,
    ) {
      // Get column information for each requested column
      const columnPromises = params.columns.map(columnName => 
        this.getColumnByName(environment, datasetSlug, columnName)
      );
      
      const columns = await Promise.all(columnPromises);
      
      const query: AnalysisQuery = {
        calculations: [{ op: "COUNT" }],
        breakdowns: [...params.columns],
        // Falls back to 3600 s (1 hour), although the schema description states a 2-hour default
        time_range: params.timeRange || 3600,
        limit: 10,
      };
      
      // Only add orders if we have columns
      if (params.columns && params.columns.length > 0) {
        query.orders = [
          {
            column: params.columns[0] as string, // Force type assertion
            order: "descending",
          }
        ];
      }
    
      // Add numeric calculations for any numeric columns
      const numericColumns = columns.filter(
        col => col.type === "integer" || col.type === "float"
      );
      
      numericColumns.forEach(column => {
        const numericCalculations: QueryCalculation[] = [
          { op: "AVG", column: column.key_name },
          { op: "P95", column: column.key_name },
          { op: "MAX", column: column.key_name },
          { op: "MIN", column: column.key_name },
        ];
        
        if (!query.calculations) {
          query.calculations = [];
        }
        query.calculations.push(...numericCalculations);
      });
    
      try {
        const results = await this.queryAndWaitForResults(
          environment,
          datasetSlug,
          query,
        );
        return {
          data: {
            results: results.data?.results || [],
            series: results.data?.series || [],
          },
          links: results.links,
        };
      } catch (error) {
        throw new Error(
          `Column analysis failed: ${error instanceof Error ? error.message : "Unknown error"}`,
        );
      }
    }
  • Factory function that creates the tool object with name, description, schema, and handler for registration.
    export function createAnalyzeColumnsTool(api: HoneycombAPI) {
      return {
        name: "analyze_columns",
        description,
        schema: ColumnAnalysisSchema.shape,
        /**
         * Handles the analyze_columns tool request
         * 
         * @param params - The parameters for the column analysis
         * @returns A formatted response with column analysis data
         */
        handler: async (params: z.infer<typeof ColumnAnalysisSchema>) => {
          // Body identical to the handler listed in the first item above.
        }
      };
    }
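The result processing in the handler can be sketched as a pure function: sum the COUNT of each returned row, derive a percentage per row, and count distinct value combinations. This is an illustrative reduction of the handler, not the server's code; the names Row, TopValue, and summarizeRows and the sample rows are assumptions.

```typescript
// One row per group: breakdown values plus a COUNT field.
type Row = Record<string, string | number | boolean | null | undefined>;

interface TopValue {
  value: string;
  count: number;
  percentage: string;
}

function summarizeRows(columns: string[], rows: Row[]) {
  // Treat a missing or non-numeric COUNT as 0, as the handler does.
  const counts = rows.map(row => {
    const raw = row["COUNT"];
    return typeof raw === "number" ? raw : 0;
  });
  const totalEvents = counts.reduce((sum, c) => sum + c, 0);

  const topValues: TopValue[] = rows.map((row, i) => {
    const count = counts[i] ?? 0;
    return {
      // Combine all requested columns into one descriptive string.
      value: columns.map(col => `${col}: ${row[col] ?? null}`).join(", "),
      count,
      percentage:
        totalEvents > 0 ? ((count / totalEvents) * 100).toFixed(2) + "%" : "0%",
    };
  });

  // Cardinality here means distinct combinations among the returned rows.
  const uniqueCount = new Set(
    rows.map(row => columns.map(col => `${col}:${row[col] ?? "null"}`).join("|")),
  ).size;

  return { totalEvents, topValues, uniqueCount };
}

// Made-up sample rows for illustration.
const sampleSummary = summarizeRows(
  ["http.status_code"],
  [
    { "http.status_code": 200, COUNT: 75 },
    { "http.status_code": 500, COUNT: 25 },
  ],
);
```

Note that totalEvents, and therefore each percentage, is computed from the returned rows only, so the figures describe shares among the returned groups rather than shares of every event in the time range.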
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes what the tool does (statistical analysis, returns metrics) and some behavioral traits (supports up to 10 columns, analyzes them as a group). However, it doesn't mention important aspects like whether this is a read-only operation, potential performance impact, rate limits, or authentication requirements, leaving gaps in behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized at 5 sentences, front-loaded with the core purpose, and each sentence adds meaningful information. However, the final sentence about usage timing could be integrated more seamlessly, and some phrasing is slightly redundant ('specific columns' then 'specific column').

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 parameters, statistical analysis function) and the absence of both annotations and an output schema, the description provides adequate but incomplete context. It explains what the tool does and some parameter behavior, but doesn't describe the format or structure of the returned metrics, leaving the agent uncertain about the output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
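The output structure that the tool description leaves out can be partly inferred from the fields the handler assigns. The interfaces below are such an inference, not the server's actual SimplifiedColumnAnalysis type, which is not shown on this page; the Inferred* names and parseAnalysis are hypothetical, and fields whose types cannot be determined from the handler are typed as unknown.

```typescript
interface InferredTopValue {
  value: string;      // e.g. "col: value, col2: value2"
  count: number;
  percentage: string; // e.g. "75.00%"
}

interface InferredNumericStats {
  avg?: number;
  p95?: number;
  max?: number;
  min?: number;
  range?: number;          // max - min, when both are present
  interpretation: unknown; // produced by generateInterpretation; type not shown
}

interface InferredColumnAnalysis {
  columns: string[];
  count: number;       // number of result rows
  totalEvents: number; // sum of COUNT over result rows
  topValues?: InferredTopValue[];
  stats?: Record<string, InferredNumericStats>;
  cardinality?: { uniqueCount: number; classification: unknown };
  processingError?: string;
}

// The tool returns the analysis as JSON text; this parses it without runtime validation.
function parseAnalysis(text: string): InferredColumnAnalysis {
  return JSON.parse(text) as InferredColumnAnalysis;
}

// Made-up payload for illustration.
const sampleText = JSON.stringify({
  columns: ["http.status_code"],
  count: 2,
  totalEvents: 100,
  topValues: [
    { value: "http.status_code: 200", count: 75, percentage: "75.00%" },
    { value: "http.status_code: 500", count: 25, percentage: "25.00%" },
  ],
});
const parsedAnalysis = parseAnalysis(sampleText);
```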

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds some value by explaining the 'columns' parameter can accept up to 10 columns and that multiple columns are analyzed together as a group, but doesn't provide significant additional semantic context beyond what's in the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('analyzes', 'running statistical queries', 'returning computed metrics') and resources ('columns in a dataset'). It distinguishes from siblings like 'list_columns' (which lists columns) and 'run_query' (which runs general queries) by focusing specifically on statistical analysis of column data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool ('Use this tool before running queries to get a better understanding of the data in your dataset'). However, it doesn't explicitly state when NOT to use it or mention specific alternatives among the sibling tools, though the purpose differentiation implies alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/honeycombio/honeycomb-mcp'
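The same request can be sketched in TypeScript using the standard fetch API (available in browsers and Node.js 18 or later). The function names, the treatment of the last two path segments as owner and repository, and the assumption that the endpoint returns JSON are not confirmed by this page.

```typescript
const MCP_API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Build the server URL; "owner" and "repo" are assumed names for the two path segments.
function serverUrl(owner: string, repo: string): string {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetch the server record. The response is assumed to be JSON and is left untyped.
async function fetchServerInfo(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

const honeycombUrl = serverUrl("honeycombio", "honeycomb-mcp");
```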

If you have feedback or need assistance with the MCP directory API, please join our Discord server.