
get_column_stats

Analyze a specific column in a data file to generate statistical insights in JSON format for data analysis and visualization.

Instructions

Get statistics for a specific column.

Args:
    file_path: Path to the data file
    column: Column name to analyze

Returns: Column statistics in JSON format

Input Schema

JSON Schema

Name        Required  Description                 Default
file_path   Yes       Path to the data file       —
column      Yes       Column name to analyze      —

Implementation Reference

  • Main implementation of the get_column_stats tool. Loads the file with pandas and computes comprehensive statistics for the specified column: dtype, total/null counts, unique values, and sample values. Numeric columns additionally get min, max, mean, median, std, and quartiles; non-numeric columns get the most common values with counts and percentages.
    @mcp.tool()
    def get_column_stats(file_path: str, column: str) -> str:
        """
        Get statistics for a specific column.
        
        Args:
            file_path: Path to the data file
            column: Column name to analyze
        
        Returns:
            Column statistics in JSON format
        """
        try:
            import json       # used for the final dumps
            import traceback  # used in the error handler below
            import pandas as pd
            from pathlib import Path
            file_extension = Path(file_path).suffix.lower()
            
            # Load with pandas
            if file_extension == '.csv':
                df = pd.read_csv(file_path)
            elif file_extension == '.json':
                df = pd.read_json(file_path)
            elif file_extension in ['.xlsx', '.xls']:
                df = pd.read_excel(file_path)
            elif file_extension == '.tsv':
                df = pd.read_csv(file_path, sep='\t')
            else:
                df = pd.read_csv(file_path)
            
            if column not in df.columns:
                return f"Error: Column '{column}' not found. Available columns: {list(df.columns)}"
            
            col_data = df[column]
            
            stats = {
                "column": column,
                "type": str(col_data.dtype),
                "total_values": len(col_data),
                "non_null_values": int(col_data.notna().sum()),
                "null_count": int(col_data.isna().sum()),
                "unique_values": int(col_data.nunique())
            }
            
            # Sample values
            sample_values = []
            for value in col_data.dropna().head(10):
                if hasattr(value, 'item'):  # numpy types
                    sample_values.append(value.item())
                else:
                    sample_values.append(str(value))
            stats["sample_values"] = sample_values
            
            # If numeric, calculate additional stats
            if pd.api.types.is_numeric_dtype(col_data):
                numeric_col = col_data.dropna()
                if not numeric_col.empty:
                    stats["min"] = float(numeric_col.min())
                    stats["max"] = float(numeric_col.max())
                    stats["mean"] = float(numeric_col.mean())
                    stats["median"] = float(numeric_col.median())
                    stats["std"] = float(numeric_col.std())
                    stats["quartiles"] = {
                        "25%": float(numeric_col.quantile(0.25)),
                        "50%": float(numeric_col.quantile(0.50)),
                        "75%": float(numeric_col.quantile(0.75))
                    }
            else:
                # For non-numeric, show value counts
                value_counts = col_data.value_counts().head(10)
                stats["most_common"] = []
                for value, count in value_counts.items():
                    stats["most_common"].append({
                        "value": str(value),
                        "count": int(count),
                        "percentage": round(count / len(col_data) * 100, 2)
                    })
            
            return json.dumps(stats, indent=2)
            
        except Exception as e:
            return f"Error getting column stats: {str(e)}\n{traceback.format_exc()}"
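The statistics logic above can be exercised on its own, without the MCP wrapper or a data file. A minimal sketch of the numeric branch (the DataFrame and column name here are illustrative, not part of the tool):

```python
import json

import pandas as pd

# Illustrative data; the real tool loads this from a file with pandas
df = pd.DataFrame({"salary": [50000, 60000, 70000, None, 80000]})
col_data = df["salary"]

# Same base fields the tool builds for every column
stats = {
    "column": "salary",
    "type": str(col_data.dtype),
    "total_values": len(col_data),
    "non_null_values": int(col_data.notna().sum()),
    "null_count": int(col_data.isna().sum()),
    "unique_values": int(col_data.nunique()),
}

# Numeric branch: summary statistics over the non-null values
numeric = col_data.dropna()
stats["min"] = float(numeric.min())
stats["max"] = float(numeric.max())
stats["mean"] = float(numeric.mean())
stats["median"] = float(numeric.median())

print(json.dumps(stats, indent=2))
```

Note that the null value is excluded before aggregating, so `mean` and `median` here are 65000.0 over the four non-null salaries.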
  • Tool registration: the @mcp.tool() decorator exposes get_column_stats as an MCP tool with parameters file_path and column.
    @mcp.tool()
    def get_column_stats(file_path: str, column: str) -> str:
        """
        Get statistics for a specific column.
        
        Args:
            file_path: Path to the data file
            column: Column name to analyze
        
        Returns:
            Column statistics in JSON format
        """
  • Import of get_column_stats function in demo example.
    get_column_stats,
  • Usage example showing how to call get_column_stats with file path and column name, then parse and display the returned statistics.
    result = get_column_stats(str(sample_file), "salary")
    stats = json.loads(result)
    print(f"Statistics for '{stats['column']}' column:")
    print(f"- Type: {stats['type']}")
    print(f"- Values: {stats['total_values']}")
    print(f"- Null count: {stats['null_count']}")
    if 'min' in stats:
        print(f"- Range: ${stats['min']:,} - ${stats['max']:,}")
        print(f"- Average: ${stats['mean']:,.2f}")
    print()
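The usage example above exercises a numeric column. For non-numeric columns the tool instead returns a most_common list; a self-contained sketch of producing and reading that shape (the sample city data is illustrative):

```python
import json

import pandas as pd

# Illustrative categorical column
col_data = pd.Series(["NY", "LA", "NY", "SF", "NY", "LA"])

# Same logic as the tool's non-numeric branch: top values with
# counts and percentages, ordered most frequent first
value_counts = col_data.value_counts().head(10)
most_common = [
    {
        "value": str(value),
        "count": int(count),
        "percentage": round(count / len(col_data) * 100, 2),
    }
    for value, count in value_counts.items()
]

print(json.dumps(most_common, indent=2))
```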
