
get_column_statistics

Analyze a single column's statistical properties including data type, null values, and numerical summary for data quality assessment and feature understanding.

Instructions

Get detailed statistical analysis for a single column.

Provides focused statistical analysis for a specific column including data type information, null value handling, and comprehensive numerical statistics when applicable.

Returns: Detailed statistical analysis for the specified column

Column Analysis:
  • šŸ” Data Type: Detected pandas data type
  • šŸ“Š Statistics: Complete statistical summary for numeric columns
  • šŸ”¢ Non-null Count: Number of valid (non-null) values
  • šŸ“ˆ Distribution: Statistical distribution characteristics

Examples:

    # Analyze a price column
    stats = await get_column_statistics(ctx, "price")

    # Analyze a categorical column
    stats = await get_column_statistics(ctx, "category")

AI Workflow Integration:
  1. Deep dive analysis for specific columns of interest
  2. Data quality assessment for individual features
  3. Understanding column characteristics for modeling
  4. Validation of data transformations
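The column-level summary the tool reports (detected dtype, non-null count, numeric statistics) can be approximated with plain pandas. A minimal sketch; the dataframe here is illustrative, whereas the real tool reads it from the session context:

```python
import pandas as pd

# Illustrative frame; the actual tool retrieves the dataframe via the session.
df = pd.DataFrame({"price": [9.99, 19.99, None, 4.99], "category": ["a", "b", "a", "a"]})

col = df["price"]
dtype = str(col.dtype)       # detected pandas data type, e.g. "float64"
non_null = int(col.count())  # number of valid (non-null) values
summary = col.describe()     # count/mean/std/min/percentiles/max for numeric columns

print(dtype, non_null)
print(summary)
```

`describe()` is where the 25%/50%/75% labels in the response model's aliases come from.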

Input Schema

Name      Required    Description                                Default
column    Yes         Name of the column to analyze in detail    —

Implementation Reference

  • The handler function that executes the get_column_statistics tool. It retrieves the dataframe from the session context, validates the column exists, computes comprehensive statistics (mean, std, percentiles for numeric; top value for categorical), determines data type, and returns a structured ColumnStatisticsResult.
    async def get_column_statistics(
        ctx: Annotated[Context, Field(description="FastMCP context for session access")],
        column: Annotated[str, Field(description="Name of the column to analyze in detail")],
    ) -> ColumnStatisticsResult:
        """Get detailed statistical analysis for a single column.

        Provides focused statistical analysis for a specific column including data
        type information, null value handling, and comprehensive numerical
        statistics when applicable.

        Returns:
            Detailed statistical analysis for the specified column

        Column Analysis:
            šŸ” Data Type: Detected pandas data type
            šŸ“Š Statistics: Complete statistical summary for numeric columns
            šŸ”¢ Non-null Count: Number of valid (non-null) values
            šŸ“ˆ Distribution: Statistical distribution characteristics

        Examples:
            # Analyze a price column
            stats = await get_column_statistics(ctx, "price")

            # Analyze a categorical column
            stats = await get_column_statistics(ctx, "category")

        AI Workflow Integration:
            1. Deep dive analysis for specific columns of interest
            2. Data quality assessment for individual features
            3. Understanding column characteristics for modeling
            4. Validation of data transformations
        """
        # Get session_id from FastMCP context
        session_id = ctx.session_id
        _session, df = get_session_data(session_id)  # Only need df, not session

        if column not in df.columns:
            raise ColumnNotFoundError(column, df.columns.tolist())

        col_data = df[column]
        dtype = str(col_data.dtype)
        count = int(col_data.count())
        unique_count = int(col_data.nunique())

        # Helper function to safely convert pandas scalars to float
        def safe_float(value: Any) -> float:
            """Safely convert pandas scalar to float."""
            try:
                return float(value) if not pd.isna(value) else 0.0
            except (TypeError, ValueError):
                return 0.0

        # No longer recording operations (simplified MCP architecture)
        # Build StatisticsSummary directly
        if pd.api.types.is_numeric_dtype(col_data) and not pd.api.types.is_bool_dtype(col_data):
            # Numeric columns - calculate all statistics
            col_data_non_null = col_data.dropna()
            percentile_25 = (
                float(col_data_non_null.quantile(0.25)) if len(col_data_non_null) > 0 else None
            )
            percentile_50 = (
                float(col_data_non_null.quantile(0.50)) if len(col_data_non_null) > 0 else None
            )
            percentile_75 = (
                float(col_data_non_null.quantile(0.75)) if len(col_data_non_null) > 0 else None
            )
            stats_summary = StatisticsSummary(
                count=count,
                mean=safe_float(col_data.mean()),
                std=safe_float(col_data.std()),
                min=safe_float(col_data.min()),
                percentile_25=percentile_25,
                percentile_50=percentile_50,
                percentile_75=percentile_75,
                max=safe_float(col_data.max()),
                unique=unique_count,
            )
        else:
            # For non-numeric columns, populate categorical statistics
            # Calculate most frequent value for categorical columns
            most_frequent_val: str | None = None
            most_frequent_count: int | None = None
            if count > 0:
                mode_result = col_data.mode()
                if len(mode_result) > 0:
                    mode_val = mode_result.iloc[0]
                    if mode_val is not None and not pd.isna(mode_val):
                        most_frequent_val = str(mode_val)
                        most_frequent_count = int(col_data.value_counts().iloc[0])
            stats_summary = StatisticsSummary(
                count=count,
                mean=None,
                std=None,
                min=None,
                percentile_25=None,
                percentile_50=None,
                percentile_75=None,
                max=None,
                unique=unique_count,
                top=most_frequent_val,
                freq=most_frequent_count,
            )

        # Map dtype to expected literal type
        dtype_map: dict[
            str,
            Literal["int64", "float64", "object", "bool", "datetime64", "category"],
        ] = {
            "int64": "int64",
            "float64": "float64",
            "object": "object",
            "bool": "bool",
            "datetime64[ns]": "datetime64",
            "category": "category",
        }
        data_type: Literal["int64", "float64", "object", "bool", "datetime64", "category"] = (
            dtype_map.get(dtype, "object")
        )

        return ColumnStatisticsResult(
            column=column,
            statistics=stats_summary,
            data_type=data_type,
            non_null_count=count,
        )
  • Registers the get_column_statistics handler as a FastMCP tool with the exact name 'get_column_statistics' on the statistics_server instance.
    statistics_server.tool(name="get_column_statistics")(get_column_statistics)
  • Pydantic model defining the output schema/response structure for the get_column_statistics tool, including column details, statistics summary, data type, and non-null count.
    class ColumnStatisticsResult(BaseToolResponse):
        """Response model for individual column statistical analysis."""

        column: str = Field(description="Name of the analyzed column")
        statistics: StatisticsSummary = Field(description="Statistical summary for the column")
        data_type: Literal["int64", "float64", "object", "bool", "datetime64", "category"] = Field(
            description="Pandas data type of the column",
        )
        non_null_count: int = Field(description="Number of non-null values in the column")
  • Nested Pydantic model used within ColumnStatisticsResult for detailed statistical summary, supporting both numeric (mean, std, percentiles) and categorical (unique, top, freq) statistics.
    class StatisticsSummary(BaseModel):
        """Statistical summary for a single column."""

        model_config = ConfigDict(populate_by_name=True)

        count: int = Field(description="Total number of non-null values")
        mean: float | None = Field(default=None, description="Arithmetic mean (numeric columns only)")
        std: float | None = Field(default=None, description="Standard deviation (numeric columns only)")
        min: float | str | None = Field(default=None, description="Minimum value in the column")
        percentile_25: float | None = Field(
            default=None,
            alias="25%",
            description="25th percentile value (numeric columns only)",
        )
        percentile_50: float | None = Field(
            default=None,
            alias="50%",
            description="50th percentile/median value (numeric columns only)",
        )
        percentile_75: float | None = Field(
            default=None,
            alias="75%",
            description="75th percentile value (numeric columns only)",
        )
        max: float | str | None = Field(default=None, description="Maximum value in the column")

        # Categorical statistics fields
        unique: int | None = Field(
            None,
            description="Number of unique values (categorical columns only)",
        )
        top: str | None = Field(
            None,
            description="Most frequently occurring value (categorical columns only)",
        )
        freq: int | None = Field(
            None,
            description="Frequency of the most common value (categorical columns only)",
        )
  • Helper function defined within the handler to safely convert pandas scalar values to float, handling NaN and type errors.
    def safe_float(value: Any) -> float:
        """Safely convert pandas scalar to float."""
        try:
            return float(value) if not pd.isna(value) else 0.0
        except (TypeError, ValueError):
            return 0.0
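Why the guard matters: pandas aggregations over empty or all-null columns return NaN, and non-numeric scalars raise on `float()`. A quick check of the helper's behaviour, with the function copied verbatim from above:

```python
from typing import Any

import pandas as pd

def safe_float(value: Any) -> float:
    """Safely convert pandas scalar to float."""
    try:
        return float(value) if not pd.isna(value) else 0.0
    except (TypeError, ValueError):
        return 0.0

print(safe_float(3.5))                             # -> 3.5
print(safe_float(float("nan")))                    # NaN collapses to 0.0
print(safe_float(pd.Series(dtype=float).mean()))   # empty-column mean is NaN -> 0.0
print(safe_float("not a number"))                  # ValueError swallowed -> 0.0
```

Note the trade-off: callers cannot distinguish a true 0.0 from a missing value, which is why the handler uses explicit `None` for the percentile fields instead of `safe_float`.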
