analyze_column_distribution
Analyze a column's data distribution to assess quality, detect outliers, and understand patterns for profiling and analytics.
Instructions
Perform advanced statistical analysis of a column's data distribution including nulls, distinct values, percentiles, and outlier detection.
Use this tool when:
The user asks "What's the data quality of the AMOUNT column?"
Performing data profiling before analytics
Assessing column completeness and distribution
Detecting outliers and data anomalies
Understanding data patterns for ML/AI
What you'll get:
Basic statistics (count, nulls, distinct values, completeness)
Numeric statistics (min, max, mean, percentiles)
Distribution analysis (top values, frequency)
Outlier detection (IQR method)
Data quality assessment
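The basic statistics listed above can be illustrated with a small sketch. This is not the tool's implementation. The function name `basic_profile`, the `top_n` parameter, and the output keys are hypothetical, chosen only to show how count, nulls, distinct values, completeness, and top values relate to one another.

```python
from collections import Counter

def basic_profile(values, top_n=3):
    """Hypothetical helper: summarize nulls, completeness, and top values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    null_count = total - len(non_null)
    counts = Counter(non_null)  # frequency of each distinct non-null value
    return {
        "count": total,
        "null_count": null_count,
        # Percentages are rounded to two decimals; empty input yields 0.0.
        "null_percentage": round(100 * null_count / total, 2) if total else 0.0,
        "completeness": round(100 * len(non_null) / total, 2) if total else 0.0,
        "distinct_count": len(counts),
        "top_values": counts.most_common(top_n),
    }

# Illustrative sample data, not taken from any real asset.
statuses = ["OPEN", "CLOSED", "OPEN", None, "OPEN", "CANCELLED", None, "CLOSED"]
profile = basic_profile(statuses)
print(profile)
```

In this sample, two of the eight values are null, so the null percentage and completeness sum to 100.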
Use cases:
Data quality assessment
Pre-analytics data profiling
Outlier and anomaly detection
Understanding value distributions
ML feature engineering preparation
Data cleansing planning
Example queries:
"Analyze the distribution of SALES_AMOUNT column"
"What's the data quality of CUSTOMER_AGE?"
"Profile the ORDER_STATUS column"
"Detect outliers in PRICE column"
"Show me statistics for QUANTITY field"
Analysis includes:
Null percentage and completeness rate
Distinct value count and cardinality
For numeric columns: min, max, mean, percentiles (p25, p50, p75)
Top value frequencies
Outlier detection using IQR method
Data quality recommendations
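For readers unfamiliar with the IQR method, the following is a minimal sketch of how it typically works. It assumes the conventional 1.5 multiplier and the "inclusive" quartile method from Python's `statistics` module. The tool's actual multiplier and percentile interpolation are not documented here, so its results may differ. The function name `iqr_outliers` is hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Hypothetical helper: return (lower_fence, upper_fence, outliers).

    k=1.5 is the conventional multiplier, assumed here for illustration.
    """
    # With n=4, statistics.quantiles returns the three quartile cut points
    # (p25, p50, p75).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Illustrative sample data, not taken from any real asset.
amounts = [10, 12, 12, 13, 14, 15, 15, 16, 18, 95]
lower, upper, outliers = iqr_outliers(amounts)
print(lower, upper, outliers)
```

Values outside the two fences are reported as outliers. In this sample only 95 falls outside them.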
Performance notes:
Analyzes a sample of 10 to 10,000 records (configurable via sample_size)
Default sample size: 1,000 records
Works with numeric, string, and date columns
Detects the column type automatically and applies the appropriate statistics
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| space_id | Yes | Space ID containing the asset (e.g., 'SAP_CONTENT', 'SALES_ANALYTICS') | |
| asset_name | Yes | Asset (table/view) name containing the column | |
| column_name | Yes | Column name to analyze (e.g., 'SALES_AMOUNT', 'CUSTOMER_AGE', 'ORDER_STATUS') | |
| sample_size | No | Number of records to analyze (10-10000). Larger samples are more accurate but slower. | 1000 |
| include_outliers | No | Detect and report outliers using the IQR method. | true |
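As an illustration of the schema above, the sketch below assembles an arguments payload and checks the documented `sample_size` range before a call is made. The helper `build_arguments` is hypothetical, and the asset name `SALES_ORDERS` is a made-up example. How the payload is sent to the tool depends on the client in use and is not shown.

```python
def build_arguments(space_id, asset_name, column_name,
                    sample_size=1000, include_outliers=True):
    """Hypothetical helper: build the input for analyze_column_distribution."""
    # The schema documents sample_size as 10-10000 with a default of 1000.
    if not 10 <= sample_size <= 10000:
        raise ValueError("sample_size must be between 10 and 10000")
    return {
        "space_id": space_id,
        "asset_name": asset_name,
        "column_name": column_name,
        "sample_size": sample_size,
        "include_outliers": include_outliers,
    }

# 'SALES_ORDERS' is an invented asset name used only for this example.
args = build_arguments("SALES_ANALYTICS", "SALES_ORDERS", "SALES_AMOUNT")
print(args)
```

Validating the range on the client side is a design choice for this sketch. The tool may also enforce or clamp the range itself.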