
get_statistics

Analyze numerical data columns to compute descriptive statistics including count, mean, standard deviation, min/max values, and percentiles for data quality assessment and distribution analysis.

Instructions

Get comprehensive statistical summary of numerical columns.

Computes descriptive statistics for all or specified numerical columns including count, mean, standard deviation, min/max values, and percentiles. Optimized for AI workflows with clear statistical insights and data understanding.

Returns: Comprehensive statistical analysis with per-column summaries

Statistical Metrics:
• 📊 Count: Number of non-null values
• 📈 Mean: Average value
• 📉 Std: Standard deviation (measure of spread)
• 🔢 Min/Max: Minimum and maximum values
• 📊 Percentiles: 25th, 50th (median), 75th quartiles
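These metrics correspond to what pandas reports per numeric column. The sketch below is illustrative only; the DataFrame and column names are assumptions, not part of the tool.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"price": [9.99, 14.50, 7.25, 12.00], "quantity": [3, 1, 4, 2]})

# DataFrame.describe() reports the same per-column metrics:
# count, mean, std, min, 25%, 50% (median), 75%, max
print(df.describe())

# Individual quartiles can also be computed directly
print(df["price"].quantile([0.25, 0.50, 0.75]))
```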

Examples:

```python
# Get statistics for all numeric columns
stats = await get_statistics("session_123")

# Analyze specific columns only
stats = await get_statistics("session_123", columns=["price", "quantity"])

# Analyze all numeric columns (percentiles always included)
stats = await get_statistics("session_123")
```

AI Workflow Integration:
1. Essential for data understanding and quality assessment
2. Identifies data distribution and potential issues
3. Guides feature engineering and analysis decisions
4. Provides context for outlier detection thresholds (see the sketch below)
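As a rough illustration of point 4, the reported quartiles can feed conventional IQR fences for flagging candidate outliers. The sample series and the 1.5 * IQR rule below are assumptions for the sketch, not behavior of get_statistics itself.

```python
import pandas as pd

# Hypothetical values; in practice the quartiles would come from the
# get_statistics response for the column of interest
values = pd.Series([9.99, 14.50, 7.25, 12.00, 250.0])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Conventional 1.5 * IQR fences for candidate outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(values[(values < lower) | (values > upper)])  # flags 250.0
```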

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| columns | No | List of specific columns to analyze (None = all numeric columns) | None |
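For illustration, the arguments an MCP client sends for this tool contain only the optional columns list; the session is resolved from the FastMCP context rather than the payload. The request below is a hypothetical tools/call envelope, shown as a Python dict.

```python
# Hypothetical JSON-RPC "tools/call" request for get_statistics
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_statistics",
        # Omit "columns" (or pass null) to analyze all numeric columns
        "arguments": {"columns": ["price", "quantity"]},
    },
}
```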

Implementation Reference

  • The main handler function that executes the get_statistics tool logic: loads session data, selects numeric columns, computes descriptive statistics (count, mean, std, min, max, quartiles), handles edge cases like empty data or missing columns, and returns structured StatisticsResult.
```python
async def get_statistics(
    ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    *,
    columns: Annotated[
        list[str] | None,
        Field(description="List of specific columns to analyze (None = all numeric columns)"),
    ] = None,
) -> StatisticsResult:
    """Get comprehensive statistical summary of numerical columns.

    Computes descriptive statistics for all or specified numerical columns including
    count, mean, standard deviation, min/max values, and percentiles. Optimized for
    AI workflows with clear statistical insights and data understanding.

    Returns:
        Comprehensive statistical analysis with per-column summaries

    Statistical Metrics:
        📊 Count: Number of non-null values
        📈 Mean: Average value
        📉 Std: Standard deviation (measure of spread)
        🔢 Min/Max: Minimum and maximum values
        📊 Percentiles: 25th, 50th (median), 75th quartiles

    Examples:
        # Get statistics for all numeric columns
        stats = await get_statistics("session_123")

        # Analyze specific columns only
        stats = await get_statistics("session_123", columns=["price", "quantity"])

        # Analyze all numeric columns (percentiles always included)
        stats = await get_statistics("session_123")

    AI Workflow Integration:
        1. Essential for data understanding and quality assessment
        2. Identifies data distribution and potential issues
        3. Guides feature engineering and analysis decisions
        4. Provides context for outlier detection thresholds
    """
    # Get session_id from FastMCP context
    session_id = ctx.session_id
    _session, df = get_session_data(session_id)  # Only need df, not session

    # Select numeric columns
    if columns:
        missing_cols = [col for col in columns if col not in df.columns]
        if missing_cols:
            raise ColumnNotFoundError(missing_cols[0], df.columns.tolist())
        numeric_df = df[columns].select_dtypes(include=[np.number])
        # Return empty results if no numeric columns found when specific columns requested
        if numeric_df.empty:
            return StatisticsResult(
                statistics={},
                column_count=0,
                numeric_columns=[],
                total_rows=len(df),
            )
    else:
        numeric_df = df.select_dtypes(include=[np.number])
        # Return empty results if no numeric columns
        if numeric_df.empty:
            return StatisticsResult(
                statistics={},
                column_count=0,
                numeric_columns=[],
                total_rows=len(df),
            )

    # Calculate statistics
    stats_dict = {}
    for col in numeric_df.columns:
        col_data = numeric_df[col].dropna()

        # Create StatisticsSummary directly
        # Calculate statistics, using 0.0 for undefined values
        col_stats = StatisticsSummary.model_validate(
            {
                "count": int(col_data.count()),
                "mean": float(col_data.mean())
                if len(col_data) > 0 and not pd.isna(col_data.mean())
                else 0.0,
                "std": float(col_data.std())
                if len(col_data) > 1 and not pd.isna(col_data.std())
                else 0.0,
                "min": float(col_data.min())
                if len(col_data) > 0 and not pd.isna(col_data.min())
                else 0.0,
                "max": float(col_data.max())
                if len(col_data) > 0 and not pd.isna(col_data.max())
                else 0.0,
                "25%": float(col_data.quantile(0.25)) if len(col_data) > 0 else 0.0,
                "50%": float(col_data.quantile(0.50)) if len(col_data) > 0 else 0.0,
                "75%": float(col_data.quantile(0.75)) if len(col_data) > 0 else 0.0,
            },
        )
        stats_dict[col] = col_stats

    # No longer recording operations (simplified MCP architecture)
    return StatisticsResult(
        statistics=stats_dict,
        column_count=len(stats_dict),
        numeric_columns=list(stats_dict.keys()),
        total_rows=len(df),
    )
```
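The handler reports 0.0 wherever a statistic is undefined (for example, the standard deviation of a single value). The standalone sketch below mirrors that fallback on hypothetical data rather than calling the handler, which requires a live FastMCP session.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one non-null value: std is undefined (NaN)
col_data = pd.Series([42.0, np.nan]).dropna()

mean = float(col_data.mean()) if len(col_data) > 0 and not pd.isna(col_data.mean()) else 0.0
std = float(col_data.std()) if len(col_data) > 1 and not pd.isna(col_data.std()) else 0.0

print(mean, std)  # 42.0 0.0 -- the undefined std is reported as 0.0
```

Reporting 0.0 instead of null keeps the response uniformly numeric, but callers should check count before treating std or the percentiles as meaningful.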
  • Pydantic models defining the input/output schema for get_statistics: StatisticsSummary for individual column statistics and StatisticsResult for the overall response containing statistics for multiple columns.
```python
class StatisticsSummary(BaseModel):
    """Statistical summary for a single column."""

    model_config = ConfigDict(populate_by_name=True)

    count: int = Field(description="Total number of non-null values")
    mean: float | None = Field(default=None, description="Arithmetic mean (numeric columns only)")
    std: float | None = Field(default=None, description="Standard deviation (numeric columns only)")
    min: float | str | None = Field(default=None, description="Minimum value in the column")
    percentile_25: float | None = Field(
        default=None,
        alias="25%",
        description="25th percentile value (numeric columns only)",
    )
    percentile_50: float | None = Field(
        default=None,
        alias="50%",
        description="50th percentile/median value (numeric columns only)",
    )
    percentile_75: float | None = Field(
        default=None,
        alias="75%",
        description="75th percentile value (numeric columns only)",
    )
    max: float | str | None = Field(default=None, description="Maximum value in the column")

    # Categorical statistics fields
    unique: int | None = Field(
        None,
        description="Number of unique values (categorical columns only)",
    )
    top: str | None = Field(
        None,
        description="Most frequently occurring value (categorical columns only)",
    )
    freq: int | None = Field(
        None,
        description="Frequency of the most common value (categorical columns only)",
    )


class StatisticsResult(BaseToolResponse):
    """Response model for dataset statistical analysis."""

    statistics: dict[str, StatisticsSummary] = Field(
        description="Statistical summary for each column",
    )
    column_count: int = Field(description="Total number of columns analyzed")
    numeric_columns: list[str] = Field(description="Names of numeric columns that were analyzed")
    total_rows: int = Field(description="Total number of rows in the dataset")
```
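Because the percentile fields carry "25%"-style aliases and populate_by_name is enabled, the model accepts pandas-style keys on input and can emit them again on output. A minimal sketch, assuming StatisticsSummary is importable as defined above and using made-up numbers:

```python
summary = StatisticsSummary.model_validate(
    {
        "count": 4,
        "mean": 10.94,
        "std": 3.09,
        "min": 7.25,
        "25%": 9.31,   # alias keys accepted on validation
        "50%": 11.0,
        "75%": 12.63,
        "max": 14.5,
    }
)

print(summary.percentile_50)              # access via the Python field name
print(summary.model_dump(by_alias=True))  # round-trips the "25%"/"50%"/"75%" keys
```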
  • Creates the FastMCP statistics_server instance and registers get_statistics (along with related tools) as MCP tools.
```python
# Create Statistics server
statistics_server = FastMCP(
    "DataBeak-Statistics",
    instructions="Statistics and correlation analysis server for DataBeak with comprehensive numerical analysis capabilities",
)

# Register the statistical analysis functions directly as MCP tools
statistics_server.tool(name="get_statistics")(get_statistics)
statistics_server.tool(name="get_column_statistics")(get_column_statistics)
statistics_server.tool(name="get_correlation_matrix")(get_correlation_matrix)
statistics_server.tool(name="get_value_counts")(get_value_counts)
```
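A minimal sketch of exercising the registered tool in-process. It assumes FastMCP's Client accepts a server instance for an in-memory connection and exposes list_tools/call_tool; treat the import path, the call signatures, and the column name as assumptions, and note that a real call also needs data loaded into the session.

```python
import asyncio

from fastmcp import Client  # assumed client API from the fastmcp package


async def main() -> None:
    # Connect directly to the in-process server instance
    async with Client(statistics_server) as client:
        tools = await client.list_tools()
        print([tool.name for tool in tools])  # expect "get_statistics" among them

        # Illustrative call; requires data already loaded into the session
        result = await client.call_tool("get_statistics", {"columns": ["price"]})
        print(result)


asyncio.run(main())
```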


MCP directory API

We provide all the information about MCP servers via our MCP API.

```bash
curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonpspri/databeak'
```

If you have feedback or need assistance with the MCP directory API, please join our Discord server.