
get_statistics

Compute descriptive statistics for numerical columns including count, mean, standard deviation, min/max values, and percentiles to analyze data distribution and quality.

Instructions

Get comprehensive statistical summary of numerical columns.

Computes descriptive statistics for all or specified numerical columns including count, mean, standard deviation, min/max values, and percentiles. Optimized for AI workflows with clear statistical insights and data understanding.

Returns: Comprehensive statistical analysis with per-column summaries

Statistical Metrics:
šŸ“Š Count: Number of non-null values
šŸ“ˆ Mean: Average value
šŸ“‰ Std: Standard deviation (measure of spread)
šŸ”¢ Min/Max: Minimum and maximum values
šŸ“Š Percentiles: 25th, 50th (median), 75th quartiles
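These are the same metrics pandas reports for a numeric column via describe(); a minimal illustrative sketch (the sample data below is made up):

import pandas as pd

prices = pd.Series([9.5, 10.0, 12.25, 8.75, 11.0], name="price")
print(prices.describe())
# Prints count, mean, std, min, 25%, 50%, 75%, max -- the metrics listed above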

Examples:

# Get statistics for all numeric columns
stats = await get_statistics("session_123")

# Analyze specific columns only
stats = await get_statistics("session_123", columns=["price", "quantity"])

# Analyze all numeric columns (percentiles always included)
stats = await get_statistics("session_123")

AI Workflow Integration:
1. Essential for data understanding and quality assessment
2. Identifies data distribution and potential issues
3. Guides feature engineering and analysis decisions
4. Provides context for outlier detection thresholds (see the sketch below)
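As an illustration of point 4, the returned quartiles can seed conventional IQR (Tukey) outlier fences. A minimal sketch, assuming a StatisticsResult named stats as in the examples above and as defined under Implementation Reference below; the column name "price" is hypothetical, and the handler always populates the percentile fields:

col = stats.statistics["price"]
iqr = col.percentile_75 - col.percentile_25      # interquartile range
lower = col.percentile_25 - 1.5 * iqr            # Tukey fences
upper = col.percentile_75 + 1.5 * iqr
print(f"Flag price values outside [{lower}, {upper}]")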

Input Schema

Name     Required  Description                                                         Default
columns  No        List of specific columns to analyze (None = all numeric columns)    None
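A hedged sketch of how an MCP client might call this tool with the schema above, using the fastmcp client library; the server target ("server.py") is an assumption, and DataBeak's actual transport and session bootstrap may differ:

import asyncio
from fastmcp import Client

async def main() -> None:
    # Hypothetical connection; a CSV must already be loaded in the session.
    async with Client("server.py") as client:
        result = await client.call_tool(
            "get_statistics",
            {"columns": ["price", "quantity"]},  # omit "columns" to analyze all numeric columns
        )
        print(result)

asyncio.run(main())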

Implementation Reference

  • The core handler function that implements the get_statistics tool logic. It retrieves the session data, selects numeric columns, computes descriptive statistics (count, mean, std, min, max, quartiles), and returns a StatisticsResult object. A short note on the NaN handling follows this reference list.

    async def get_statistics(
        ctx: Annotated[Context, Field(description="FastMCP context for session access")],
        *,
        columns: Annotated[
            list[str] | None,
            Field(description="List of specific columns to analyze (None = all numeric columns)"),
        ] = None,
    ) -> StatisticsResult:
        """Get comprehensive statistical summary of numerical columns.

        Computes descriptive statistics for all or specified numerical columns including
        count, mean, standard deviation, min/max values, and percentiles. Optimized for
        AI workflows with clear statistical insights and data understanding.

        Returns:
            Comprehensive statistical analysis with per-column summaries

        Statistical Metrics:
            šŸ“Š Count: Number of non-null values
            šŸ“ˆ Mean: Average value
            šŸ“‰ Std: Standard deviation (measure of spread)
            šŸ”¢ Min/Max: Minimum and maximum values
            šŸ“Š Percentiles: 25th, 50th (median), 75th quartiles

        Examples:
            # Get statistics for all numeric columns
            stats = await get_statistics("session_123")

            # Analyze specific columns only
            stats = await get_statistics("session_123", columns=["price", "quantity"])

            # Analyze all numeric columns (percentiles always included)
            stats = await get_statistics("session_123")

        AI Workflow Integration:
            1. Essential for data understanding and quality assessment
            2. Identifies data distribution and potential issues
            3. Guides feature engineering and analysis decisions
            4. Provides context for outlier detection thresholds
        """
        # Get session_id from FastMCP context
        session_id = ctx.session_id
        _session, df = get_session_data(session_id)  # Only need df, not session

        # Select numeric columns
        if columns:
            missing_cols = [col for col in columns if col not in df.columns]
            if missing_cols:
                raise ColumnNotFoundError(missing_cols[0], df.columns.tolist())
            numeric_df = df[columns].select_dtypes(include=[np.number])
            # Return empty results if no numeric columns found when specific columns requested
            if numeric_df.empty:
                return StatisticsResult(
                    statistics={},
                    column_count=0,
                    numeric_columns=[],
                    total_rows=len(df),
                )
        else:
            numeric_df = df.select_dtypes(include=[np.number])
            # Return empty results if no numeric columns
            if numeric_df.empty:
                return StatisticsResult(
                    statistics={},
                    column_count=0,
                    numeric_columns=[],
                    total_rows=len(df),
                )

        # Calculate statistics
        stats_dict = {}
        for col in numeric_df.columns:
            col_data = numeric_df[col].dropna()

            # Create StatisticsSummary directly
            # Calculate statistics, using 0.0 for undefined values
            col_stats = StatisticsSummary.model_validate(
                {
                    "count": int(col_data.count()),
                    "mean": float(col_data.mean())
                    if len(col_data) > 0 and not pd.isna(col_data.mean())
                    else 0.0,
                    "std": float(col_data.std())
                    if len(col_data) > 1 and not pd.isna(col_data.std())
                    else 0.0,
                    "min": float(col_data.min())
                    if len(col_data) > 0 and not pd.isna(col_data.min())
                    else 0.0,
                    "max": float(col_data.max())
                    if len(col_data) > 0 and not pd.isna(col_data.max())
                    else 0.0,
                    "25%": float(col_data.quantile(0.25)) if len(col_data) > 0 else 0.0,
                    "50%": float(col_data.quantile(0.50)) if len(col_data) > 0 else 0.0,
                    "75%": float(col_data.quantile(0.75)) if len(col_data) > 0 else 0.0,
                },
            )
            stats_dict[col] = col_stats

        # No longer recording operations (simplified MCP architecture)
        return StatisticsResult(
            statistics=stats_dict,
            column_count=len(stats_dict),
            numeric_columns=list(stats_dict.keys()),
            total_rows=len(df),
        )
  • Pydantic model defining the output schema for the get_statistics tool response, including per-column statistics and dataset metadata.
    class StatisticsResult(BaseToolResponse):
        """Response model for dataset statistical analysis."""

        statistics: dict[str, StatisticsSummary] = Field(
            description="Statistical summary for each column",
        )
        column_count: int = Field(description="Total number of columns analyzed")
        numeric_columns: list[str] = Field(description="Names of numeric columns that were analyzed")
        total_rows: int = Field(description="Total number of rows in the dataset")
  • Pydantic model used within StatisticsResult for individual column statistical summaries, supporting both numeric and categorical statistics. A sketch of the percentile-alias round trip follows this reference list.

    class StatisticsSummary(BaseModel):
        """Statistical summary for a single column."""

        model_config = ConfigDict(populate_by_name=True)

        count: int = Field(description="Total number of non-null values")
        mean: float | None = Field(default=None, description="Arithmetic mean (numeric columns only)")
        std: float | None = Field(default=None, description="Standard deviation (numeric columns only)")
        min: float | str | None = Field(default=None, description="Minimum value in the column")
        percentile_25: float | None = Field(
            default=None,
            alias="25%",
            description="25th percentile value (numeric columns only)",
        )
        percentile_50: float | None = Field(
            default=None,
            alias="50%",
            description="50th percentile/median value (numeric columns only)",
        )
        percentile_75: float | None = Field(
            default=None,
            alias="75%",
            description="75th percentile value (numeric columns only)",
        )
        max: float | str | None = Field(default=None, description="Maximum value in the column")

        # Categorical statistics fields
        unique: int | None = Field(
            None,
            description="Number of unique values (categorical columns only)",
        )
        top: str | None = Field(
            None,
            description="Most frequently occurring value (categorical columns only)",
        )
        freq: int | None = Field(
            None,
            description="Frequency of the most common value (categorical columns only)",
        )
  • Registration of the get_statistics handler as an MCP tool on the statistics_server FastMCP instance.
    statistics_server.tool(name="get_statistics")(get_statistics)
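Regarding the get_statistics handler above: pandas returns NaN for statistics that are undefined (for example, the sample standard deviation of a single observation), and the handler maps those NaNs to 0.0 so StatisticsSummary always carries plain floats. A minimal sketch of the underlying pandas behaviour:

import pandas as pd

single = pd.Series([42.0]).dropna()
print(single.std())      # NaN: sample standard deviation is undefined for n == 1
print(single.mean())     # 42.0: the mean is still defined
# The handler's len(col_data) > 1 and pd.isna(...) guards replace such NaNs with 0.0.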
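On the two models above: the percentile fields accept their pandas-style aliases ("25%", "50%", "75%") on input and are read back through Python attribute names; populate_by_name additionally allows populating them by field name. A minimal sketch, assuming StatisticsSummary and StatisticsResult as defined above; the numbers are made up:

summary = StatisticsSummary.model_validate(
    {"count": 100, "mean": 9.9, "std": 2.1, "min": 1.0, "max": 20.0,
     "25%": 8.0, "50%": 10.0, "75%": 12.0}
)
result = StatisticsResult(
    statistics={"price": summary},
    column_count=1,
    numeric_columns=["price"],
    total_rows=120,
)
assert result.statistics["price"].percentile_50 == 10.0   # attribute access by field name
payload = summary.model_dump(by_alias=True)               # serializes back with "25%"-style keys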
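The registration line above is the call form of FastMCP's tool decorator; the decorator form below is equivalent. The server construction and run call are a hedged sketch of a standalone set-up and may not match DataBeak's actual wiring:

from fastmcp import FastMCP

statistics_server = FastMCP("statistics")   # hypothetical standalone instance

# Decorator form, equivalent to: statistics_server.tool(name="get_statistics")(get_statistics)
@statistics_server.tool(name="get_statistics")
async def get_statistics_stub() -> dict:
    """Placeholder body; the real handler is shown above."""
    return {}

if __name__ == "__main__":
    statistics_server.run()                 # serves over stdio by default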
