get_value_counts
Analyze the frequency distribution of values in a column to understand categorical data patterns, identify common values, and assess data quality, with configurable counts or percentages.
Instructions
Get frequency distribution of values in a column.
Analyzes the distribution of values in a specified column, providing counts and optionally percentages for each unique value. Essential for understanding categorical data and identifying common patterns.
Returns: Frequency distribution with counts/percentages for each unique value
Analysis Features:

- Frequency Counts: raw counts for each unique value
- Percentage Mode: normalized frequencies as percentages
- Top Values: configurable limit on the most frequent values
- Summary Stats: total values, unique count, distribution insights
Examples:

```python
# Basic value counts
counts = await get_value_counts(ctx, "category")

# Percentages for the top 10 values
counts = await get_value_counts(ctx, "status", normalize=True, top_n=10)

# Sort in ascending order
counts = await get_value_counts(ctx, "grade", ascending=True)
```
AI Workflow Integration:

1. Categorical data analysis and encoding decisions
2. Data quality assessment (identifying rare values; see the sketch below)
3. Understanding distribution for sampling strategies
4. Feature engineering insights for categorical variables
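To illustrate the data-quality use case, here is a minimal sketch of rare-value detection with plain pandas; the sample data and the 5% `rare_threshold` are illustrative assumptions, not part of DataBeak.

```python
import pandas as pd

# Illustrative data: one dominant category plus two rare ones
df = pd.DataFrame({"status": ["active"] * 95 + ["archived"] * 4 + ["unknown"]})

# Normalized frequencies, analogous to get_value_counts(..., normalize=True)
freq = df["status"].value_counts(normalize=True)

# Flag categories below an assumed 5% rarity threshold
rare_threshold = 0.05
rare_values = freq[freq < rare_threshold].index.tolist()
print(rare_values)  # ['archived', 'unknown']
```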
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| column | Yes | Name of the column to analyze value distribution | |
| normalize | No | Return percentages instead of raw counts | `False` |
| sort | No | Sort results by frequency | `True` |
| ascending | No | Sort in ascending order (`False` = descending) | `False` |
| top_n | No | Maximum number of values to return (`None` = all values) | `None` |
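Putting the schema together, a call exercising every optional parameter might look like this. This is a sketch: `ctx` is assumed to be a live FastMCP `Context` bound to a session with a loaded dataframe.

```python
# Top 10 statuses as proportions, most frequent first
result = await get_value_counts(
    ctx,
    "status",
    normalize=True,   # proportions instead of raw counts
    sort=True,        # order results by frequency
    ascending=False,  # descending: most common first
    top_n=10,         # keep only the 10 most frequent values
)
print(result.value_counts)
```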
Implementation Reference
- The async handler that implements the core logic for the `get_value_counts` tool. It retrieves the session dataframe, computes value counts via `pandas.Series.value_counts` with options for normalization, sorting, and limiting results, handles NaN values, and returns a structured `ValueCountsResult`.

```python
async def get_value_counts(
    ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    column: Annotated[str, Field(description="Name of the column to analyze value distribution")],
    *,
    normalize: Annotated[
        bool,
        Field(description="Return percentages instead of raw counts"),
    ] = False,
    sort: Annotated[bool, Field(description="Sort results by frequency")] = True,
    ascending: Annotated[
        bool,
        Field(description="Sort in ascending order (False = descending)"),
    ] = False,
    top_n: Annotated[
        int | None,
        Field(description="Maximum number of values to return (None = all values)"),
    ] = None,
) -> ValueCountsResult:
    """Get frequency distribution of values in a column.

    Analyzes the distribution of values in a specified column, providing counts
    and optionally percentages for each unique value. Essential for understanding
    categorical data and identifying common patterns.

    Returns:
        Frequency distribution with counts/percentages for each unique value

    Analysis Features:
        - Frequency Counts: raw counts for each unique value
        - Percentage Mode: normalized frequencies as percentages
        - Top Values: configurable limit for most frequent values
        - Summary Stats: total values, unique count, distribution insights

    Examples:
        # Basic value counts
        counts = await get_value_counts(ctx, "category")

        # Get percentages for top 10 values
        counts = await get_value_counts(ctx, "status", normalize=True, top_n=10)

        # Sort in ascending order
        counts = await get_value_counts(ctx, "grade", ascending=True)

    AI Workflow Integration:
        1. Categorical data analysis and encoding decisions
        2. Data quality assessment (identifying rare values)
        3. Understanding distribution for sampling strategies
        4. Feature engineering insights for categorical variables
    """
    # Get session_id from FastMCP context
    session_id = ctx.session_id
    _session, df = get_session_data(session_id)  # Only need df, not session

    if column not in df.columns:
        raise ColumnNotFoundError(column, df.columns.tolist())

    # Get value counts
    # Note: mypy has issues with value_counts overloads when normalize is a bool variable
    value_counts = df[column].value_counts(
        normalize=normalize,
        sort=sort,
        ascending=ascending,
        dropna=True,
    )  # type: ignore[call-overload]

    # Limit to top_n if specified
    if top_n is not None and top_n > 0:
        value_counts = value_counts.head(top_n)

    # Convert to dict: represent NaN/None keys explicitly, stringify everything else
    counts_dict = {}
    for value, count in value_counts.items():
        key = "<null>" if pd.isna(value) else str(value)
        counts_dict[key] = float(count) if normalize else int(count)

    # Calculate summary statistics
    total_count = int(df[column].count())  # Non-null count
    unique_count = int(df[column].nunique())

    # No longer recording operations (simplified MCP architecture)
    return ValueCountsResult(
        column=column,
        value_counts=counts_dict,
        total_values=total_count,
        unique_values=unique_count,
        normalize=normalize,
    )
```
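The pandas core of the handler can be exercised in isolation. This sketch reproduces the normalize/sort/top_n pipeline on a throwaway Series; the data is illustrative.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", None, "c", "a", "b"])

# Mirrors the handler's call: proportions, sorted descending, NaN dropped
vc = s.value_counts(normalize=True, sort=True, ascending=False, dropna=True)
vc = vc.head(2)  # equivalent to top_n=2

counts_dict = {str(k): float(v) for k, v in vc.items()}
print(counts_dict)  # {'a': 0.5, 'b': 0.3333333333333333}
```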
- Pydantic model defining the structured output schema for the `get_value_counts` response, including the value-counts dictionary, summary statistics, and the normalization flag.

```python
class ValueCountsResult(BaseToolResponse):
    """Response model for value frequency analysis."""

    column: str = Field(description="Name of the analyzed column")
    value_counts: dict[str, int | float] = Field(
        description="Count or proportion of each unique value",
    )
    total_values: int = Field(description="Total number of values (including duplicates)")
    unique_values: int = Field(description="Number of unique/distinct values")
    normalize: bool = Field(
        default=False,
        description="Whether counts are normalized as proportions",
    )
```
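A result instance behaves like any Pydantic model, so it can be built and serialized directly. The field values below are made up, and this assumes `BaseToolResponse` adds no required fields of its own.

```python
result = ValueCountsResult(
    column="status",
    value_counts={"active": 0.95, "archived": 0.04, "unknown": 0.01},
    total_values=100,
    unique_values=3,
    normalize=True,
)
print(result.model_dump_json(indent=2))  # Pydantic v2 serialization
```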
- src/databeak/servers/statistics_server.py:513 (registration). Registers the handler as an MCP tool named `get_value_counts` on the FastMCP `statistics_server` instance.

```python
statistics_server.tool(name="get_value_counts")(get_value_counts)
```
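With the tool registered, it can be invoked through a FastMCP client. This is a sketch assuming FastMCP 2.x's in-memory transport and that data has already been loaded into the session; otherwise the handler's session lookup will fail.

```python
import asyncio

from fastmcp import Client

from databeak.servers.statistics_server import statistics_server

async def main() -> None:
    # In-memory transport: connect the client directly to the server object
    async with Client(statistics_server) as client:
        result = await client.call_tool(
            "get_value_counts",
            {"column": "category", "normalize": True, "top_n": 5},
        )
        print(result)

asyncio.run(main())
```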