get_value_counts
Analyze the frequency distribution of values in a column to understand categorical data patterns, identify common values, and assess data quality, with configurable counts or percentages.
Instructions
Get frequency distribution of values in a column.
Analyzes the distribution of values in a specified column, providing counts and optionally percentages for each unique value. Essential for understanding categorical data and identifying common patterns.
Returns: Frequency distribution with counts/percentages for each unique value
Analysis Features:

- Frequency Counts: raw counts for each unique value
- Percentage Mode: normalized frequencies as percentages
- Top Values: configurable limit on the most frequent values
- Summary Stats: total values, unique count, distribution insights
Examples:

```python
# Basic value counts
counts = await get_value_counts(ctx, "category")

# Percentages for the top 10 values
counts = await get_value_counts(ctx, "status", normalize=True, top_n=10)

# Sort in ascending order
counts = await get_value_counts(ctx, "grade", ascending=True)
```
AI Workflow Integration:

1. Categorical data analysis and encoding decisions
2. Data quality assessment (identifying rare values; see the sketch below)
3. Understanding distribution for sampling strategies
4. Feature engineering insights for categorical variables
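To illustrate the data-quality use case, here is a minimal sketch of rare-value detection with plain pandas; the sample data and the 5% `rare_threshold` are illustrative assumptions, not part of DataBeak.

```python
import pandas as pd

# Illustrative data: one dominant category plus two rare ones
df = pd.DataFrame({"status": ["active"] * 95 + ["archived"] * 4 + ["unknown"]})

# Normalized frequencies, analogous to get_value_counts(..., normalize=True)
freq = df["status"].value_counts(normalize=True)

# Flag categories below an assumed 5% rarity threshold
rare_threshold = 0.05
rare_values = freq[freq < rare_threshold].index.tolist()
print(rare_values)  # ['archived', 'unknown']
```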
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| column | Yes | Name of the column to analyze value distribution | |
| normalize | No | Return percentages instead of raw counts | `False` |
| sort | No | Sort results by frequency | `True` |
| ascending | No | Sort in ascending order (`False` = descending) | `False` |
| top_n | No | Maximum number of values to return (`None` = all values) | `None` |
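Putting the schema together, a call exercising every optional parameter might look like this. This is a sketch: `ctx` is assumed to be a live FastMCP `Context` bound to a session with a loaded dataframe.

```python
# Top 10 statuses as proportions, most frequent first
result = await get_value_counts(
    ctx,
    "status",
    normalize=True,   # proportions instead of raw counts
    sort=True,        # order results by frequency
    ascending=False,  # descending: most common first
    top_n=10,         # keep only the 10 most frequent values
)
print(result.value_counts)
```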
Implementation Reference
- The async handler that implements the core logic for the `get_value_counts` tool. It retrieves the session dataframe, computes value counts via `pandas.Series.value_counts` with options for normalization, sorting, and limiting results, handles NaN values, and returns a structured `ValueCountsResult`.

```python
async def get_value_counts(
    ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    column: Annotated[str, Field(description="Name of the column to analyze value distribution")],
    *,
    normalize: Annotated[
        bool,
        Field(description="Return percentages instead of raw counts"),
    ] = False,
    sort: Annotated[bool, Field(description="Sort results by frequency")] = True,
    ascending: Annotated[
        bool,
        Field(description="Sort in ascending order (False = descending)"),
    ] = False,
    top_n: Annotated[
        int | None,
        Field(description="Maximum number of values to return (None = all values)"),
    ] = None,
) -> ValueCountsResult:
    """Get frequency distribution of values in a column.

    Analyzes the distribution of values in a specified column, providing counts
    and optionally percentages for each unique value. Essential for understanding
    categorical data and identifying common patterns.

    Returns:
        Frequency distribution with counts/percentages for each unique value

    Analysis Features:
        - Frequency Counts: raw counts for each unique value
        - Percentage Mode: normalized frequencies as percentages
        - Top Values: configurable limit for most frequent values
        - Summary Stats: total values, unique count, distribution insights

    Examples:
        # Basic value counts
        counts = await get_value_counts(ctx, "category")

        # Get percentages for top 10 values
        counts = await get_value_counts(ctx, "status", normalize=True, top_n=10)

        # Sort in ascending order
        counts = await get_value_counts(ctx, "grade", ascending=True)

    AI Workflow Integration:
        1. Categorical data analysis and encoding decisions
        2. Data quality assessment (identifying rare values)
        3. Understanding distribution for sampling strategies
        4. Feature engineering insights for categorical variables
    """
    # Get session_id from FastMCP context
    session_id = ctx.session_id
    _session, df = get_session_data(session_id)  # Only need df, not session

    if column not in df.columns:
        raise ColumnNotFoundError(column, df.columns.tolist())

    # Get value counts
    # Note: mypy has issues with value_counts overloads when normalize is a bool variable
    value_counts = df[column].value_counts(
        normalize=normalize,
        sort=sort,
        ascending=ascending,
        dropna=True,
    )  # type: ignore[call-overload]

    # Limit to top_n if specified
    if top_n is not None and top_n > 0:
        value_counts = value_counts.head(top_n)

    # Convert to dict: represent NaN/None keys explicitly, stringify everything else
    counts_dict = {}
    for value, count in value_counts.items():
        key = "<null>" if pd.isna(value) else str(value)
        counts_dict[key] = float(count) if normalize else int(count)

    # Calculate summary statistics
    total_count = int(df[column].count())  # Non-null count
    unique_count = int(df[column].nunique())

    # No longer recording operations (simplified MCP architecture)
    return ValueCountsResult(
        column=column,
        value_counts=counts_dict,
        total_values=total_count,
        unique_values=unique_count,
        normalize=normalize,
    )
```
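The pandas core of the handler can be exercised in isolation. This sketch reproduces the normalize/sort/top_n pipeline on a throwaway Series; the data is illustrative.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", None, "c", "a", "b"])

# Mirrors the handler's call: proportions, sorted descending, NaN dropped
vc = s.value_counts(normalize=True, sort=True, ascending=False, dropna=True)
vc = vc.head(2)  # equivalent to top_n=2

counts_dict = {str(k): float(v) for k, v in vc.items()}
print(counts_dict)  # {'a': 0.5, 'b': 0.3333333333333333}
```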
- Pydantic model defining the structured output schema for the `get_value_counts` response, including the value-counts dictionary, summary statistics, and the normalization flag.

```python
class ValueCountsResult(BaseToolResponse):
    """Response model for value frequency analysis."""

    column: str = Field(description="Name of the analyzed column")
    value_counts: dict[str, int | float] = Field(
        description="Count or proportion of each unique value",
    )
    total_values: int = Field(description="Total number of values (including duplicates)")
    unique_values: int = Field(description="Number of unique/distinct values")
    normalize: bool = Field(
        default=False,
        description="Whether counts are normalized as proportions",
    )
```
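A result instance behaves like any Pydantic model, so it can be built and serialized directly. The field values below are made up, and this assumes `BaseToolResponse` adds no required fields of its own.

```python
result = ValueCountsResult(
    column="status",
    value_counts={"active": 0.95, "archived": 0.04, "unknown": 0.01},
    total_values=100,
    unique_values=3,
    normalize=True,
)
print(result.model_dump_json(indent=2))  # Pydantic v2 serialization
```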
- src/databeak/servers/statistics_server.py:513 (registration). Registers the handler as an MCP tool named `get_value_counts` on the FastMCP `statistics_server` instance.

```python
statistics_server.tool(name="get_value_counts")(get_value_counts)
```
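With the tool registered, it can be invoked through a FastMCP client. This is a sketch assuming FastMCP 2.x's in-memory transport and that data has already been loaded into the session; otherwise the handler's session lookup will fail.

```python
import asyncio

from fastmcp import Client

from databeak.servers.statistics_server import statistics_server

async def main() -> None:
    # In-memory transport: connect the client directly to the server object
    async with Client(statistics_server) as client:
        result = await client.call_tool(
            "get_value_counts",
            {"column": "category", "normalize": True, "top_n": 5},
        )
        print(result)

asyncio.run(main())
```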