
get_value_counts

Analyze the frequency distribution of values in a column to understand categorical data patterns, identify common values, and assess data quality, with output configurable as raw counts or percentages.

Instructions

Get frequency distribution of values in a column.

Analyzes the distribution of values in a specified column, providing counts and optionally percentages for each unique value. Essential for understanding categorical data and identifying common patterns.

Returns: Frequency distribution with counts/percentages for each unique value

Analysis Features:

    šŸ”¢ Frequency Counts: Raw counts for each unique value
    šŸ“Š Percentage Mode: Normalized frequencies as percentages
    šŸŽÆ Top Values: Configurable limit for most frequent values
    šŸ“ˆ Summary Stats: Total values, unique count, distribution insights

Examples:

    # Basic value counts
    counts = await get_value_counts(ctx, "category")

    # Get percentages for top 10 values
    counts = await get_value_counts(ctx, "status", normalize=True, top_n=10)

    # Sort in ascending order
    counts = await get_value_counts(ctx, "grade", ascending=True)

AI Workflow Integration:

    1. Categorical data analysis and encoding decisions
    2. Data quality assessment (identifying rare values; see the sketch below)
    3. Understanding distribution for sampling strategies
    4. Feature engineering insights for categorical variables
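As a concrete illustration of step 2, the sketch below uses pandas directly (outside DataBeak) to flag rare categories from a normalized frequency distribution. The column name, data, and 2% threshold are illustrative assumptions, not taken from the source.

    import pandas as pd

    # Illustrative only: flag categories covering less than 2% of non-null rows,
    # mirroring the rare-value data quality check in workflow step 2.
    df = pd.DataFrame({"status": ["active"] * 95 + ["paused"] * 4 + ["unknown"]})

    proportions = df["status"].value_counts(normalize=True, dropna=True)
    rare_values = proportions[proportions < 0.02].index.tolist()

    print(rare_values)  # ['unknown'] -- 1 of 100 rows (1%)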

Input Schema

Name        Required   Default   Description
column      Yes                  Name of the column to analyze value distribution
normalize   No         False     Return percentages instead of raw counts
sort        No         True      Sort results by frequency
ascending   No         False     Sort in ascending order (False = descending)
top_n       No         None      Maximum number of values to return (None = all values)

Only column is required; the remaining parameters use the defaults shown in the handler signature below.
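For orientation, a hypothetical arguments object matching this schema is sketched below; the column name and values are invented, and only column is strictly required.

    # Hypothetical call arguments for get_value_counts; fields omitted here
    # fall back to their defaults (sort=True, ascending=False).
    arguments = {
        "column": "status",   # required
        "normalize": True,    # return proportions instead of raw counts
        "top_n": 10,          # keep only the 10 most frequent values
    }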

Implementation Reference

  • The async handler function that implements the core logic for the get_value_counts tool. It retrieves the session dataframe, computes value counts with pandas' Series.value_counts (with options for normalization, sorting, and limiting results), handles various data types and NaN values, and returns a structured ValueCountsResult.
    async def get_value_counts(
        ctx: Annotated[Context, Field(description="FastMCP context for session access")],
        column: Annotated[str, Field(description="Name of the column to analyze value distribution")],
        *,
        normalize: Annotated[
            bool,
            Field(description="Return percentages instead of raw counts"),
        ] = False,
        sort: Annotated[bool, Field(description="Sort results by frequency")] = True,
        ascending: Annotated[
            bool,
            Field(description="Sort in ascending order (False = descending)"),
        ] = False,
        top_n: Annotated[
            int | None,
            Field(description="Maximum number of values to return (None = all values)"),
        ] = None,
    ) -> ValueCountsResult:
        """Get frequency distribution of values in a column.

        Analyzes the distribution of values in a specified column, providing counts
        and optionally percentages for each unique value. Essential for understanding
        categorical data and identifying common patterns.

        Returns:
            Frequency distribution with counts/percentages for each unique value

        Analysis Features:
            šŸ”¢ Frequency Counts: Raw counts for each unique value
            šŸ“Š Percentage Mode: Normalized frequencies as percentages
            šŸŽÆ Top Values: Configurable limit for most frequent values
            šŸ“ˆ Summary Stats: Total values, unique count, distribution insights

        Examples:
            # Basic value counts
            counts = await get_value_counts(ctx, "category")

            # Get percentages for top 10 values
            counts = await get_value_counts(ctx, "status", normalize=True, top_n=10)

            # Sort in ascending order
            counts = await get_value_counts(ctx, "grade", ascending=True)

        AI Workflow Integration:
            1. Categorical data analysis and encoding decisions
            2. Data quality assessment (identifying rare values)
            3. Understanding distribution for sampling strategies
            4. Feature engineering insights for categorical variables
        """
        # Get session_id from FastMCP context
        session_id = ctx.session_id
        _session, df = get_session_data(session_id)  # Only need df, not session

        if column not in df.columns:
            raise ColumnNotFoundError(column, df.columns.tolist())

        # Get value counts
        # Note: mypy has issues with value_counts overloads when normalize is a bool variable
        value_counts = df[column].value_counts(
            normalize=normalize,
            sort=sort,
            ascending=ascending,
            dropna=True,
        )  # type: ignore[call-overload]

        # Limit to top_n if specified
        if top_n is not None and top_n > 0:
            value_counts = value_counts.head(top_n)

        # Convert to dict, handling various data types
        counts_dict = {}
        for value, count in value_counts.items():
            # Handle NaN and None values
            if pd.isna(value):
                key = "<null>"
            elif isinstance(value, str | int | float | bool):
                key = str(value)
            else:
                key = str(value)
            counts_dict[key] = float(count) if normalize else int(count)

        # Calculate summary statistics
        total_count = int(df[column].count())  # Non-null count
        unique_count = int(df[column].nunique())

        # No longer recording operations (simplified MCP architecture)
        return ValueCountsResult(
            column=column,
            value_counts=counts_dict,
            total_values=total_count,
            unique_values=unique_count,
            normalize=normalize,
        )
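    To make the pandas behavior above concrete, the standalone sketch below reproduces the normalize/top_n path on a toy Series. It is not DataBeak code; the data and the top_n value of 2 are invented.

        import pandas as pd

        # Standalone illustration of the value_counts call used by the handler
        # (invented data; None is dropped because dropna=True).
        s = pd.Series(["a", "a", "a", "b", "b", None, "c"])

        vc = s.value_counts(normalize=True, sort=True, ascending=False, dropna=True)
        vc = vc.head(2)  # equivalent to top_n=2

        counts_dict = {str(value): float(count) for value, count in vc.items()}
        print(counts_dict)  # {'a': 0.5, 'b': 0.3333333333333333}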
  • Pydantic model defining the structured output schema for the get_value_counts tool response, including value counts dictionary, summary statistics, and normalization flag.
    class ValueCountsResult(BaseToolResponse):
        """Response model for value frequency analysis."""

        column: str = Field(description="Name of the analyzed column")
        value_counts: dict[str, int | float] = Field(
            description="Count or proportion of each unique value",
        )
        total_values: int = Field(description="Total number of values (including duplicates)")
        unique_values: int = Field(description="Number of unique/distinct values")
        normalize: bool = Field(
            default=False,
            description="Whether counts are normalized as proportions",
        )
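    A minimal sketch of the resulting response shape, assuming BaseToolResponse behaves like a plain pydantic BaseModel (a stand-in is used below because BaseToolResponse's definition is not shown; the field values are invented):

        from pydantic import BaseModel

        # Stand-in for BaseToolResponse, used only to show the serialized shape.
        class ValueCountsResultSketch(BaseModel):
            column: str
            value_counts: dict[str, int | float]
            total_values: int
            unique_values: int
            normalize: bool = False

        result = ValueCountsResultSketch(
            column="status",
            value_counts={"active": 95, "paused": 4, "unknown": 1},
            total_values=100,
            unique_values=3,
        )
        print(result.model_dump())
        # {'column': 'status', 'value_counts': {'active': 95, 'paused': 4,
        #  'unknown': 1}, 'total_values': 100, 'unique_values': 3, 'normalize': False}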
  • Registers the get_value_counts handler function as an MCP tool named 'get_value_counts' on the FastMCP statistics_server instance.
    statistics_server.tool(name="get_value_counts")(get_value_counts)
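    The call-form registration above is equivalent to applying the decorator at definition time. A minimal, self-contained sketch of that pattern, assuming the standalone fastmcp package; the server and tool names here are illustrative, not DataBeak's:

        from fastmcp import FastMCP

        # Illustrative decorator-style registration, equivalent to calling
        # server.tool(name=...)(handler) after the handler is defined.
        demo_server = FastMCP("demo-statistics")

        @demo_server.tool(name="echo_column")
        async def echo_column(column: str) -> str:
            """Toy tool used only to demonstrate registration."""
            return column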
