
profile_data

Generate comprehensive data profiles with statistical insights to understand dataset characteristics, identify patterns, and support analytical workflows.

Instructions

Generate a comprehensive data profile with statistical insights.

Creates a complete analytical profile of the dataset, including column characteristics, data types, null patterns, and statistical summaries, to support holistic data understanding in analytical workflows.

Returns: A comprehensive data profile with multi-dimensional analysis.

Profile Components:

šŸ“Š Column Profiles: Data types, null patterns, uniqueness
šŸ“ˆ Statistical Summaries: Numerical column characteristics
šŸ”— Correlations: Inter-variable relationships (optional)
šŸŽÆ Outliers: Anomaly detection across columns (optional)
šŸ’¾ Memory Usage: Resource consumption analysis

Examples:

    # Full data profile
    profile = await profile_data(ctx)

    # Quick profile without expensive computations
    profile = await profile_data(ctx, include_correlations=False, include_outliers=False)

AI Workflow Integration:

1. Initial data exploration and understanding
2. Automated data quality reporting
3. Feature engineering guidance
4. Data preprocessing strategy development

Input Schema

No arguments: profile_data operates on the dataset loaded in the current session. (The include_correlations / include_outliers flags shown in the docstring example above are not part of the current input schema; see the implementation note in the code below.)

Implementation Reference

  • The main execution logic for the profile_data tool. It retrieves the current session data, computes profiling statistics for each column (data type, null counts, unique values, most frequent value), calculates memory usage, and returns a ProfileResult.
    async def profile_data(
        ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    ) -> ProfileResult:
        """Generate comprehensive data profile with statistical insights.

        Creates a complete analytical profile of the dataset including column
        characteristics, data types, null patterns, and statistical summaries.
        Provides holistic data understanding for analytical workflows.

        Returns:
            Comprehensive data profile with multi-dimensional analysis

        Profile Components:
            šŸ“Š Column Profiles: Data types, null patterns, uniqueness
            šŸ“ˆ Statistical Summaries: Numerical column characteristics
            šŸ”— Correlations: Inter-variable relationships (optional)
            šŸŽÆ Outliers: Anomaly detection across columns (optional)
            šŸ’¾ Memory Usage: Resource consumption analysis

        Examples:
            # Full data profile
            profile = await profile_data(ctx)

            # Quick profile without expensive computations
            profile = await profile_data(ctx, include_correlations=False, include_outliers=False)

        AI Workflow Integration:
            1. Initial data exploration and understanding
            2. Automated data quality reporting
            3. Feature engineering guidance
            4. Data preprocessing strategy development
        """
        # Get session_id from FastMCP context
        session_id = ctx.session_id
        _session, df = get_session_data(session_id)

        # Create ProfileInfo for each column (simplified to match model)
        profile_dict = {}
        for col in df.columns:
            col_data = df[col]

            # Get the most frequent value and its frequency
            value_counts = col_data.value_counts(dropna=False)
            most_frequent = None
            frequency = None
            if len(value_counts) > 0:
                most_frequent = value_counts.index[0]
                frequency = int(value_counts.iloc[0])

            # Handle various data types for most_frequent
            if most_frequent is None or pd.isna(most_frequent):
                most_frequent = None
            elif not isinstance(most_frequent, str | int | float | bool):
                most_frequent = str(most_frequent)

            profile_info = ProfileInfo(
                column_name=col,
                data_type=str(col_data.dtype),
                null_count=int(col_data.isna().sum()),
                null_percentage=round(col_data.isna().sum() / len(df) * 100, 2),
                unique_count=int(col_data.nunique()),
                unique_percentage=round(col_data.nunique() / len(df) * 100, 2),
                most_frequent=most_frequent,
                frequency=frequency,
            )
            profile_dict[col] = profile_info

        # Note: Correlation and outlier analysis have been simplified
        # since the ProfileResult model doesn't include them
        memory_usage_mb = round(df.memory_usage(deep=True).sum() / (1024 * 1024), 2)

        return ProfileResult(
            profile=profile_dict,
            total_rows=len(df),
            total_columns=len(df.columns),
            memory_usage_mb=memory_usage_mb,
        )
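  • For orientation, here is how a caller might consume the returned ProfileResult. This is a minimal illustrative sketch, not part of the DataBeak source; it assumes a dataset is already loaded in the session and uses only the fields defined by the models below.

    # Hypothetical consumer; `ctx` is the FastMCP context supplied at call time.
    result = await profile_data(ctx)

    print(f"{result.total_rows} rows x {result.total_columns} columns, "
          f"{result.memory_usage_mb} MB in memory")

    # Flag columns with heavy missingness for downstream cleaning.
    for name, info in result.profile.items():
        if info.null_percentage > 50:
            print(f"{name}: {info.null_percentage}% null ({info.data_type})")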
  • Pydantic model defining the output schema of the profile_data tool response, including per-column profiles and dataset summary metrics.
    class ProfileResult(BaseToolResponse):
        """Response model for comprehensive data profiling."""

        profile: dict[str, ProfileInfo]
        total_rows: int
        total_columns: int
        memory_usage_mb: float
        include_correlations: bool = True
        include_outliers: bool = True
  • Pydantic model for individual column profiling information used within the profile_data tool's response.
    class ProfileInfo(BaseModel):
        """Data profiling information for a column."""

        column_name: str = Field(description="Name of the profiled column")
        data_type: str = Field(description="Pandas data type of the column")
        null_count: int = Field(description="Number of null/missing values")
        null_percentage: float = Field(description="Percentage of null values (0-100)")
        unique_count: int = Field(description="Number of unique values")
        unique_percentage: float = Field(description="Percentage of unique values (0-100)")
        most_frequent: CsvCellValue = Field(None, description="Most frequently occurring value")
        frequency: int | None = Field(None, description="Frequency count of most common value")
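  • For illustration only, a populated entry for one column might look like this (values invented; field semantics as defined above):

    ProfileInfo(
        column_name="age",
        data_type="int64",
        null_count=3,
        null_percentage=1.5,
        unique_count=42,
        unique_percentage=21.0,
        most_frequent=34,  # most common value in the column
        frequency=12,      # number of times it occurs
    )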
  • Registers the profile_data function as an MCP tool named 'profile_data' on the discovery_server FastMCP instance.
    discovery_server.tool(name="profile_data")(profile_data)
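  • Because the tool is registered under the name "profile_data", an MCP client can invoke it by that name. A rough sketch using the FastMCP client API; the URL is a placeholder, and the transport depends on how the DataBeak server is deployed:

    import asyncio
    from fastmcp import Client

    async def main() -> None:
        # Placeholder URL; point this at a running DataBeak MCP server.
        async with Client("http://localhost:8000/mcp") as client:
            result = await client.call_tool("profile_data", {})
            print(result)

    asyncio.run(main())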

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonpspri/databeak'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.