
profile_data

Generate comprehensive data profiles with statistical insights to understand dataset characteristics, identify patterns, and support analytical workflows.

Instructions

Generate comprehensive data profile with statistical insights.

Creates a complete analytical profile of the dataset including column characteristics, data types, null patterns, and statistical summaries. Provides holistic data understanding for analytical workflows.

Returns: Comprehensive data profile with multi-dimensional analysis

Profile Components:
- 📊 Column Profiles: Data types, null patterns, uniqueness
- 📈 Statistical Summaries: Numerical column characteristics
- 🔗 Correlations: Inter-variable relationships (optional)
- 🎯 Outliers: Anomaly detection across columns (optional)
- 💾 Memory Usage: Resource consumption analysis
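The optional outlier component listed above is, per the implementation notes further down, currently simplified out of the code. A common choice for this kind of per-column anomaly detection is Tukey's IQR fences; the sketch below uses only the standard library and is illustrative, not the tool's actual method:

```python
import statistics

def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # flags the 95
```

A real profiler would typically apply such a rule per numeric column and report counts rather than raw values.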

Examples:

    # Full data profile
    profile = await profile_data(ctx)

    # Quick profile without expensive computations
    profile = await profile_data(ctx,
                                 include_correlations=False,
                                 include_outliers=False)

Note: the function signature excerpted under Implementation Reference accepts only ctx; include_correlations and include_outliers are fields of the response model (defaulting to True), so the second call would fail against the current implementation.

AI Workflow Integration:
1. Initial data exploration and understanding
2. Automated data quality reporting
3. Feature engineering guidance
4. Data preprocessing strategy development

Input Schema


No arguments

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| profile | Yes | Per-column profiling information (ProfileInfo keyed by column name) | |
| success | No | Whether operation completed successfully | |
| total_rows | Yes | Number of rows in the dataset | |
| total_columns | Yes | Number of columns in the dataset | |
| memory_usage_mb | Yes | Dataset memory usage in megabytes | |
| include_outliers | No | Whether outlier analysis was included | True |
| include_correlations | No | Whether correlation analysis was included | True |
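For concreteness, a successful response following this schema might look like the following (values are illustrative, not real output):

```json
{
  "success": true,
  "profile": {
    "age": {
      "column_name": "age",
      "data_type": "int64",
      "null_count": 2,
      "null_percentage": 2.0,
      "unique_count": 47,
      "unique_percentage": 47.0,
      "most_frequent": 31,
      "frequency": 5
    }
  },
  "total_rows": 100,
  "total_columns": 1,
  "memory_usage_mb": 0.01,
  "include_correlations": true,
  "include_outliers": true
}
```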

Implementation Reference

  • The main execution logic for the profile_data tool. It retrieves the current session data, computes profiling statistics for each column (data type, nulls, uniques, most frequent), calculates memory usage, and returns a ProfileResult.
    async def profile_data(
        ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    ) -> ProfileResult:
        """Generate comprehensive data profile with statistical insights.
    
        Creates a complete analytical profile of the dataset including column
        characteristics, data types, null patterns, and statistical summaries.
        Provides holistic data understanding for analytical workflows.
    
        Returns:
            Comprehensive data profile with multi-dimensional analysis
    
        Profile Components:
            📊 Column Profiles: Data types, null patterns, uniqueness
            📈 Statistical Summaries: Numerical column characteristics
            🔗 Correlations: Inter-variable relationships (optional)
            🎯 Outliers: Anomaly detection across columns (optional)
            💾 Memory Usage: Resource consumption analysis
    
        Examples:
            # Full data profile
            profile = await profile_data(ctx)
    
            # Quick profile without expensive computations
            profile = await profile_data(ctx,
                                       include_correlations=False,
                                       include_outliers=False)
    
        AI Workflow Integration:
            1. Initial data exploration and understanding
            2. Automated data quality reporting
            3. Feature engineering guidance
            4. Data preprocessing strategy development
    
        """
        # Get session_id from FastMCP context
        session_id = ctx.session_id
        _session, df = get_session_data(session_id)
    
        # Create ProfileInfo for each column (simplified to match model)
        profile_dict = {}
    
        for col in df.columns:
            col_data = df[col]
    
            # Get the most frequent value and its frequency
            value_counts = col_data.value_counts(dropna=False)
            most_frequent = None
            frequency = None
            if len(value_counts) > 0:
                most_frequent = value_counts.index[0]
                frequency = int(value_counts.iloc[0])
    
                # Handle various data types for most_frequent
                if most_frequent is None or pd.isna(most_frequent):
                    most_frequent = None
                elif not isinstance(most_frequent, str | int | float | bool):
                    most_frequent = str(most_frequent)
    
            profile_info = ProfileInfo(
                column_name=col,
                data_type=str(col_data.dtype),
                null_count=int(col_data.isna().sum()),
                null_percentage=round(col_data.isna().sum() / len(df) * 100, 2),
                unique_count=int(col_data.nunique()),
                unique_percentage=round(col_data.nunique() / len(df) * 100, 2),
                most_frequent=most_frequent,
                frequency=frequency,
            )
    
            profile_dict[col] = profile_info
    
        # Note: Correlation and outlier analysis have been simplified
        # since the ProfileResult model doesn't include them
    
        memory_usage_mb = round(df.memory_usage(deep=True).sum() / (1024 * 1024), 2)
    
        return ProfileResult(
            profile=profile_dict,
            total_rows=len(df),
            total_columns=len(df.columns),
            memory_usage_mb=memory_usage_mb,
        )
  • Pydantic model defining the output schema of the profile_data tool response, including per-column profiles and dataset summary metrics.
    class ProfileResult(BaseToolResponse):
        """Response model for comprehensive data profiling."""
    
        profile: dict[str, ProfileInfo]
        total_rows: int
        total_columns: int
        memory_usage_mb: float
        include_correlations: bool = True
        include_outliers: bool = True
  • Pydantic model for individual column profiling information used within the profile_data tool's response.
    class ProfileInfo(BaseModel):
        """Data profiling information for a column."""
    
        column_name: str = Field(description="Name of the profiled column")
        data_type: str = Field(description="Pandas data type of the column")
        null_count: int = Field(description="Number of null/missing values")
        null_percentage: float = Field(description="Percentage of null values (0-100)")
        unique_count: int = Field(description="Number of unique values")
        unique_percentage: float = Field(description="Percentage of unique values (0-100)")
        most_frequent: CsvCellValue = Field(None, description="Most frequently occurring value")
        frequency: int | None = Field(None, description="Frequency count of most common value")
  • Registers the profile_data function as an MCP tool named 'profile_data' on the discovery_server FastMCP instance.
    discovery_server.tool(name="profile_data")(profile_data)
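Taken together, the excerpts above amount to a counting pass over the columns plus a memory summary. A dependency-free sketch of the same pass over a list-of-dicts table (plain dicts stand in for the session, the DataFrame, and the Pydantic models; names are illustrative):

```python
from collections import Counter
import sys

def profile_table(rows: list[dict]) -> dict:
    """Mirror profile_data's shape: per-column stats plus dataset totals."""
    n = len(rows)
    columns = list(rows[0]) if rows else []
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        null_count = sum(1 for v in values if v is None)
        uniques = {v for v in values if v is not None}  # nulls excluded, like nunique()
        most_frequent, frequency = Counter(values).most_common(1)[0]  # like dropna=False
        profile[col] = {
            "column_name": col,
            "null_count": null_count,
            "null_percentage": round(null_count / n * 100, 2),
            "unique_count": len(uniques),
            "unique_percentage": round(len(uniques) / n * 100, 2),
            "most_frequent": most_frequent,
            "frequency": frequency,
        }
    return {
        "profile": profile,
        "total_rows": n,
        "total_columns": len(columns),
        # crude stand-in for df.memory_usage(deep=True).sum()
        "memory_usage_mb": round(sys.getsizeof(rows) / (1024 * 1024), 4),
    }

rows = [{"city": "Oslo", "pop": 1}, {"city": "Oslo", "pop": None}]
print(profile_table(rows)["profile"]["pop"])
```

The pandas version in the excerpt additionally records each column's dtype and uses deep memory introspection; this sketch only reproduces the counting logic and result shape.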
