Glama

get_correlation_matrix

Calculate pairwise correlations between numerical columns to analyze variable relationships, detect multicollinearity, and support feature selection in data analysis workflows.

Instructions

Calculate correlation matrix for numerical columns.

Computes pairwise correlations between numerical columns using the selected correlation method (Pearson, Spearman, or Kendall). Useful for understanding relationships between variables and for feature selection in analytical workflows.

Returns: Correlation matrix with pairwise correlation coefficients

Correlation Methods:
📊 Pearson: Linear relationships (default, assumes normality)
📈 Spearman: Monotonic relationships (rank-based, non-parametric)
🔄 Kendall: Concordant/discordant pairs (robust, small samples)
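To illustrate the difference between the linear and rank-based methods, here is a minimal sketch that calls pandas directly rather than the tool. The data is made up: y increases monotonically with x but not linearly, so Spearman should report a perfect relationship while Pearson reports a weaker one. Kendall is left out because pandas may need SciPy installed to compute it.

```python
import pandas as pd

# Hypothetical data: y grows monotonically with x, but not linearly.
df = pd.DataFrame({"x": range(1, 11)})
df["y"] = df["x"] ** 3

# DataFrame.corr returns a square matrix; pick out the x/y entry.
pearson = df.corr(method="pearson").loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]

print(f"pearson={pearson:.3f} spearman={spearman:.3f}")
```

Because the ranks of x and y are identical, the rank-based coefficient reaches its maximum even though the relationship is curved.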

Examples:
# Basic correlation analysis
corr = await get_correlation_matrix(ctx)

# Analyze specific columns with Spearman correlation
corr = await get_correlation_matrix(ctx,
                                  columns=["price", "rating", "sales"],
                                  method="spearman")

# Filter correlations above threshold
corr = await get_correlation_matrix(ctx, min_correlation=0.5)

AI Workflow Integration:
1. Feature selection and dimensionality reduction
2. Multicollinearity detection before modeling
3. Understanding variable relationships
4. Data validation and quality assessment

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| method | No | Correlation method: pearson (linear), spearman (rank), kendall (rank) | pearson |
| columns | No | List of columns to include (None = all numeric columns) | |
| min_correlation | No | Minimum correlation threshold to include in results | |
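Going by the handler listed under Implementation Reference, min_correlation is compared against the absolute value of each coefficient, and each column's correlation with itself is always kept. The following is a minimal sketch of that filtering step on a plain nested dict. The helper name filter_by_threshold and the coefficients are hypothetical and are not part of the tool.

```python
def filter_by_threshold(matrix, min_correlation):
    """Keep entries whose absolute value meets the threshold, plus the diagonal."""
    threshold = abs(min_correlation)
    return {
        row: {
            col: value
            for col, value in cols.items()
            if row == col or abs(value) >= threshold
        }
        for row, cols in matrix.items()
    }


# Made-up coefficients, for illustration only.
matrix = {
    "price": {"price": 1.0, "rating": -0.7, "sales": 0.2},
    "rating": {"price": -0.7, "rating": 1.0, "sales": 0.3},
    "sales": {"price": 0.2, "rating": 0.3, "sales": 1.0},
}

filtered = filter_by_threshold(matrix, 0.5)
print(filtered)
```

Strong negative correlations survive the filter because the comparison uses absolute values, and a column with no strong partners still appears with only its own diagonal entry.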

Implementation Reference

  • The main async handler implementing the get_correlation_matrix tool. It computes the pairwise correlation matrix for numeric columns with pandas DataFrame.corr(), supporting the pearson, spearman, and kendall methods. It also handles column selection and minimum-correlation filtering, and returns a structured CorrelationResult.
    async def get_correlation_matrix(
        ctx: Annotated[Context, Field(description="FastMCP context for session access")],
        method: Annotated[
            Literal["pearson", "spearman", "kendall"],
            Field(description="Correlation method: pearson (linear), spearman (rank), kendall (rank)"),
        ] = "pearson",
        columns: Annotated[
            list[str] | None,
            Field(description="List of columns to include (None = all numeric columns)"),
        ] = None,
        min_correlation: Annotated[
            float | None,
            Field(description="Minimum correlation threshold to include in results"),
        ] = None,
    ) -> CorrelationResult:
        """Calculate correlation matrix for numerical columns.
    
        Computes pairwise correlations between numerical columns using various
        correlation methods. Essential for understanding relationships between
        variables and feature selection in analytical workflows.
    
        Returns:
            Correlation matrix with pairwise correlation coefficients
    
        Correlation Methods:
            📊 Pearson: Linear relationships (default, assumes normality)
            📈 Spearman: Monotonic relationships (rank-based, non-parametric)
            🔄 Kendall: Concordant/discordant pairs (robust, small samples)
    
        Examples:
            # Basic correlation analysis
            corr = await get_correlation_matrix(ctx)
    
            # Analyze specific columns with Spearman correlation
            corr = await get_correlation_matrix(ctx,
                                              columns=["price", "rating", "sales"],
                                              method="spearman")
    
            # Filter correlations above threshold
            corr = await get_correlation_matrix(ctx, min_correlation=0.5)
    
        AI Workflow Integration:
            1. Feature selection and dimensionality reduction
            2. Multicollinearity detection before modeling
            3. Understanding variable relationships
            4. Data validation and quality assessment
    
        """
        # Get session_id from FastMCP context
        session_id = ctx.session_id
        _session, df = get_session_data(session_id)  # Only need df, not session
    
        # Select numeric columns
        if columns:
            missing_cols = [col for col in columns if col not in df.columns]
            if missing_cols:
                raise ColumnNotFoundError(missing_cols[0], df.columns.tolist())
            numeric_df = df[columns].select_dtypes(include=[np.number])
        else:
            numeric_df = df.select_dtypes(include=[np.number])
    
        if numeric_df.empty:
            msg = "No numeric columns found for correlation analysis"
            raise ToolError(msg)
    
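        settings = get_settings()
        # Note: the column count is compared against min_statistical_sample_size,
        # so the "two numeric columns" message is accurate only if that setting is 2.
        if len(numeric_df.columns) < settings.min_statistical_sample_size:
            msg = "Correlation analysis requires at least two numeric columns"
            raise ToolError(msg)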
    
        # Calculate correlation matrix
        corr_matrix = numeric_df.corr(method=method)
    
        # Convert to dict format
        correlation_dict: dict[str, dict[str, float]] = {}
        for col1 in corr_matrix.columns:
            correlation_dict[col1] = {}
            for col2 in corr_matrix.columns:
                corr_val = corr_matrix.loc[col1, col2]
                if not pd.isna(corr_val):
                    # Ensure we have a numeric value for conversion
                    correlation_dict[col1][col2] = (
                        float(corr_val) if isinstance(corr_val, int | float) else 0.0
                    )
                else:
                    correlation_dict[col1][col2] = 0.0
    
        # Filter by minimum correlation if specified
        if min_correlation is not None:
            filtered_dict = {}
            for col1, col_corrs in correlation_dict.items():
                filtered_col = {
                    col2: corr_val
                    for col2, corr_val in col_corrs.items()
                    if abs(corr_val) >= abs(min_correlation) or col1 == col2
                }
                if filtered_col:
                    filtered_dict[col1] = filtered_col
            correlation_dict = filtered_dict
    
        # No longer recording operations (simplified MCP architecture)
    
        return CorrelationResult(
            method=method,
            correlation_matrix=correlation_dict,
            columns_analyzed=list(numeric_df.columns),
        )
  • Pydantic response model defining the output schema for the get_correlation_matrix tool, including correlation_matrix, method, and columns_analyzed.
    class CorrelationResult(BaseToolResponse):
        """Response model for correlation matrix analysis."""
    
        correlation_matrix: dict[str, dict[str, float]] = Field(
            description="Correlation coefficients between columns",
        )
        method: Literal["pearson", "spearman", "kendall"] = Field(
            description="Correlation method used for analysis",
        )
        columns_analyzed: list[str] = Field(
            description="Names of columns included in correlation analysis",
        )
  • FastMCP tool registration binding the get_correlation_matrix handler function to the tool name on the statistics_server instance.
    statistics_server.tool(name="get_correlation_matrix")(get_correlation_matrix)
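Two behaviors of the handler above are easy to miss: requested columns that are not numeric are dropped by select_dtypes without an error, and NaN coefficients (for example, those involving a zero-variance column) are reported as 0.0. The sketch below reproduces both steps with pandas alone. The DataFrame contents are invented for illustration, and the dict comprehension is a compact stand-in for the handler's nested loop, not the project's code.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one constant column and one text column.
df = pd.DataFrame(
    {
        "price": [10.0, 12.5, 9.0, 14.0],
        "sales": [100, 80, 120, 70],
        "flat": [1.0, 1.0, 1.0, 1.0],  # zero variance
        "label": ["a", "b", "c", "d"],  # non-numeric
    }
)

# "label" is requested but silently dropped by the dtype filter.
requested = ["price", "sales", "flat", "label"]
numeric_df = df[requested].select_dtypes(include=[np.number])

corr = numeric_df.corr(method="pearson")

# Same conversion rule as the handler: NaN becomes 0.0.
as_dict = {
    c1: {
        c2: 0.0 if pd.isna(corr.loc[c1, c2]) else float(corr.loc[c1, c2])
        for c2 in corr.columns
    }
    for c1 in corr.columns
}

print(list(numeric_df.columns))
print(as_dict["flat"])
```

A 0.0 in the result can therefore mean either "no correlation" or "undefined", so it may be worth checking columns_analyzed and the variance of each column before drawing conclusions.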

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonpspri/databeak'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.