get_correlation_matrix
Calculate pairwise correlations between numerical columns using Pearson, Spearman, or Kendall methods to identify variable relationships for feature selection and data analysis.
Instructions
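Calculate a correlation matrix for the numerical columns of the session's dataset.
Computes pairwise correlations between numerical columns using the Pearson, Spearman, or Kendall method. Useful for understanding relationships between variables and for feature selection in analytical workflows.
Returns: A CorrelationResult containing the pairwise correlation coefficients (correlation_matrix), the method used (method), and the names of the columns analyzed (columns_analyzed).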
Correlation Methods:
- 📊 Pearson: linear relationships (default; sensitive to outliers; significance tests assume approximately normal data)
- 📈 Spearman: monotonic relationships (rank-based, non-parametric)
- 🔄 Kendall: concordant/discordant pairs (rank-based; robust, often preferred for small samples)
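To make the three definitions concrete, here is a minimal from-scratch sketch in plain Python. The helpers pearson, ranks, spearman and kendall_tau_b are illustrative names, not part of this tool, and the Kendall variant shown is tau-b (the tie-corrected form). On a relationship that is perfectly monotonic but not linear (y = x³), Spearman and Kendall both reach 1.0 while Pearson stays below 1. Zero-variance inputs return NaN here, which is the case the handler in the Implementation Reference reports as 0.0.

```python
from itertools import combinations
from math import nan, sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return nan  # undefined when either input is constant
    return cov / (spread_x * spread_y)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rho: Pearson r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall tau-b from concordant/discordant pair counts, corrected for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from both denominator terms
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + tied_x_only) * (concordant + discordant + tied_y_only))
    if denom == 0:
        return nan
    return (concordant - discordant) / denom


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v**3 for v in x]  # monotonic but not linear
print(f"pearson={pearson(x, y):.3f} spearman={spearman(x, y):.3f} kendall={kendall_tau_b(x, y):.3f}")
# prints: pearson=0.943 spearman=1.000 kendall=1.000
```

The rank-based methods ignore how far apart the values are and only look at their ordering, which is why they agree on any strictly increasing relationship.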
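Examples:
```python
# Basic correlation analysis
corr = await get_correlation_matrix(ctx)

# Analyze specific columns with Spearman correlation
corr = await get_correlation_matrix(ctx, columns=["price", "rating", "sales"], method="spearman")

# Keep only coefficients with |r| >= 0.5 (diagonal entries are always kept)
corr = await get_correlation_matrix(ctx, min_correlation=0.5)
```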
AI Workflow Integration:
1. Feature selection and dimensionality reduction
2. Multicollinearity detection before modeling
3. Understanding variable relationships
4. Data validation and quality assessment
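For multicollinearity detection, a caller might scan the returned correlation_matrix for highly correlated pairs and drop one column from each. The sketch below is a hypothetical client-side helper (high_correlation_pairs and columns_to_drop are not part of the tool). It assumes the nested dict shape of CorrelationResult.correlation_matrix, uses an arbitrary 0.9 cutoff, and runs on made-up values. It tolerates missing entries, which occur when min_correlation filtering has been applied.

```python
def high_correlation_pairs(
    matrix: dict[str, dict[str, float]], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Return each unordered off-diagonal pair with |r| >= threshold, strongest first."""
    columns = list(matrix)
    pairs = []
    for i, a in enumerate(columns):
        for b in columns[i + 1 :]:
            r = matrix[a].get(b)  # may be absent if the result was filtered
            if r is not None and abs(r) >= threshold:
                pairs.append((a, b, r))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


def columns_to_drop(matrix: dict[str, dict[str, float]], threshold: float = 0.9) -> list[str]:
    """Greedy pruning: for each flagged pair whose columns are both still kept, drop the second."""
    dropped: list[str] = []
    for a, b, _ in high_correlation_pairs(matrix, threshold):
        if a not in dropped and b not in dropped:
            dropped.append(b)
    return dropped


# Illustrative values only, not real tool output.
matrix = {
    "price": {"price": 1.0, "cost": 0.96, "rating": -0.12},
    "cost": {"price": 0.96, "cost": 1.0, "rating": -0.08},
    "rating": {"price": -0.12, "cost": -0.08, "rating": 1.0},
}
print(high_correlation_pairs(matrix))  # [('price', 'cost', 0.96)]
print(columns_to_drop(matrix))  # ['cost']
```

The greedy rule simply keeps whichever column appears first in the matrix; in practice you might prefer to keep the column that is more interpretable or more strongly related to the modeling target.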
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| method | No | Correlation method: pearson (linear), spearman (rank), kendall (rank) | pearson |
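| columns | No | List of columns to include (None = all numeric columns). Unknown names raise an error; non-numeric columns in the list are ignored. | |
| min_correlation | No | Minimum absolute correlation an entry must reach to be kept; diagonal entries are always kept (None = no filtering) | |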
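The min_correlation filter compares absolute values, so strong negative correlations are kept and a negative threshold behaves like its positive counterpart; diagonal entries are always retained. The sketch below restates that rule as a standalone function (filter_by_min_correlation is a hypothetical name) and applies it to illustrative values, not real tool output.

```python
def filter_by_min_correlation(
    matrix: dict[str, dict[str, float]], min_correlation: float
) -> dict[str, dict[str, float]]:
    """Keep entries with |r| >= |min_correlation|; always keep the diagonal.

    Because the diagonal is always kept, every row survives the filter.
    """
    threshold = abs(min_correlation)
    return {
        col1: {col2: r for col2, r in row.items() if abs(r) >= threshold or col1 == col2}
        for col1, row in matrix.items()
    }


# Illustrative values only.
matrix = {
    "price": {"price": 1.0, "rating": -0.62, "sales": 0.31},
    "rating": {"price": -0.62, "rating": 1.0, "sales": 0.18},
    "sales": {"price": 0.31, "rating": 0.18, "sales": 1.0},
}
print(filter_by_min_correlation(matrix, 0.5))
# {'price': {'price': 1.0, 'rating': -0.62}, 'rating': {'price': -0.62, 'rating': 1.0}, 'sales': {'sales': 1.0}}
```

Note that a column with no strong partners (sales above) still appears, holding only its diagonal entry.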
Implementation Reference
- The primary handler function implementing the core logic of the 'get_correlation_matrix' tool. It processes input parameters, loads data from the session, computes the correlation matrix using pandas, handles filtering, and returns structured results.
```python
async def get_correlation_matrix(
    ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    method: Annotated[
        Literal["pearson", "spearman", "kendall"],
        Field(description="Correlation method: pearson (linear), spearman (rank), kendall (rank)"),
    ] = "pearson",
    columns: Annotated[
        list[str] | None,
        Field(description="List of columns to include (None = all numeric columns)"),
    ] = None,
    min_correlation: Annotated[
        float | None,
        Field(description="Minimum correlation threshold to include in results"),
    ] = None,
) -> CorrelationResult:
    """Calculate correlation matrix for numerical columns.

    Computes pairwise correlations between numerical columns using various
    correlation methods. Essential for understanding relationships between
    variables and feature selection in analytical workflows.

    Returns:
        Correlation matrix with pairwise correlation coefficients

    Correlation Methods:
        📊 Pearson: Linear relationships (default, assumes normality)
        📈 Spearman: Monotonic relationships (rank-based, non-parametric)
        🔄 Kendall: Concordant/discordant pairs (robust, small samples)

    Examples:
        # Basic correlation analysis
        corr = await get_correlation_matrix(ctx)

        # Analyze specific columns with Spearman correlation
        corr = await get_correlation_matrix(ctx, columns=["price", "rating", "sales"], method="spearman")

        # Filter correlations above threshold
        corr = await get_correlation_matrix(ctx, min_correlation=0.5)

    AI Workflow Integration:
        1. Feature selection and dimensionality reduction
        2. Multicollinearity detection before modeling
        3. Understanding variable relationships
        4. Data validation and quality assessment
    """
    # Get session_id from FastMCP context
    session_id = ctx.session_id
    _session, df = get_session_data(session_id)  # Only need df, not session

    # Select numeric columns (requested non-numeric columns are silently dropped)
    if columns:
        missing_cols = [col for col in columns if col not in df.columns]
        if missing_cols:
            raise ColumnNotFoundError(missing_cols[0], df.columns.tolist())
        numeric_df = df[columns].select_dtypes(include=[np.number])
    else:
        numeric_df = df.select_dtypes(include=[np.number])

    if numeric_df.empty:
        msg = "No numeric columns found for correlation analysis"
        raise ToolError(msg)

    settings = get_settings()
    if len(numeric_df.columns) < settings.min_statistical_sample_size:
        msg = "Correlation analysis requires at least two numeric columns"
        raise ToolError(msg)

    # Calculate correlation matrix
    corr_matrix = numeric_df.corr(method=method)

    # Convert to dict format
    correlation_dict: dict[str, dict[str, float]] = {}
    for col1 in corr_matrix.columns:
        correlation_dict[col1] = {}
        for col2 in corr_matrix.columns:
            corr_val = corr_matrix.loc[col1, col2]
            if not pd.isna(corr_val):
                # Ensure we have a numeric value for conversion
                correlation_dict[col1][col2] = (
                    float(corr_val) if isinstance(corr_val, int | float) else 0.0
                )
            else:
                # Undefined coefficients (e.g. a zero-variance column) are reported as 0.0
                correlation_dict[col1][col2] = 0.0

    # Filter by minimum correlation if specified:
    # keep entries with |r| >= |min_correlation|; the diagonal is always kept
    if min_correlation is not None:
        filtered_dict = {}
        for col1, col_corrs in correlation_dict.items():
            filtered_col = {
                col2: corr_val
                for col2, corr_val in col_corrs.items()
                if abs(corr_val) >= abs(min_correlation) or col1 == col2
            }
            if filtered_col:
                filtered_dict[col1] = filtered_col
        correlation_dict = filtered_dict

    # No longer recording operations (simplified MCP architecture)
    return CorrelationResult(
        method=method,
        correlation_matrix=correlation_dict,
        columns_analyzed=list(numeric_df.columns),
    )
```
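The column selection and NaN handling above can be reproduced on a toy DataFrame with pandas and NumPy directly, outside any MCP session; the data and column names below are made up. A text column is dropped by select_dtypes, and a constant column has an undefined Pearson correlation that pandas returns as NaN, which the handler's conversion loop turns into 0.0. A 0.0 in the tool's output can therefore mean either "no linear relationship" or "undefined", so checking for zero-variance columns separately may be worthwhile.

```python
import numpy as np
import pandas as pd

# Toy data (made up): sales = 300 - 10 * price, an exact linear relationship.
df = pd.DataFrame(
    {
        "price": [10.0, 12.0, 15.0, 20.0],
        "sales": [200, 180, 150, 100],
        "fixed_fee": [5.0, 5.0, 5.0, 5.0],  # zero variance
        "region": ["n", "s", "e", "w"],  # non-numeric
    }
)

# Same selection step as the handler: keep only numeric dtypes.
numeric_df = df.select_dtypes(include=[np.number])
corr = numeric_df.corr(method="pearson")

# Same conversion rule as the handler: NaN becomes 0.0.
as_dict = {
    c1: {
        c2: 0.0 if pd.isna(corr.loc[c1, c2]) else float(corr.loc[c1, c2])
        for c2 in corr.columns
    }
    for c1 in corr.columns
}

print(list(numeric_df.columns))  # ['price', 'sales', 'fixed_fee']
print(round(as_dict["price"]["sales"], 6))  # -1.0
print(as_dict["fixed_fee"])  # {'price': 0.0, 'sales': 0.0, 'fixed_fee': 0.0}
```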
- Pydantic model defining the output schema for the get_correlation_matrix tool response, including the correlation matrix dictionary, computation method, and analyzed columns.
```python
class CorrelationResult(BaseToolResponse):
    """Response model for correlation matrix analysis."""

    correlation_matrix: dict[str, dict[str, float]] = Field(
        description="Correlation coefficients between columns",
    )
    method: Literal["pearson", "spearman", "kendall"] = Field(
        description="Correlation method used for analysis",
    )
    columns_analyzed: list[str] = Field(
        description="Names of columns included in correlation analysis",
    )
```
- src/databeak/servers/statistics_server.py:512-512 (registration): Registers the get_correlation_matrix handler function as an MCP tool named 'get_correlation_matrix' on the FastMCP statistics_server instance.
```python
statistics_server.tool(name="get_correlation_matrix")(get_correlation_matrix)
```