# compute_correlation
Calculate correlation matrices between numeric columns in CSV or SQLite files to identify relationships and patterns in tabular data.
## Instructions

Compute correlation matrix between numeric columns.

**Args:**
- `file_path`: Path to CSV or SQLite file
- `columns`: List of columns to include (default: all numeric columns)
- `method`: Correlation method - `'pearson'` (default), `'spearman'`, or `'kendall'`

**Returns:**
Dictionary containing:
- `method`: Correlation method used
- `correlation_matrix`: Full correlation matrix
- `top_correlations`: Top 10 strongest correlations (excluding self-correlations)
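The core of what the tool computes can be sketched directly with pandas; the DataFrame and column names below are hypothetical stand-ins for a loaded CSV:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for a loaded file
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 60, 65, 80, 90],
    "score": [5, 3, 4, 2, 1],
})

# Pairwise correlation of the numeric columns, as the tool computes internally
corr = df.select_dtypes(include=[np.number]).corr(method="pearson")
print(corr.round(4))
```

Passing `method="spearman"` or `method="kendall"` switches to rank-based correlation, which is less sensitive to outliers and non-linear monotonic relationships.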
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Path to CSV or SQLite file | |
| columns | No | List of columns to include (default: all numeric columns) | |
| method | No | Correlation method: `pearson`, `spearman`, or `kendall` | `pearson` |
## Implementation Reference
- **Handler** (src/mcp_tabular/server.py:236-297): The primary handler function for the `compute_correlation` tool, registered via the `@mcp.tool()` decorator. It loads the tabular data, computes a Pearson/Spearman/Kendall correlation matrix over the numeric columns, extracts the top pairwise correlations, and returns structured results.

```python
@mcp.tool()
def compute_correlation(
    file_path: str,
    columns: list[str] | None = None,
    method: str = "pearson",
) -> dict[str, Any]:
    """
    Compute correlation matrix between numeric columns.

    Args:
        file_path: Path to CSV or SQLite file
        columns: List of columns to include (default: all numeric columns)
        method: Correlation method - 'pearson' (default), 'spearman', or 'kendall'

    Returns:
        Dictionary containing:
        - method: Correlation method used
        - correlation_matrix: Full correlation matrix
        - top_correlations: Top 10 strongest correlations (excluding self-correlations)
    """
    df = _load_data(file_path)

    # Get numeric columns
    if columns:
        # Validate provided columns
        invalid = [c for c in columns if c not in df.columns]
        if invalid:
            raise ValueError(f"Columns not found: {invalid}")
        numeric_df = df[columns].select_dtypes(include=[np.number])
    else:
        numeric_df = df.select_dtypes(include=[np.number])

    if len(numeric_df.columns) < 2:
        raise ValueError("Need at least 2 numeric columns for correlation")

    # Compute correlation matrix
    corr_matrix = numeric_df.corr(method=method)

    # Find top correlations (excluding diagonal)
    correlations = []
    for i, col1 in enumerate(corr_matrix.columns):
        for j, col2 in enumerate(corr_matrix.columns):
            if i < j:  # Upper triangle only
                corr_value = corr_matrix.loc[col1, col2]
                if not np.isnan(corr_value):
                    correlations.append({
                        "column1": col1,
                        "column2": col2,
                        "correlation": round(float(corr_value), 4),
                        "strength": _interpret_correlation(abs(corr_value))
                    })

    # Sort by absolute correlation
    correlations.sort(key=lambda x: abs(x["correlation"]), reverse=True)

    return {
        "method": method,
        "columns_analyzed": corr_matrix.columns.tolist(),
        "correlation_matrix": corr_matrix.round(4).to_dict(),
        "top_correlations": correlations[:10],
    }
```
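The top-correlation extraction in the handler walks the upper triangle of the matrix so each pair is counted once, then sorts by absolute value. The same idea can be sketched standalone with `itertools.combinations` substituted for the nested index loop (the sample columns here are illustrative):

```python
import itertools

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],  # perfectly correlated with "a"
    "c": [4, 3, 2, 1],  # perfectly anti-correlated with "a"
})
corr = df.corr(method="pearson")

# Upper triangle only: combinations() yields each column pair exactly once
pairs = []
for col1, col2 in itertools.combinations(corr.columns, 2):
    value = corr.loc[col1, col2]
    if not np.isnan(value):  # NaN arises when a column has zero variance
        pairs.append((col1, col2, round(float(value), 4)))

# Strongest relationships first, regardless of sign; keep the top 10
pairs.sort(key=lambda p: abs(p[2]), reverse=True)
top = pairs[:10]
```

Sorting on the absolute value is what lets a strong negative correlation (e.g. `a` vs `c` above) rank alongside a strong positive one.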
- **Helper** (src/mcp_tabular/server.py:299-310): Supporting helper function called by `compute_correlation` to classify the strength of correlation coefficients into categories like `very_strong`, `strong`, etc.

```python
def _interpret_correlation(value: float) -> str:
    """Interpret correlation strength."""
    if value >= 0.9:
        return "very_strong"
    elif value >= 0.7:
        return "strong"
    elif value >= 0.5:
        return "moderate"
    elif value >= 0.3:
        return "weak"
    else:
        return "negligible"
```
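The band boundaries are inclusive at the lower edge: a coefficient of exactly 0.7 is "strong", not "moderate". A quick check against a standalone copy of the helper (renamed `interpret_correlation` here so it runs outside the module):

```python
def interpret_correlation(value: float) -> str:
    """Standalone copy of the helper, for illustration only."""
    if value >= 0.9:
        return "very_strong"
    elif value >= 0.7:
        return "strong"
    elif value >= 0.5:
        return "moderate"
    elif value >= 0.3:
        return "weak"
    else:
        return "negligible"

print(interpret_correlation(0.95))  # very_strong
print(interpret_correlation(0.7))   # strong (lower bound is inclusive)
print(interpret_correlation(0.1))   # negligible
```

Note the handler passes `abs(corr_value)`, so a correlation of -0.85 is classified as `strong` just like +0.85.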
- **Schema** (src/mcp_tabular/server.py:242-253): The docstring within the handler defines the input schema (parameters with types and descriptions) and the output schema (return dictionary structure). It serves as the tool schema for MCP.

```python
"""
Compute correlation matrix between numeric columns.

Args:
    file_path: Path to CSV or SQLite file
    columns: List of columns to include (default: all numeric columns)
    method: Correlation method - 'pearson' (default), 'spearman', or 'kendall'

Returns:
    Dictionary containing:
    - method: Correlation method used
    - correlation_matrix: Full correlation matrix
    - top_correlations: Top 10 strongest correlations (excluding self-correlations)
"""
```