
MCP Tabular Data Analysis Server

by K02D

statistical_test

Perform statistical hypothesis tests on tabular data to compare groups, analyze relationships, and determine significance using t-tests, ANOVA, chi-squared, and correlation methods.

Instructions

Perform statistical hypothesis tests on data.

Args:
    file_path: Path to CSV or SQLite file
    test_type: Type of test:
        - 'ttest_ind': Independent samples t-test (compare 2 groups)
        - 'ttest_paired': Paired samples t-test
        - 'chi_squared': Chi-squared test for categorical independence
        - 'anova': One-way ANOVA (compare 3+ groups)
        - 'mann_whitney': Non-parametric alternative to t-test
        - 'pearson': Pearson correlation test
        - 'spearman': Spearman correlation test
    column1: First column for analysis
    column2: Second column (required for correlation, chi-squared, and paired t-test)
    group_column: Column defining groups (for t-test, ANOVA)
    alpha: Significance level (default 0.05)

Returns:
    Dictionary containing test statistic, p-value, and interpretation
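Every branch of the tool applies the same decision rule: compare the p-value to alpha. As a standalone illustration, the core of the independent t-test branch reduces to the sketch below (the data is synthetic and illustrative, not from the server):

```python
import pandas as pd
from scipy import stats

# Synthetic two-group data (illustrative only)
df = pd.DataFrame({
    "value": [5.1, 4.9, 5.3, 5.0, 6.2, 6.0, 6.4, 6.1],
    "group": ["A"] * 4 + ["B"] * 4,
})

# Same steps the handler performs for test_type='ttest_ind'
g1 = df[df["group"] == "A"]["value"].dropna()
g2 = df[df["group"] == "B"]["value"].dropna()
t_stat, p_value = stats.ttest_ind(g1, g2)

alpha = 0.05
significant = p_value < alpha
print(significant)
```

The returned dictionary's `significant` and `interpretation` fields are derived from exactly this comparison.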

Input Schema

Name          Required  Description                                              Default
file_path     Yes       Path to CSV or SQLite file
test_type     Yes       Which test to run (see list above)
column1       Yes       First column for analysis
column2       No        Second column (correlation, chi-squared, paired t-test)
group_column  No        Column defining groups (t-test, ANOVA, Mann-Whitney)
alpha         No        Significance level                                       0.05
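For the categorical case, the chi-squared branch builds a contingency table with `pd.crosstab` and tests independence with `stats.chi2_contingency`. A self-contained sketch with synthetic, illustrative data:

```python
import pandas as pd
from scipy import stats

# Synthetic categorical data (illustrative only): conversion rate by device
df = pd.DataFrame({
    "device": ["mobile"] * 30 + ["desktop"] * 30,
    "converted": (["yes"] * 20 + ["no"] * 10) + (["yes"] * 10 + ["no"] * 20),
})

# Same steps the handler performs for test_type='chi_squared'
table = pd.crosstab(df["device"], df["converted"])
chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(dof, p_value < 0.05)
```

Note that for 2x2 tables, `chi2_contingency` applies Yates' continuity correction by default, so the statistic is slightly more conservative than the uncorrected chi-squared.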

Implementation Reference

  • The handler function implementing the 'statistical_test' MCP tool. It loads data from CSV/SQLite files and performs various statistical hypothesis tests including t-tests, chi-squared, ANOVA, Mann-Whitney U, and correlation tests (Pearson/Spearman). Returns test statistics, p-values, significance, and interpretations.
    # Module-level context assumed by this excerpt: `from typing import Any`,
    # `import pandas as pd`, `from scipy import stats`, plus the server's
    # `_load_data` and `_interpret_correlation` helpers.
    @mcp.tool()
    def statistical_test(
        file_path: str,
        test_type: str,
        column1: str,
        column2: str | None = None,
        group_column: str | None = None,
        alpha: float = 0.05,
    ) -> dict[str, Any]:
        """
        Perform statistical hypothesis tests on data.
        
        Args:
            file_path: Path to CSV or SQLite file
            test_type: Type of test:
                - 'ttest_ind': Independent samples t-test (compare 2 groups)
                - 'ttest_paired': Paired samples t-test
                - 'chi_squared': Chi-squared test for categorical independence
                - 'anova': One-way ANOVA (compare 3+ groups)
                - 'mann_whitney': Non-parametric alternative to t-test
                - 'pearson': Pearson correlation test
                - 'spearman': Spearman correlation test
            column1: First column for analysis
            column2: Second column (required for correlation, chi-squared, and paired t-test)
            group_column: Column defining groups (for t-test, ANOVA)
            alpha: Significance level (default 0.05)
        
        Returns:
            Dictionary containing test statistic, p-value, and interpretation
        """
        df = _load_data(file_path)
        
        if column1 not in df.columns:
            raise ValueError(f"Column '{column1}' not found")
        
        result = {
            "test_type": test_type,
            "alpha": alpha,
            "columns_tested": [column1],
        }
        
        if test_type == "ttest_ind":
            # Independent samples t-test
            if group_column is None:
                raise ValueError("group_column is required for independent t-test")
            if group_column not in df.columns:
                raise ValueError(f"Group column '{group_column}' not found")
            
            groups = df[group_column].unique()
            if len(groups) != 2:
                raise ValueError(f"t-test requires exactly 2 groups, found {len(groups)}: {groups.tolist()}")
            
            group1_data = df[df[group_column] == groups[0]][column1].dropna()
            group2_data = df[df[group_column] == groups[1]][column1].dropna()
            
            t_stat, p_value = stats.ttest_ind(group1_data, group2_data)
            
            result.update({
                "groups": groups.tolist(),
                "group_means": {str(groups[0]): float(group1_data.mean()), str(groups[1]): float(group2_data.mean())},
                "group_sizes": {str(groups[0]): len(group1_data), str(groups[1]): len(group2_data)},
                "t_statistic": float(t_stat),
                "p_value": float(p_value),
                "significant": p_value < alpha,
                "interpretation": f"The difference between groups is {'statistically significant' if p_value < alpha else 'not statistically significant'} at α={alpha}",
            })
            
        elif test_type == "ttest_paired":
            if column2 is None:
                raise ValueError("column2 is required for paired t-test")
            if column2 not in df.columns:
                raise ValueError(f"Column '{column2}' not found")
            
            # Align on rows where both columns are non-null so the paired
            # samples have equal length and matching indices
            mask = df[column1].notna() & df[column2].notna()
            data1 = df.loc[mask, column1]
            data2 = df.loc[mask, column2]
            
            t_stat, p_value = stats.ttest_rel(data1, data2)
            
            result.update({
                "columns_tested": [column1, column2],
                "means": {column1: float(data1.mean()), column2: float(data2.mean())},
                "sample_size": len(data1),
                "mean_difference": float(data1.mean() - data2.mean()),
                "t_statistic": float(t_stat),
                "p_value": float(p_value),
                "significant": p_value < alpha,
                "interpretation": f"The paired difference is {'statistically significant' if p_value < alpha else 'not statistically significant'} at α={alpha}",
            })
            
        elif test_type == "chi_squared":
            if column2 is None:
                raise ValueError("column2 is required for chi-squared test")
            if column2 not in df.columns:
                raise ValueError(f"Column '{column2}' not found")
            
            contingency_table = pd.crosstab(df[column1], df[column2])
            chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)
            
            result.update({
                "columns_tested": [column1, column2],
                "chi2_statistic": float(chi2),
                "degrees_of_freedom": int(dof),
                "p_value": float(p_value),
                "significant": p_value < alpha,
                "interpretation": f"The variables are {'dependent (associated)' if p_value < alpha else 'independent'} at α={alpha}",
                "contingency_table_shape": contingency_table.shape,
            })
            
        elif test_type == "anova":
            if group_column is None:
                raise ValueError("group_column is required for ANOVA")
            if group_column not in df.columns:
                raise ValueError(f"Group column '{group_column}' not found")
            
            groups = df[group_column].unique()
            if len(groups) < 3:
                raise ValueError(f"ANOVA requires 3+ groups, found {len(groups)}. Use t-test for 2 groups.")
            
            group_data = [df[df[group_column] == g][column1].dropna() for g in groups]
            f_stat, p_value = stats.f_oneway(*group_data)
            
            group_means = {str(g): float(df[df[group_column] == g][column1].mean()) for g in groups}
            
            result.update({
                "groups": groups.tolist(),
                "group_means": group_means,
                "group_sizes": {str(g): len(df[df[group_column] == g]) for g in groups},
                "f_statistic": float(f_stat),
                "p_value": float(p_value),
                "significant": p_value < alpha,
                "interpretation": f"At least one group mean is {'significantly different' if p_value < alpha else 'not significantly different'} at α={alpha}",
            })
            
        elif test_type == "mann_whitney":
            if group_column is None:
                raise ValueError("group_column is required for Mann-Whitney test")
            if group_column not in df.columns:
                raise ValueError(f"Group column '{group_column}' not found")
            
            groups = df[group_column].unique()
            if len(groups) != 2:
                raise ValueError(f"Mann-Whitney requires exactly 2 groups, found {len(groups)}")
            
            group1_data = df[df[group_column] == groups[0]][column1].dropna()
            group2_data = df[df[group_column] == groups[1]][column1].dropna()
            
            u_stat, p_value = stats.mannwhitneyu(group1_data, group2_data, alternative='two-sided')
            
            result.update({
                "groups": groups.tolist(),
                "group_medians": {str(groups[0]): float(group1_data.median()), str(groups[1]): float(group2_data.median())},
                "u_statistic": float(u_stat),
                "p_value": float(p_value),
                "significant": p_value < alpha,
                "interpretation": f"The distributions are {'significantly different' if p_value < alpha else 'not significantly different'} at α={alpha}",
            })
            
        elif test_type in ["pearson", "spearman"]:
            if column2 is None:
                raise ValueError(f"column2 is required for {test_type} correlation")
            if column2 not in df.columns:
                raise ValueError(f"Column '{column2}' not found")
            
            mask = df[column1].notna() & df[column2].notna()
            data1 = df.loc[mask, column1]
            data2 = df.loc[mask, column2]
            
            if test_type == "pearson":
                corr, p_value = stats.pearsonr(data1, data2)
            else:
                corr, p_value = stats.spearmanr(data1, data2)
            
            result.update({
                "columns_tested": [column1, column2],
                "correlation": float(corr),
                "strength": _interpret_correlation(abs(corr)),
                "direction": "positive" if corr > 0 else "negative" if corr < 0 else "none",
                "p_value": float(p_value),
                "significant": p_value < alpha,
                "sample_size": len(data1),
                "interpretation": f"There is a {_interpret_correlation(abs(corr))} {'positive' if corr > 0 else 'negative'} correlation that is {'statistically significant' if p_value < alpha else 'not statistically significant'} at α={alpha}",
            })
            
        else:
            valid_tests = ['ttest_ind', 'ttest_paired', 'chi_squared', 'anova', 'mann_whitney', 'pearson', 'spearman']
            raise ValueError(f"Unknown test_type: {test_type}. Use: {valid_tests}")
        
        return result
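The correlation path can also be exercised outside the server. In the sketch below, `interpret_correlation` is a hypothetical stand-in for the server's `_interpret_correlation` helper, whose actual thresholds are not shown in this excerpt:

```python
import pandas as pd
from scipy import stats

def interpret_correlation(r: float) -> str:
    # Hypothetical stand-in for the server's _interpret_correlation helper;
    # the real thresholds are not shown in the excerpt above.
    if r >= 0.7:
        return "strong"
    if r >= 0.4:
        return "moderate"
    return "weak"

# Synthetic, nearly linear data (illustrative only)
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
})

# Same alignment and test the handler performs for test_type='pearson'
mask = df["x"].notna() & df["y"].notna()
corr, p_value = stats.pearsonr(df.loc[mask, "x"], df.loc[mask, "y"])
strength = interpret_correlation(abs(corr))
print(strength, p_value < 0.05)
```

Swapping `stats.pearsonr` for `stats.spearmanr` reproduces the rank-based branch with the same surrounding logic.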
