MCP Tabular Data Analysis Server

by K02D

statistical_test

Perform statistical hypothesis tests on tabular data to compare groups, analyze relationships, and determine significance using t-tests, ANOVA, chi-squared, and correlation methods.

Instructions

Perform statistical hypothesis tests on data.

Args:
    file_path: Path to CSV or SQLite file
    test_type: Type of test:
        - 'ttest_ind': Independent samples t-test (compare 2 groups)
        - 'ttest_paired': Paired samples t-test
        - 'chi_squared': Chi-squared test for categorical independence
        - 'anova': One-way ANOVA (compare 3+ groups)
        - 'mann_whitney': Non-parametric alternative to t-test
        - 'pearson': Pearson correlation test
        - 'spearman': Spearman correlation test
    column1: First column for analysis
    column2: Second column (required for correlation, optional for t-test)
    group_column: Column defining groups (for t-test, ANOVA)
    alpha: Significance level (default 0.05)

Returns:
    Dictionary containing test statistic, p-value, and interpretation

Input Schema

Name          Required  Description  Default
file_path     Yes
test_type     Yes
column1       Yes
column2       No
group_column  No
alpha         No
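
The schema marks column2 and group_column as optional, but the handler in the Implementation Reference requires one of them depending on test_type. A minimal sketch of a client-side pre-flight check follows; `REQUIRED_EXTRA_ARG` and `check_arguments` are hypothetical names, and the mapping is read off the handler's own checks rather than any published schema.

```python
# Hypothetical pre-flight check. The mapping mirrors the argument checks in
# the handler source; it is an illustration, not part of the server.
REQUIRED_EXTRA_ARG = {
    "ttest_ind": "group_column",
    "ttest_paired": "column2",
    "chi_squared": "column2",
    "anova": "group_column",
    "mann_whitney": "group_column",
    "pearson": "column2",
    "spearman": "column2",
}


def check_arguments(test_type, column2=None, group_column=None):
    """Return a list of problems; an empty list means the call looks well-formed."""
    if test_type not in REQUIRED_EXTRA_ARG:
        return [f"unknown test_type: {test_type!r}"]
    needed = REQUIRED_EXTRA_ARG[test_type]
    supplied = {"column2": column2, "group_column": group_column}
    if supplied[needed] is None:
        return [f"{needed} is required for {test_type}"]
    return []


print(check_arguments("anova", group_column="region"))  # []
print(check_arguments("pearson"))  # ['column2 is required for pearson']
```

Running a check like this before the call avoids a round trip that would only end in a ValueError from the server.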

Output Schema

No arguments

Implementation Reference

  • The handler function implementing the 'statistical_test' MCP tool. It loads data from CSV/SQLite files and performs various statistical hypothesis tests including t-tests, chi-squared, ANOVA, Mann-Whitney U, and correlation tests (Pearson/Spearman). Returns test statistics, p-values, significance, and interpretations.
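    # Imports this excerpt relies on. `mcp`, `_load_data`, and
    # `_interpret_correlation` are defined elsewhere in the server module.
    from typing import Any

    import pandas as pd
    from scipy import stats

    @mcp.tool()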
    def statistical_test(
        file_path: str,
        test_type: str,
        column1: str,
        column2: str | None = None,
        group_column: str | None = None,
        alpha: float = 0.05,
    ) -> dict[str, Any]:
        """
        Perform statistical hypothesis tests on data.
        
        Args:
            file_path: Path to CSV or SQLite file
            test_type: Type of test:
                - 'ttest_ind': Independent samples t-test (compare 2 groups)
                - 'ttest_paired': Paired samples t-test
                - 'chi_squared': Chi-squared test for categorical independence
                - 'anova': One-way ANOVA (compare 3+ groups)
                - 'mann_whitney': Non-parametric alternative to t-test
                - 'pearson': Pearson correlation test
                - 'spearman': Spearman correlation test
            column1: First column for analysis
            column2: Second column (required for correlation, optional for t-test)
            group_column: Column defining groups (for t-test, ANOVA)
            alpha: Significance level (default 0.05)
        
        Returns:
            Dictionary containing test statistic, p-value, and interpretation
        """
        df = _load_data(file_path)
        
        if column1 not in df.columns:
            raise ValueError(f"Column '{column1}' not found")
        
        result = {
            "test_type": test_type,
            "alpha": alpha,
            "columns_tested": [column1],
        }
        
        if test_type == "ttest_ind":
            # Independent samples t-test
            if group_column is None:
                raise ValueError("group_column is required for independent t-test")
            if group_column not in df.columns:
                raise ValueError(f"Group column '{group_column}' not found")
            
            # Drop missing labels so NaN is not counted as an extra group
            groups = df[group_column].dropna().unique()
            if len(groups) != 2:
                raise ValueError(f"t-test requires exactly 2 groups, found {len(groups)}: {groups.tolist()}")
            
            group1_data = df[df[group_column] == groups[0]][column1].dropna()
            group2_data = df[df[group_column] == groups[1]][column1].dropna()
            
            t_stat, p_value = stats.ttest_ind(group1_data, group2_data)
            
            result.update({
                "groups": groups.tolist(),
                "group_means": {str(groups[0]): float(group1_data.mean()), str(groups[1]): float(group2_data.mean())},
                "group_sizes": {str(groups[0]): len(group1_data), str(groups[1]): len(group2_data)},
                "t_statistic": float(t_stat),
                "p_value": float(p_value),
                "significant": bool(p_value < alpha),  # plain bool rather than numpy.bool_
                "interpretation": f"The difference between groups is {'statistically significant' if p_value < alpha else 'not statistically significant'} at α={alpha}",
            })
            
        elif test_type == "ttest_paired":
            if column2 is None:
                raise ValueError("column2 is required for paired t-test")
            if column2 not in df.columns:
                raise ValueError(f"Column '{column2}' not found")
            
            # Keep only rows where both columns are present so the pairs stay aligned
            mask = df[column1].notna() & df[column2].notna()
            data1 = df.loc[mask, column1]
            data2 = df.loc[mask, column2]
            
            t_stat, p_value = stats.ttest_rel(data1, data2)
            
            result.update({
                "columns_tested": [column1, column2],
                "means": {column1: float(data1.mean()), column2: float(data2.mean())},
                "sample_size": len(data1),
                "mean_difference": float(data1.mean() - data2.mean()),
                "t_statistic": float(t_stat),
                "p_value": float(p_value),
                "significant": bool(p_value < alpha),  # plain bool rather than numpy.bool_
                "interpretation": f"The paired difference is {'statistically significant' if p_value < alpha else 'not statistically significant'} at α={alpha}",
            })
            
        elif test_type == "chi_squared":
            if column2 is None:
                raise ValueError("column2 is required for chi-squared test")
            if column2 not in df.columns:
                raise ValueError(f"Column '{column2}' not found")
            
            contingency_table = pd.crosstab(df[column1], df[column2])
            chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)
            
            result.update({
                "columns_tested": [column1, column2],
                "chi2_statistic": float(chi2),
                "degrees_of_freedom": int(dof),
                "p_value": float(p_value),
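                "significant": bool(p_value < alpha),  # plain bool rather than numpy.bool_
                # A non-significant result fails to show an association; it does not prove independence
                "interpretation": f"The variables are {'significantly associated' if p_value < alpha else 'not significantly associated'} at α={alpha}",
                "contingency_table_shape": list(contingency_table.shape),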
            })
            
        elif test_type == "anova":
            if group_column is None:
                raise ValueError("group_column is required for ANOVA")
            if group_column not in df.columns:
                raise ValueError(f"Group column '{group_column}' not found")
            
            # Drop missing labels so NaN is not counted as an extra group
            groups = df[group_column].dropna().unique()
            if len(groups) < 3:
                raise ValueError(f"ANOVA requires 3+ groups, found {len(groups)}. Use t-test for 2 groups.")
            
            group_data = [df[df[group_column] == g][column1].dropna() for g in groups]
            f_stat, p_value = stats.f_oneway(*group_data)
            
            group_means = {str(g): float(data.mean()) for g, data in zip(groups, group_data)}
            
            result.update({
                "groups": groups.tolist(),
                "group_means": group_means,
                # Report the sizes actually used in the test (missing values dropped)
                "group_sizes": {str(g): len(data) for g, data in zip(groups, group_data)},
                "f_statistic": float(f_stat),
                "p_value": float(p_value),
                "significant": bool(p_value < alpha),  # plain bool rather than numpy.bool_
                "interpretation": f"{'At least one group mean differs significantly' if p_value < alpha else 'No significant difference between group means was detected'} at α={alpha}",
            })
            
        elif test_type == "mann_whitney":
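            if group_column is None:
                raise ValueError("group_column is required for Mann-Whitney test")
            if group_column not in df.columns:
                raise ValueError(f"Group column '{group_column}' not found")
            
            # Drop missing labels so NaN is not counted as an extra group
            groups = df[group_column].dropna().unique()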
            if len(groups) != 2:
                raise ValueError(f"Mann-Whitney requires exactly 2 groups, found {len(groups)}")
            
            group1_data = df[df[group_column] == groups[0]][column1].dropna()
            group2_data = df[df[group_column] == groups[1]][column1].dropna()
            
            u_stat, p_value = stats.mannwhitneyu(group1_data, group2_data, alternative='two-sided')
            
            result.update({
                "groups": groups.tolist(),
                "group_medians": {str(groups[0]): float(group1_data.median()), str(groups[1]): float(group2_data.median())},
                "u_statistic": float(u_stat),
                "p_value": float(p_value),
                "significant": bool(p_value < alpha),  # plain bool rather than numpy.bool_
                "interpretation": f"The distributions are {'significantly different' if p_value < alpha else 'not significantly different'} at α={alpha}",
            })
            
        elif test_type in ["pearson", "spearman"]:
            if column2 is None:
                raise ValueError(f"column2 is required for {test_type} correlation")
            if column2 not in df.columns:
                raise ValueError(f"Column '{column2}' not found")
            
            mask = df[column1].notna() & df[column2].notna()
            data1 = df.loc[mask, column1]
            data2 = df.loc[mask, column2]
            
            if test_type == "pearson":
                corr, p_value = stats.pearsonr(data1, data2)
            else:
                corr, p_value = stats.spearmanr(data1, data2)
            
            result.update({
                "columns_tested": [column1, column2],
                "correlation": float(corr),
                "strength": _interpret_correlation(abs(corr)),
                "direction": "positive" if corr > 0 else "negative" if corr < 0 else "none",
                "p_value": float(p_value),
                "significant": bool(p_value < alpha),  # plain bool rather than numpy.bool_
                "sample_size": len(data1),
                "interpretation": f"There is a {_interpret_correlation(abs(corr))} {'positive' if corr > 0 else 'negative'} correlation that is {'statistically significant' if p_value < alpha else 'not statistically significant'} at α={alpha}",
            })
            
        else:
            valid_tests = ['ttest_ind', 'ttest_paired', 'chi_squared', 'anova', 'mann_whitney', 'pearson', 'spearman']
            raise ValueError(f"Unknown test_type: {test_type}. Use: {valid_tests}")
        
        return result
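
The correlation branch calls a helper, `_interpret_correlation`, whose definition is not part of this excerpt. The sketch below shows one plausible shape for such a helper. The name `interpret_correlation_sketch` and every cutoff in it are assumptions chosen for illustration, not the server's actual thresholds.

```python
def interpret_correlation_sketch(abs_corr: float) -> str:
    """Map an absolute correlation coefficient to a descriptive label.

    The cutoffs are illustrative assumptions; the real helper may differ.
    """
    if abs_corr >= 0.7:
        return "strong"
    if abs_corr >= 0.4:
        return "moderate"
    if abs_corr >= 0.2:
        return "weak"
    return "negligible"


print(interpret_correlation_sketch(0.85))  # strong
print(interpret_correlation_sketch(0.10))  # negligible
```

Because the handler passes `abs(corr)`, the label describes strength only; the sign is reported separately in the "direction" field.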

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only partially discloses behavioral traits. It mentions the return format ('Dictionary containing test statistic, p-value, and interpretation') but doesn't cover important aspects like error handling, performance characteristics, data format requirements beyond file types, or whether the operation modifies data. The description doesn't contradict annotations since none exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Args, Returns) and uses bullet points for test types. It's appropriately sized for a complex statistical tool with 6 parameters. Some minor redundancy exists (e.g., 'column2' explanation could be more concise), but overall it's efficient and front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, statistical operations) and the presence of an output schema (implied by 'Returns' section), the description provides good contextual coverage. It explains parameters thoroughly and mentions the return format. However, without annotations and given the statistical complexity, it could benefit from more behavioral context about assumptions, limitations, or data requirements.
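
One assumption that could be surfaced: the handler calls `stats.ttest_ind` with default arguments, which in SciPy is the pooled-variance (Student's) form and assumes equal variances in both groups. Passing `equal_var=False` would give Welch's test instead. The standard-library sketch below computes both statistics on made-up data to show how far they can diverge; the function names and sample values are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Student's t statistic with a pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


def welch_t(a, b):
    """Welch's t statistic, which does not assume equal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Made-up groups with clearly unequal spread
group_a = [1, 2, 3]
group_b = [2, 4, 6, 8]
print(round(pooled_t(group_a, group_b), 4))  # -1.8726
print(round(welch_t(group_a, group_b), 4))   # -2.1213
```

The two forms also use different degrees of freedom, so the p-values differ as well when group variances or sizes are unequal.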

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing comprehensive parameter semantics. Each parameter is explained with clear meanings, test type enumerations with descriptions, default values, and conditional requirements (e.g., 'required for correlation, optional for t-test'). This adds substantial value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Perform statistical hypothesis tests on data' with specific test types listed. It distinguishes from siblings like 'compute_correlation' by covering broader statistical testing beyond just correlation, but doesn't explicitly differentiate from all siblings like 'analyze_time_series' or 'detect_anomalies' which might also involve statistical methods.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through parameter explanations (e.g., 'required for correlation, optional for t-test'), but doesn't provide explicit guidance on when to choose this tool over alternatives like 'compute_correlation' or 'analyze_time_series'. The test type explanations help understand appropriate contexts, but no explicit when/when-not statements are provided.
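
The test-type descriptions already hint at when each option applies. A rough sketch of that selection logic, written as a hypothetical client-side helper (`suggest_test_type` and its `question` labels are invented here and are not part of the server), might look like this:

```python
def suggest_test_type(question, n_groups=0, parametric=True):
    """Hypothetical helper: pick a test_type string from a coarse description.

    question: 'groups' (compare a numeric column across groups),
              'paired' (two measurements per row),
              'categorical' (two categorical columns),
              'relationship' (two numeric columns).
    """
    if question == "paired":
        return "ttest_paired"
    if question == "categorical":
        return "chi_squared"
    if question == "relationship":
        # Spearman is rank-based, so it is the non-parametric choice here
        return "pearson" if parametric else "spearman"
    if question == "groups":
        if n_groups == 2:
            return "ttest_ind" if parametric else "mann_whitney"
        if n_groups >= 3:
            # The tool lists no non-parametric option for 3+ groups
            return "anova"
    raise ValueError(f"no suggestion for question={question!r}, n_groups={n_groups}")


print(suggest_test_type("groups", n_groups=2, parametric=False))  # mann_whitney
print(suggest_test_type("relationship"))  # pearson
```

This covers only the choice among this tool's own test types; deciding between this tool and siblings such as compute_correlation is a separate question the description leaves open.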

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/K02D/mcp-tabular'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.