Skip to main content
Glama
K02D

MCP Tabular Data Analysis Server

by K02D

merge_datasets

Combine two datasets by joining them on specified columns to create a unified data source for analysis.

Instructions

Merge/join two datasets together - essential for combining data sources.

Args:
    file_path_left: Path to left/primary dataset
    file_path_right: Path to right/secondary dataset
    on: Column(s) to join on (if same name in both datasets)
    left_on: Column name in left dataset to join on
    right_on: Column name in right dataset to join on
    how: Join type - 'inner', 'left', 'right', 'outer'
    preview_limit: Number of rows to return in preview

Returns:
    Dictionary containing merged data preview and statistics

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
file_path_leftYes
file_path_rightYes
onNo
left_onNo
right_onNo
howNoinner
preview_limitNo

Implementation Reference

  • The main handler function for the 'merge_datasets' tool. It is registered via the @mcp.tool() decorator. Loads two datasets using _load_data helper, performs a pandas merge based on specified keys and join type, computes merge statistics, and returns a preview of the merged data with metadata.
    @mcp.tool()
    def merge_datasets(
        file_path_left: str,
        file_path_right: str,
        on: list[str] | None = None,
        left_on: str | None = None,
        right_on: str | None = None,
        how: str = "inner",
        preview_limit: int = 50,
    ) -> dict[str, Any]:
        """
        Merge/join two datasets together - essential for combining data sources.
        
        Args:
            file_path_left: Path to left/primary dataset
            file_path_right: Path to right/secondary dataset
            on: Column(s) to join on (if same name in both datasets)
            left_on: Column name in left dataset to join on
            right_on: Column name in right dataset to join on
            how: Join type - 'inner', 'left', 'right', 'outer'
            preview_limit: Number of rows to return in preview
        
        Returns:
            Dictionary containing merged data preview and statistics
        """
        df_left = _load_data(file_path_left)
        df_right = _load_data(file_path_right)
        
        valid_how = ['inner', 'left', 'right', 'outer']
        if how not in valid_how:
            raise ValueError(f"Unknown join type: {how}. Use: {valid_how}")
        
        # Determine join keys
        if on:
            invalid_left = [c for c in on if c not in df_left.columns]
            invalid_right = [c for c in on if c not in df_right.columns]
            if invalid_left:
                raise ValueError(f"Columns {invalid_left} not found in left dataset")
            if invalid_right:
                raise ValueError(f"Columns {invalid_right} not found in right dataset")
            merged = pd.merge(df_left, df_right, on=on, how=how)
        elif left_on and right_on:
            if left_on not in df_left.columns:
                raise ValueError(f"Column '{left_on}' not found in left dataset")
            if right_on not in df_right.columns:
                raise ValueError(f"Column '{right_on}' not found in right dataset")
            merged = pd.merge(df_left, df_right, left_on=left_on, right_on=right_on, how=how)
        else:
            # Try to find common columns
            common_cols = list(set(df_left.columns) & set(df_right.columns))
            if not common_cols:
                raise ValueError("No common columns found. Specify 'on', or 'left_on' and 'right_on'")
            on = common_cols[:1]  # Use first common column
            merged = pd.merge(df_left, df_right, on=on, how=how)
        
        # Statistics about the merge
        left_rows = len(df_left)
        right_rows = len(df_right)
        merged_rows = len(merged)
        
        merge_stats = {
            "left_rows": left_rows,
            "right_rows": right_rows,
            "merged_rows": merged_rows,
            "join_type": how,
            "join_keys": on if on else {"left": left_on, "right": right_on},
        }
        
        if how == "inner":
            merge_stats["left_match_pct"] = round(merged_rows / left_rows * 100, 1) if left_rows > 0 else 0
            merge_stats["right_match_pct"] = round(merged_rows / right_rows * 100, 1) if right_rows > 0 else 0
        
        return {
            "merge_stats": merge_stats,
            "merged_columns": merged.columns.tolist(),
            "merged_shape": {"rows": merged_rows, "columns": len(merged.columns)},
            "preview": merged.head(preview_limit).to_dict(orient="records"),
            "has_more": merged_rows > preview_limit,
        }

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/K02D/mcp-tabular'

If you have feedback or need assistance with the MCP directory API, please join our Discord server