merge_datasets
Combine two tabular datasets by joining them on specified columns to create unified data for analysis.
Instructions
Merge/join two datasets together - essential for combining data sources.
Args:
- `file_path_left`: Path to left/primary dataset
- `file_path_right`: Path to right/secondary dataset
- `on`: Column(s) to join on (if same name in both datasets)
- `left_on`: Column name in left dataset to join on
- `right_on`: Column name in right dataset to join on
- `how`: Join type - 'inner', 'left', 'right', 'outer'
- `preview_limit`: Number of rows to return in preview

Returns:
Dictionary containing merged data preview and statistics
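For instance, joining a customers file to an orders file on a shared key column would pass arguments like these. The file names and the `customer_id` column are illustrative only, and the call is written as a plain Python invocation for readability; in practice the tool is reached through an MCP client. The commented result keys match the handler shown under Implementation Reference.

```python
# Illustrative arguments only; the paths and column names are not part of the project.
result = merge_datasets(
    file_path_left="data/customers.csv",   # hypothetical path
    file_path_right="data/orders.csv",     # hypothetical path
    on=["customer_id"],                    # key column present in both files
    how="left",                            # keep every customer, matched or not
    preview_limit=10,
)

# Keys of the returned dictionary (per the handler below):
# result["merge_stats"], result["merged_columns"], result["merged_shape"],
# result["preview"], result["has_more"]
```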
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path_left | Yes | Path to left/primary dataset | |
| file_path_right | Yes | Path to right/secondary dataset | |
| on | No | Column(s) to join on (if same name in both datasets) | |
| left_on | No | Column name in left dataset to join on | |
| right_on | No | Column name in right dataset to join on | |
| how | No | Join type: 'inner', 'left', 'right', 'outer' | inner |
| preview_limit | No | Number of rows to return in preview | 50 |
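Read alongside the handler signature in the next section, the schema corresponds roughly to the parameter types sketched below. `MergeDatasetsInput` is an illustrative name, not a class defined in the project; the actual JSON schema is generated from the function signature.

```python
from typing import TypedDict

class MergeDatasetsInput(TypedDict, total=False):
    """Illustrative shape of the tool input; not a class defined in the project."""
    file_path_left: str        # required
    file_path_right: str       # required
    on: list[str] | None       # optional: join column(s) shared by both files
    left_on: str | None        # optional: join column in the left dataset
    right_on: str | None       # optional: join column in the right dataset
    how: str                   # optional, defaults to "inner"
    preview_limit: int         # optional, defaults to 50
```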
Implementation Reference
- `src/mcp_tabular/server.py:1117-1196` (handler)

The core handler function for the `merge_datasets` tool. It is decorated with `@mcp.tool()` for automatic registration in the FastMCP server, loads the two datasets with the `_load_data` helper, performs the merge using pandas `pd.merge` with flexible join-key specification, and returns merge statistics and a preview of the result.

```python
@mcp.tool()
def merge_datasets(
    file_path_left: str,
    file_path_right: str,
    on: list[str] | None = None,
    left_on: str | None = None,
    right_on: str | None = None,
    how: str = "inner",
    preview_limit: int = 50,
) -> dict[str, Any]:
    """
    Merge/join two datasets together - essential for combining data sources.

    Args:
        file_path_left: Path to left/primary dataset
        file_path_right: Path to right/secondary dataset
        on: Column(s) to join on (if same name in both datasets)
        left_on: Column name in left dataset to join on
        right_on: Column name in right dataset to join on
        how: Join type - 'inner', 'left', 'right', 'outer'
        preview_limit: Number of rows to return in preview

    Returns:
        Dictionary containing merged data preview and statistics
    """
    df_left = _load_data(file_path_left)
    df_right = _load_data(file_path_right)

    valid_how = ['inner', 'left', 'right', 'outer']
    if how not in valid_how:
        raise ValueError(f"Unknown join type: {how}. Use: {valid_how}")

    # Determine join keys
    if on:
        invalid_left = [c for c in on if c not in df_left.columns]
        invalid_right = [c for c in on if c not in df_right.columns]
        if invalid_left:
            raise ValueError(f"Columns {invalid_left} not found in left dataset")
        if invalid_right:
            raise ValueError(f"Columns {invalid_right} not found in right dataset")
        merged = pd.merge(df_left, df_right, on=on, how=how)
    elif left_on and right_on:
        if left_on not in df_left.columns:
            raise ValueError(f"Column '{left_on}' not found in left dataset")
        if right_on not in df_right.columns:
            raise ValueError(f"Column '{right_on}' not found in right dataset")
        merged = pd.merge(df_left, df_right, left_on=left_on, right_on=right_on, how=how)
    else:
        # Try to find common columns
        common_cols = list(set(df_left.columns) & set(df_right.columns))
        if not common_cols:
            raise ValueError("No common columns found. Specify 'on', or 'left_on' and 'right_on'")
        on = common_cols[:1]  # Use first common column
        merged = pd.merge(df_left, df_right, on=on, how=how)

    # Statistics about the merge
    left_rows = len(df_left)
    right_rows = len(df_right)
    merged_rows = len(merged)

    merge_stats = {
        "left_rows": left_rows,
        "right_rows": right_rows,
        "merged_rows": merged_rows,
        "join_type": how,
        "join_keys": on if on else {"left": left_on, "right": right_on},
    }

    if how == "inner":
        merge_stats["left_match_pct"] = round(merged_rows / left_rows * 100, 1) if left_rows > 0 else 0
        merge_stats["right_match_pct"] = round(merged_rows / right_rows * 100, 1) if right_rows > 0 else 0

    return {
        "merge_stats": merge_stats,
        "merged_columns": merged.columns.tolist(),
        "merged_shape": {"rows": merged_rows, "columns": len(merged.columns)},
        "preview": merged.head(preview_limit).to_dict(orient="records"),
        "has_more": merged_rows > preview_limit,
    }
```
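For readers who want to see the underlying pandas behaviour in isolation, the sketch below reproduces the core merge and the inner-join match percentages with made-up DataFrames. It is illustrative only and not part of the server code.

```python
import pandas as pd

# Toy data standing in for the two loaded files (purely illustrative).
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ada", "Ben", "Cleo"]})
orders = pd.DataFrame({"customer_id": [1, 3], "total": [25.0, 15.0]})

# Same call the handler makes when 'on' is supplied and how="inner".
merged = pd.merge(customers, orders, on=["customer_id"], how="inner")

# Match percentages as computed in merge_stats for inner joins:
# merged row count relative to each input's row count.
left_match_pct = round(len(merged) / len(customers) * 100, 1)   # 2 / 3 -> 66.7
right_match_pct = round(len(merged) / len(orders) * 100, 1)     # 2 / 2 -> 100.0

print(merged)
print(left_match_pct, right_match_pct)
```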