split_column

Split column values by delimiter to extract specific parts or expand into multiple columns for CSV data transformation.

Instructions

Split column values by delimiter.

Returns: ColumnOperationResult with split details

Examples:

# Keep first part of split
split_column(ctx, "full_name", " ", part_index=0)

# Keep the second part (for an email address, the domain)
split_column(ctx, "email", "@", part_index=1)

# Expand into multiple columns
split_column(ctx, "address", ",", expand_to_columns=True)

# Expand with custom column names
split_column(ctx, "name", " ", expand_to_columns=True,
            new_columns=["first_name", "last_name"])
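
The difference between keeping one part and expanding can be seen with plain pandas. The following is a minimal sketch, not the tool itself: the sample data and the two-argument `get_part` helper are assumptions made for illustration, and only the `str.split` call mirrors the implementation listed further down.

```python
import pandas as pd

# Hypothetical sample data; the last row does not contain the delimiter.
df = pd.DataFrame({"email": ["ann@example.com", "bob@example.org", "no-at-sign"]})

# expand=False yields a Series holding one list of parts per row.
parts = df["email"].astype(str).str.split("@", expand=False)


def get_part(split_list, part_index):
    """Return the requested part, or pd.NA when the row has too few parts."""
    if isinstance(split_list, list) and len(split_list) > part_index:
        return split_list[part_index]
    return pd.NA


# Equivalent in spirit to part_index=1: keep the text after the "@".
domains = parts.apply(lambda p: get_part(p, 1))

# expand=True yields a DataFrame with one column per part instead.
expanded = df["email"].astype(str).str.split("@", expand=True)

print(domains.isna().tolist())
print(expanded.shape)
```

Rows that have fewer parts than `part_index` requires end up as missing values, which is why the non-expanding mode can report fewer affected rows than the table contains.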

Input Schema

Name               Required  Description                                                    Default
column             Yes       Column name to split values in
delimiter          No        String delimiter to split on                                   " " (single space)
part_index         No        Which part to keep (0-based index; None keeps the first part)  None
expand_to_columns  No        Whether to expand splits into multiple columns                 False
new_columns        No        Names for new columns when expanding                           None
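
For clients that talk to the server without an SDK, the arguments above travel inside an MCP `tools/call` request. The sketch below only builds that JSON-RPC message; the `build_tool_call` helper and the request id are hypothetical, and `ctx` is left out because it does not appear in the input schema.

```python
import json


def build_tool_call(request_id: int, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 'tools/call' request for the split_column tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "split_column", "arguments": arguments},
    }
    return json.dumps(request)


# Expand "name" into two named columns; omitted parameters keep their defaults.
payload = build_tool_call(
    1,
    {
        "column": "name",
        "delimiter": " ",
        "expand_to_columns": True,
        "new_columns": ["first_name", "last_name"],
    },
)
print(payload)
```

Most MCP client libraries assemble this message themselves, so in practice only the `arguments` dictionary needs to be supplied.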

Implementation Reference

  • The main execution logic for the 'split_column' tool. Handles splitting column values by a delimiter, supporting part extraction or expansion to new columns. Includes input validation, pandas operations, and returns a ColumnOperationResult.
    async def split_column(
        ctx: Annotated[Context, Field(description="FastMCP context for session access")],
        column: Annotated[str, Field(description="Column name to split values in")],
        delimiter: Annotated[str, Field(description="String delimiter to split on")] = " ",
        *,
        part_index: Annotated[
            int | None,
            Field(description="Which part to keep (0-based index, None for first part)"),
        ] = None,
        expand_to_columns: Annotated[
            bool,
            Field(description="Whether to expand splits into multiple columns"),
        ] = False,
        new_columns: Annotated[
            list[str] | None,
            Field(description="Names for new columns when expanding"),
        ] = None,
    ) -> ColumnOperationResult:
        """Split column values by delimiter.
    
        Returns:
            ColumnOperationResult with split details
    
        Examples:
            # Keep first part of split
            split_column(ctx, "full_name", " ", part_index=0)
    
            # Keep the second part (for an email address, the domain)
            split_column(ctx, "email", "@", part_index=1)
    
            # Expand into multiple columns
            split_column(ctx, "address", ",", expand_to_columns=True)
    
            # Expand with custom column names
            split_column(ctx, "name", " ", expand_to_columns=True,
                        new_columns=["first_name", "last_name"])
    
        """
        # Get session_id from FastMCP context
        session_id = ctx.session_id
        _session, df = get_session_data(session_id)
    
        _validate_column_exists(column, df)
    
        if not delimiter:
            msg = "delimiter"
            raise InvalidParameterError(msg, delimiter, "Delimiter cannot be empty")
    
        # Apply split operation
        # pandas typing limitation: str.split(expand=bool) overload not properly typed in pandas-stubs
        # See: https://github.com/pandas-dev/pandas-stubs/issues/43
        split_data = df[column].astype(str).str.split(delimiter, expand=expand_to_columns)  # type: ignore[call-overload]
    
        if expand_to_columns:
            # Expanding to multiple columns
            if isinstance(split_data, pd.DataFrame):
                num_parts = len(split_data.columns)
                columns_created = []
    
                # Use custom column names if provided
                if new_columns:
                    if len(new_columns) > num_parts:
                        # Truncate to actual number of parts
                        new_columns = new_columns[:num_parts]
                    elif len(new_columns) < num_parts:
                        # Extend with default names
                        for i in range(len(new_columns), num_parts):
                            new_columns.append(f"{column}_part_{i}")
                    column_names = new_columns
                else:
                    # Generate default column names
                    column_names = [f"{column}_part_{i}" for i in range(num_parts)]
    
                # Create new columns
                for i, col_name in enumerate(column_names):
                    if i < len(split_data.columns):
                        df[col_name] = split_data.iloc[:, i]
                        columns_created.append(col_name)
    
                affected_columns = columns_created
                operation_desc = f"split_expand_{len(columns_created)}_parts"
                rows_affected = len(df)
            else:
                # Shouldn't happen with expand=True, but handle gracefully
                msg = "expand_to_columns"
                raise InvalidParameterError(
                    msg,
                    str(expand_to_columns),
                    "Split with expand=True did not produce DataFrame",
                )
        else:
            # Not expanding - keep specific part or first part
            if part_index is None:
                part_index = 0
    
            if isinstance(split_data, pd.DataFrame):
                # This shouldn't happen with expand=False, but handle it
                if part_index < len(split_data.columns):
                    df[column] = split_data.iloc[:, part_index]
                else:
                    # Index out of range - fill with NaN
                    df[column] = pd.NA
            else:
                # Series of lists - extract specified part
                def get_part(split_list: Any) -> Any:
                    if isinstance(split_list, list) and len(split_list) > part_index:
                        return split_list[part_index]
                    return pd.NA
    
                df[column] = split_data.apply(get_part)
    
            affected_columns = [column]
            operation_desc = f"split_keep_part_{part_index}"
    
            # Count successful splits (non-null results)
            rows_affected = int(df[column].notna().sum())
    
        return ColumnOperationResult(
            operation=operation_desc,
            rows_affected=rows_affected,
            columns_affected=affected_columns,
        )
  • Registers the split_column handler function as an MCP tool with the name 'split_column' on the FastMCP server instance 'column_text_server'.
    column_text_server.tool(name="split_column")(split_column)
  • Helper function used by split_column (and other tools) to validate that the target column exists in the DataFrame.
    def _validate_column_exists(column: str, df: pd.DataFrame) -> None:
        """Validate that a column exists in the DataFrame.
    
        Args:
            column: Column name to check
            df: DataFrame to check in
    
        Raises:
            ColumnNotFoundError: If column doesn't exist
    
        """
        if column not in df.columns:
            raise ColumnNotFoundError(column, df.columns.tolist())
  • Helper function that counts how many rows differ between the original and the modified column. split_column does not call it; it computes rows_affected itself (the total row count when expanding, or the number of non-null results when keeping a single part).
    def _count_column_changes(original: pd.Series, modified: pd.Series) -> int:
        """Count number of changes between original and modified column data.
    
        Args:
            original: Original column data
            modified: Modified column data
    
        Returns:
            Number of rows that changed
    
        """
        changed_mask = original.astype(str).fillna("") != modified.astype(str).fillna("")
        return int(changed_mask.sum())
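
The column-naming rules in the expand branch of split_column (truncate surplus names, pad missing ones with `{column}_part_{i}`) can be isolated into a small pure function. This is a sketch under the assumption that such a helper would be useful for testing; `resolve_column_names` is a hypothetical name and does not exist in the listed source.

```python
from __future__ import annotations


def resolve_column_names(
    column: str, num_parts: int, new_columns: list[str] | None = None
) -> list[str]:
    """Return exactly num_parts column names for an expanded split."""
    if not new_columns:
        # No custom names: fall back to the default pattern for every part.
        return [f"{column}_part_{i}" for i in range(num_parts)]
    # Slicing both truncates surplus names and copies the caller's list.
    names = list(new_columns[:num_parts])
    # Pad with default names, continuing from the first unnamed position.
    names.extend(f"{column}_part_{i}" for i in range(len(names), num_parts))
    return names


print(resolve_column_names("name", 3, ["first_name", "last_name"]))
print(resolve_column_names("name", 2, ["a", "b", "c"]))
print(resolve_column_names("address", 2))
```

Unlike the in-place `append` in the listing, this version works on a copy, so the list passed in by the caller is never modified.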

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonpspri/databeak'