replace_in_column
Replace text patterns in CSV columns using regex or literal strings to clean, transform, or standardize data values.
Instructions
Replace patterns in a column with replacement text.
Returns: ColumnOperationResult with replacement details
Examples:
# Replace with regex
replace_in_column(ctx, "name", r"Mr\.", "Mister")
# Remove non-digits from phone numbers
replace_in_column(ctx, "phone", r"\D", "", regex=True)
# Simple string replacement
replace_in_column(ctx, "status", "N/A", "Unknown", regex=False)
# Replace multiple spaces with single space
replace_in_column(ctx, "description", r"\s+", " ")
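
To see what each pattern in the examples matches, the same replacements can be tried on plain strings with Python's standard `re` module. This is only a sketch: the sample values are made up for illustration, and the tool itself applies the pattern to a whole column rather than to single strings.

```python
import re

# Escaped dot: only the literal text "Mr." is replaced.
escaped = re.sub(r"Mr\.", "Mister", "Mr. Smith")

# Unescaped dot matches any character, so "Mrs" is rewritten as well.
unescaped = re.sub(r"Mr.", "Mister", "Mrs Jones")

# \D matches every non-digit character.
digits = re.sub(r"\D", "", "(555) 123-4567")

# A literal replacement needs no regex engine at all.
status = "N/A".replace("N/A", "Unknown")

# \s+ collapses any run of whitespace (spaces, tabs, newlines).
collapsed = re.sub(r"\s+", " ", "too   many\t spaces")

print(escaped, unescaped, digits, status, collapsed, sep="\n")
```

The second case is why the first example escapes the dot: without the backslash, the pattern also rewrites values such as "Mrs".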
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| column | Yes | Column name to apply pattern replacement in | |
| pattern | Yes | Pattern to search for (regex or literal string) | |
| replacement | Yes | Replacement text to use for matches | |
| regex | No | Whether to treat pattern as regex (True) or literal string (False) | True |
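
The effect of the `regex` flag is easiest to see on a pattern that contains regex metacharacters. The helper below, `replace_value`, is a hypothetical single-value analogue written with the standard library; it is not part of the tool, and the sample string is invented.

```python
import re


def replace_value(value: str, pattern: str, replacement: str, regex: bool = True) -> str:
    """Hypothetical single-value analogue of the tool's `regex` flag."""
    if regex:
        # Pattern is interpreted by the regex engine.
        return re.sub(pattern, replacement, value)
    # Pattern is matched character for character.
    return value.replace(pattern, replacement)


text = "cost: $5 (approx.)"

# Literal mode removes the exact text "(approx.)".
literal = replace_value(text, "(approx.)", "", regex=False)

# Regex mode reads the parentheses as a group and the dot as "any character",
# so only "approx." is removed and the literal parentheses stay behind.
as_regex = replace_value(text, "(approx.)", "", regex=True)

print(repr(literal))
print(repr(as_regex))
```

Because the flag defaults to True, a pattern meant literally should either be passed with `regex=False` or have its metacharacters escaped.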
Implementation Reference
- The core handler function implementing the replace_in_column tool logic. Handles input validation, regex compilation if needed, applies string replacements using pandas, counts affected rows, and returns a ColumnOperationResult.

  ```python
  async def replace_in_column(
      ctx: Annotated[Context, Field(description="FastMCP context for session access")],
      column: Annotated[str, Field(description="Column name to apply pattern replacement in")],
      pattern: Annotated[str, Field(description="Pattern to search for (regex or literal string)")],
      replacement: Annotated[str, Field(description="Replacement text to use for matches")],
      *,
      regex: Annotated[
          bool,
          Field(description="Whether to treat pattern as regex (True) or literal string (False)"),
      ] = True,
  ) -> ColumnOperationResult:
      r"""Replace patterns in a column with replacement text.

      Returns:
          ColumnOperationResult with replacement details

      Examples:
          # Replace with regex
          replace_in_column(ctx, "name", r"Mr\.", "Mister")

          # Remove non-digits from phone numbers
          replace_in_column(ctx, "phone", r"\D", "", regex=True)

          # Simple string replacement
          replace_in_column(ctx, "status", "N/A", "Unknown", regex=False)

          # Replace multiple spaces with single space
          replace_in_column(ctx, "description", r"\s+", " ")
      """
      # Get session_id from FastMCP context
      session_id = ctx.session_id
      _session, df = get_session_data(session_id)

      _validate_column_exists(column, df)

      # Validate regex pattern if using regex mode
      if regex:
          try:
              re.compile(pattern)
          except re.error as e:
              msg = "pattern"
              raise InvalidParameterError(
                  msg,
                  pattern,
                  f"Invalid regex pattern: {e}",
              ) from e

      # Count replacements made
      original_data = df[column].copy()

      # Apply replacements
      if regex:
          df[column] = df[column].astype(str).str.replace(pattern, replacement, regex=True)
      else:
          df[column] = df[column].astype(str).str.replace(pattern, replacement, regex=False)

      # Count changes
      changes_made = _count_column_changes(original_data, df[column])

      return ColumnOperationResult(
          operation="replace_pattern",
          rows_affected=changes_made,
          columns_affected=[column],
      )
  ```
- src/databeak/servers/column_text_server.py:531-531 (registration): Registration of the replace_in_column handler as a FastMCP tool with explicit name.

  ```python
  column_text_server.tool(name="replace_in_column")(replace_in_column)
  ```
- Helper function to validate that the target column exists in the dataframe, used in replace_in_column.

  ```python
  def _validate_column_exists(column: str, df: pd.DataFrame) -> None:
      """Validate that a column exists in the DataFrame.

      Args:
          column: Column name to check
          df: DataFrame to check in

      Raises:
          ColumnNotFoundError: If column doesn't exist
      """
      if column not in df.columns:
          raise ColumnNotFoundError(column, df.columns.tolist())
  ```
- Helper function to count the number of rows changed after modification, used in replace_in_column.

  ```python
  def _count_column_changes(original: pd.Series, modified: pd.Series) -> int:
      """Count number of changes between original and modified column data.

      Args:
          original: Original column data
          modified: Modified column data

      Returns:
          Number of rows that changed
      """
      changed_mask = original.astype(str).fillna("") != modified.astype(str).fillna("")
      return int(changed_mask.sum())
  ```
- Input schema defined via Annotated types and Pydantic Field descriptions in the function signature, along with output type ColumnOperationResult.

  ```python
  async def replace_in_column(
      ctx: Annotated[Context, Field(description="FastMCP context for session access")],
      column: Annotated[str, Field(description="Column name to apply pattern replacement in")],
      pattern: Annotated[str, Field(description="Pattern to search for (regex or literal string)")],
      replacement: Annotated[str, Field(description="Replacement text to use for matches")],
      *,
      regex: Annotated[
          bool,
          Field(description="Whether to treat pattern as regex (True) or literal string (False)"),
      ] = True,
  ) -> ColumnOperationResult:
      ...  # body omitted in this excerpt
  ```
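
The handler's flow (validate the pattern, apply the replacement, count the rows that changed) can be tried without a session or pandas. The sketch below is a stdlib-only stand-in operating on a list of strings; `replace_in_values` is a hypothetical name, and raising `ValueError` stands in for the tool's `InvalidParameterError`.

```python
import re


def replace_in_values(values, pattern, replacement, *, regex=True):
    """Hypothetical list-based analogue: returns (new_values, rows_affected)."""
    if regex:
        # Reject invalid patterns before touching any data.
        try:
            compiled = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"Invalid regex pattern: {exc}") from exc
        new_values = [compiled.sub(replacement, v) for v in values]
    else:
        new_values = [v.replace(pattern, replacement) for v in values]

    # Count rows whose value differs after replacement, not individual matches.
    rows_affected = sum(old != new for old, new in zip(values, new_values))
    return new_values, rows_affected


cleaned, rows_affected = replace_in_values(["a  b  c", "x y", "z"], r"\s+", " ")
print(cleaned, rows_affected)
```

Here the first value contains two matches but counts as one affected row, and "x y" matches the pattern yet is not counted because its text is unchanged. This mirrors how `rows_affected` is derived from a before/after comparison rather than from a match count.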