fill_missing_values
Handle missing data in CSV files using strategies like imputation, forward/backward fill, or row removal to prepare datasets for analysis.
Instructions
Fill or remove missing values with comprehensive strategy support.
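Provides multiple strategies for handling missing data, including statistical imputation (mean, median, mode). The mean and median strategies apply only to numeric columns: non-numeric columns are skipped with a logged warning rather than rejected, and a numeric column with no non-missing values is left unchanged. The mode strategy works on columns of any type.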
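A minimal sketch of the per-column type check that the mean and median strategies perform, written in plain pandas. `impute_numeric` is a hypothetical helper, not part of databeak; it mirrors the implementation reference by skipping non-numeric columns and leaving a column alone when its statistic is NaN (all values missing). The data is invented for illustration.

```python
import pandas as pd


def impute_numeric(df: pd.DataFrame, columns: list[str], how: str) -> pd.DataFrame:
    """Fill NaNs with each numeric column's mean or median; skip other dtypes.

    Hypothetical helper for illustration only (not a databeak function).
    """
    out = df.copy()
    for col in columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            # Non-numeric columns are left untouched (the tool logs a warning here).
            continue
        stat = df[col].mean() if how == "mean" else df[col].median()
        if not pd.isna(stat):  # an all-NaN column has no mean/median to fill with
            out[col] = df[col].fillna(stat)
    return out


df = pd.DataFrame({"age": [20.0, None, 40.0], "city": ["Oslo", None, "Rome"]})
result = impute_numeric(df, ["age", "city"], how="mean")
print(result["age"].tolist())  # [20.0, 30.0, 40.0]
# "city" is not numeric, so its missing value is still missing.
```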
Examples:

```python
# Drop rows with any missing values
fill_missing_values(ctx, strategy="drop")

# Fill missing values with 0
fill_missing_values(ctx, strategy="fill", value=0)

# Forward fill specific columns
fill_missing_values(ctx, strategy="forward", columns=["price", "quantity"])

# Fill with column mean for numeric columns
fill_missing_values(ctx, strategy="mean", columns=["age", "salary"])
```
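The four non-statistical strategies map onto standard pandas calls: the implementation reference uses `dropna(subset=...)`, `fillna(value)`, `ffill()` and `bfill()` on the target columns. The sketch below applies those calls directly to a small invented frame, outside any MCP session, to show what each strategy does to the data.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, None, None, 13.0],
    "quantity": [1.0, 2.0, None, 4.0],
})
cols = ["price", "quantity"]

# strategy="drop": remove rows with a NaN in any target column
dropped = df.dropna(subset=cols)

# strategy="fill", value=0: replace every NaN with a constant
filled = df.fillna(0)

# strategy="forward": propagate the last valid value downward
forward = df.copy()
forward[cols] = df[cols].ffill()

# strategy="backward": propagate the next valid value upward
backward = df.copy()
backward[cols] = df[cols].bfill()

print(len(dropped))                # 2
print(forward["price"].tolist())   # [10.0, 10.0, 10.0, 13.0]
print(backward["price"].tolist())  # [10.0, 13.0, 13.0, 13.0]
```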
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| strategy | No | Strategy for handling missing values (drop, fill, forward, backward, mean, median, mode) | drop |
| value | No | Value to use when strategy is 'fill' (required for that strategy) | None |
| columns | No | Columns to process (None = all columns) | None |
Implementation Reference
- The core handler function that executes the fill_missing_values tool logic. It supports multiple strategies (drop, fill, forward, backward, mean, median, mode) for handling missing values in the specified columns, or in all columns of the dataframe. It raises `ToolError` when a requested column does not exist or when `value` is missing for the `fill` strategy:

```python
from typing import Annotated, Literal

import pandas as pd
from fastmcp import Context
from fastmcp.exceptions import ToolError
from pydantic import Field

# CellValue, ColumnOperationResult, get_session_data and logger are
# databeak-internal names defined elsewhere in the project (not shown here).


def fill_missing_values(
    ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    strategy: Annotated[
        Literal["drop", "fill", "forward", "backward", "mean", "median", "mode"],
        Field(
            description="Strategy for handling missing values (drop, fill, forward, backward, mean, median, mode)",
        ),
    ] = "drop",
    value: Annotated[CellValue, Field(description="Value to use when strategy is 'fill'")] = None,
    columns: Annotated[
        list[str] | None,
        Field(description="Columns to process (None = all columns)"),
    ] = None,
) -> ColumnOperationResult:
    """Fill or remove missing values with comprehensive strategy support.

    Provides multiple strategies for handling missing data, including statistical
    imputation methods. Handles different data types appropriately and validates
    strategy compatibility with column types.

    Examples:
        # Drop rows with any missing values
        fill_missing_values(ctx, strategy="drop")

        # Fill missing values with 0
        fill_missing_values(ctx, strategy="fill", value=0)

        # Forward fill specific columns
        fill_missing_values(ctx, strategy="forward", columns=["price", "quantity"])

        # Fill with column mean for numeric columns
        fill_missing_values(ctx, strategy="mean", columns=["age", "salary"])
    """
    session_id = ctx.session_id
    session, df = get_session_data(session_id)

    # Validate and set target columns
    if columns:
        missing_cols = [col for col in columns if col not in df.columns]
        if missing_cols:
            msg = f"Columns not found: {missing_cols}"
            raise ToolError(msg)
        target_cols = columns
    else:
        target_cols = df.columns.tolist()

    # Count missing values before processing
    missing_before = df[target_cols].isna().sum().sum()

    # Apply strategy
    if strategy == "drop":
        session.df = df.dropna(subset=target_cols)
    elif strategy == "fill":
        if value is None:
            msg = "Value required for 'fill' strategy"
            raise ToolError(msg)
        session.df = df.copy()
        session.df[target_cols] = df[target_cols].fillna(value)
    elif strategy == "forward":
        session.df = df.copy()
        session.df[target_cols] = df[target_cols].ffill()
    elif strategy == "backward":
        session.df = df.copy()
        session.df[target_cols] = df[target_cols].bfill()
    elif strategy == "mean":
        session.df = df.copy()
        for col in target_cols:
            if pd.api.types.is_numeric_dtype(df[col]):
                mean_val = df[col].mean()
                if not pd.isna(mean_val):
                    session.df[col] = df[col].fillna(mean_val)
            else:
                logger.warning("Column '%s' is not numeric, skipping mean fill", col)
    elif strategy == "median":
        session.df = df.copy()
        for col in target_cols:
            if pd.api.types.is_numeric_dtype(df[col]):
                median_val = df[col].median()
                if not pd.isna(median_val):
                    session.df[col] = df[col].fillna(median_val)
            else:
                logger.warning("Column '%s' is not numeric, skipping median fill", col)
    elif strategy == "mode":
        session.df = df.copy()
        for col in target_cols:
            mode_val = df[col].mode()
            if len(mode_val) > 0:
                session.df[col] = df[col].fillna(mode_val[0])
    else:
        msg = (
            f"Invalid strategy '{strategy}'. Valid strategies: "
            "drop, fill, forward, backward, mean, median, mode"
        )
        raise ToolError(msg)

    rows_after = len(session.df)
    missing_after = session.df[target_cols].isna().sum().sum()
    values_filled = missing_before - missing_after

    # No longer recording operations (simplified MCP architecture)
    return ColumnOperationResult(
        operation="fill_missing_values",
        rows_affected=rows_after,  # total row count after the operation
        columns_affected=target_cols,
        values_filled=int(values_filled),
    )
```
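The reported counts come from comparing NaN totals before and after the strategy runs. A small pandas sketch (invented data, hypothetical `count_missing` helper) shows two consequences of that arithmetic as written: with `drop`, NaNs removed along with their rows are counted in `values_filled`, and `rows_affected` equals the number of rows remaining; with `forward`, NaNs that have no earlier value to copy stay missing and are not counted.

```python
import pandas as pd


def count_missing(df: pd.DataFrame, cols: list[str]) -> int:
    """Total NaN count across the target columns (same expression as the tool)."""
    return int(df[cols].isna().sum().sum())


df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, None, 6.0]})
cols = ["a", "b"]
missing_before = count_missing(df, cols)  # 3

# "drop": every NaN disappears with its row, so all 3 count as "filled"
dropped = df.dropna(subset=cols)
drop_filled = missing_before - count_missing(dropped, cols)  # 3
drop_rows_reported = len(dropped)  # 1 (rows remaining, not rows removed)

# "forward": the two leading NaNs in "b" have no earlier value to copy
forward = df.copy()
forward[cols] = df[cols].ffill()
forward_filled = missing_before - count_missing(forward, cols)  # 1
```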
- `src/databeak/servers/transformation_server.py:424-424` (registration): the `fill_missing_values` handler function is registered as an MCP tool on the `transformation_server` instance:

```python
transformation_server.tool(name="fill_missing_values")(fill_missing_values)
```
- Pydantic response model `ColumnOperationResult` used by the fill_missing_values tool, including the operation-specific `values_filled` field that reports the number of missing values processed:

```python
class ColumnOperationResult(BaseToolResponse):
    """Response model for column operations (add, remove, rename, etc.)."""

    operation: str = Field(description="Type of operation performed")
    rows_affected: int = Field(description="Number of rows affected by operation")
    columns_affected: list[str] = Field(description="Names of columns affected")
    original_sample: list[CsvCellValue] | None = Field(
        default=None,
        description="Sample values before operation",
    )
    updated_sample: list[CsvCellValue] | None = Field(
        default=None,
        description="Sample values after operation",
    )

    # Additional fields for specific operations
    part_index: int | None = Field(default=None, description="Part index for split operations")
    transform: str | None = Field(default=None, description="Transform description")
    nulls_filled: int | None = Field(default=None, description="Number of null values filled")
    rows_removed: int | None = Field(
        default=None,
        description="Number of rows removed (for remove_duplicates)",
    )
    values_filled: int | None = Field(
        default=None,
        description="Number of values filled (for fill_missing_values)",
    )
```