filter_rows
Filter data rows using flexible conditions with support for null values, text matching, and logical combinations to extract specific subsets from datasets.
Instructions
Filter rows using flexible conditions: comprehensive null value and text matching support.
Provides powerful filtering capabilities optimized for AI-driven data analysis. Supports multiple operators, logical combinations, and comprehensive null value handling.
Examples: # Numeric filtering filter_rows(ctx, [{"column": "age", "operator": ">", "value": 25}])
# Text filtering with null handling
filter_rows(ctx, [
{"column": "name", "operator": "contains", "value": "Smith"},
{"column": "email", "operator": "is_not_null"}
], mode="and")
# Multiple conditions with OR logic
filter_rows(ctx, [
{"column": "status", "operator": "==", "value": "active"},
{"column": "priority", "operator": "==", "value": "high"}
], mode="or")
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| conditions | Yes | List of filter conditions with column, operator, and value | |
| mode | No | Logic for combining conditions (and/or) | and |
Implementation Reference
- The main filter_rows tool handler function that applies row filtering logic using pandas masks based on multiple conditions with AND/OR modes, supporting numeric, text, list, and null operators.def filter_rows( ctx: Annotated[Context, Field(description="FastMCP context for session access")], conditions: Annotated[ list[FilterCondition], Field(description="List of filter conditions with column, operator, and value"), ], mode: Annotated[ Literal["and", "or"], Field(description="Logic for combining conditions (and/or)"), ] = "and", ) -> FilterOperationResult: """Filter rows using flexible conditions: comprehensive null value and text matching support. Provides powerful filtering capabilities optimized for AI-driven data analysis. Supports multiple operators, logical combinations, and comprehensive null value handling. Examples: # Numeric filtering filter_rows(ctx, [{"column": "age", "operator": ">", "value": 25}]) # Text filtering with null handling filter_rows(ctx, [ {"column": "name", "operator": "contains", "value": "Smith"}, {"column": "email", "operator": "is_not_null"} ], mode="and") # Multiple conditions with OR logic filter_rows(ctx, [ {"column": "status", "operator": "==", "value": "active"}, {"column": "priority", "operator": "==", "value": "high"} ], mode="or") """ session_id = ctx.session_id session, df = get_session_data(session_id) rows_before = len(df) # Initialize mask based on mode: AND starts True, OR starts False mask = pd.Series([mode == "and"] * len(df)) # Convert dict conditions to FilterCondition objects if needed typed_conditions: list[FilterCondition] = [] for cond in conditions: if isinstance(cond, dict): # Normalize operator: convert == to = for compatibility normalized_cond = dict(cond) if "operator" in normalized_cond and normalized_cond["operator"] == "==": normalized_cond["operator"] = "=" typed_conditions.append(FilterCondition(**normalized_cond)) else: typed_conditions.append(cond) # Process conditions for condition in typed_conditions: column = condition.column operator = ( condition.operator.value if hasattr(condition.operator, "value") else condition.operator ) value = condition.value if column is None or column not in df.columns: raise ColumnNotFoundError(column, df.columns.tolist()) col_data = df[column] if operator in {"=", "=="}: condition_mask = col_data == value elif operator in {"!=", "not_equals"}: condition_mask = col_data != value elif operator == ">": condition_mask = col_data > value elif operator == "<": condition_mask = col_data < value elif operator == ">=": condition_mask = col_data >= value elif operator == "<=": condition_mask = col_data <= value elif operator == "contains": condition_mask = col_data.astype(str).str.contains(str(value), na=False) elif operator == "not_contains": condition_mask = ~col_data.astype(str).str.contains(str(value), na=False) elif operator == "starts_with": condition_mask = col_data.astype(str).str.startswith(str(value), na=False) elif operator == "ends_with": condition_mask = col_data.astype(str).str.endswith(str(value), na=False) elif operator == "in": condition_mask = col_data.isin(value if isinstance(value, list) else [value]) elif operator == "not_in": condition_mask = ~col_data.isin(value if isinstance(value, list) else [value]) elif operator == "is_null": condition_mask = col_data.isna() elif operator == "is_not_null": condition_mask = col_data.notna() else: msg = ( f"Invalid operator '{operator}'. Valid operators: " "==, !=, >, <, >=, <=, contains, not_contains, starts_with, ends_with, " "in, not_in, is_null, is_not_null" ) raise ToolError( msg, ) mask = mask & condition_mask if mode == "and" else mask | condition_mask # Apply filter session.df = df[mask].reset_index(drop=True) rows_after = len(session.df) # No longer needed - conditions are already FilterCondition objects # No longer recording operations (simplified MCP architecture) return FilterOperationResult( rows_before=rows_before, rows_after=rows_after, rows_filtered=rows_before - rows_after, conditions_applied=len(conditions), )
- src/databeak/servers/transformation_server.py:421-421 (registration)Registration of the filter_rows handler as an MCP tool on the transformation_server instance.transformation_server.tool(name="filter_rows")(filter_rows)
- Input schema: Pydantic model for FilterCondition used in filter_rows parameters, defining column, ComparisonOperator enum, and value with special validation for null/list operators.class FilterCondition(BaseModel): """A single filter condition.""" column: str = Field(..., description="Column name to filter on") operator: ComparisonOperator = Field(..., description="Comparison operator") value: FilterValue = Field(default=None, description="Value to compare against") @field_validator("value", mode="before") @classmethod def validate_value(cls, v: FilterValue, info: Any) -> FilterValue: """Validate value based on operator.""" operator = info.data.get("operator") if hasattr(info, "data") else None if operator in [ComparisonOperator.IS_NULL, ComparisonOperator.IS_NOT_NULL]: return None if operator in [ ComparisonOperator.IN, ComparisonOperator.NOT_IN, ] and not isinstance(v, list): return [v] return v
- Output schema: Pydantic model FilterOperationResult returned by filter_rows, providing statistics on rows before/after filtering and conditions applied.class FilterOperationResult(BaseToolResponse): """Response model for row filtering operations.""" rows_before: int = Field(description="Row count before filtering") rows_after: int = Field(description="Row count after filtering") rows_filtered: int = Field(description="Number of rows removed by filter") conditions_applied: int = Field(description="Number of filter conditions applied")