find_cells_with_value
Locate all cells containing a specific value in datasets for data validation, quality checking, and pattern identification. Provides coordinates and context for each match.
Instructions
Find all cells containing a specific value for data discovery.
Searches through the dataset to locate all occurrences of a specific value, providing coordinates and context. Essential for data validation, quality checking, and understanding data patterns.
Returns: Locations of all matching cells with coordinates and context
Search Features:
- Exact Match: Precise value matching with type consideration
- Substring Search: Flexible text-based search for string columns
- Coordinates: Row and column positions for each match
- Summary Stats: Total matches, columns searched, search parameters
Examples:
```python
# Find all cells with value "ERROR"
results = await find_cells_with_value(ctx, "ERROR")
```
AI Workflow Integration:
1. Data quality assessment and error detection
2. Pattern identification and data validation
3. Reference data location and verification
4. Data cleaning and preprocessing guidance
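To make the difference between exact and substring matching concrete, here is a minimal pure-Python sketch of the search semantics described above. It is a simplified stand-in and not the tool's implementation: the helper `find_cells`, the list-of-dicts row format, and the sample data are all assumptions made for illustration.

```python
from typing import Any, Optional


def find_cells(
    rows: list[dict[str, Any]],
    value: Any,
    columns: Optional[list[str]] = None,
    exact_match: bool = True,
) -> list[tuple[int, str, Any]]:
    """Return (row_index, column, cell_value) for every matching cell.

    Hypothetical helper: rows are plain dicts rather than a dataframe.
    """
    if not rows:
        return []
    # None means "search every column", mirroring the documented default.
    search_columns = columns if columns is not None else list(rows[0].keys())
    matches = []
    for col in search_columns:
        for i, row in enumerate(rows):
            cell = row.get(col)
            if exact_match or not isinstance(value, str):
                # Exact comparison; non-string values always use this path.
                hit = cell == value
            else:
                # Case-insensitive literal substring test; empty cells never match.
                hit = cell is not None and value.lower() in str(cell).lower()
            if hit:
                matches.append((i, col, cell))
    return matches


# Made-up sample data for the example.
rows = [
    {"name": "John Smith", "email": "john@example.com", "status": "OK"},
    {"name": "Ana Lima", "email": "ana@example.com", "status": "ERROR"},
    {"name": "Johnny Ray", "email": None, "status": "ERROR"},
]

exact = find_cells(rows, "ERROR")
partial = find_cells(rows, "john", columns=["name", "email"], exact_match=False)
print(exact)
print(partial)
```

Matches are collected column by column, so all hits in the first searched column come before hits in the next one.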
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| value | Yes | The value to search for (any data type) | |
| columns | No | List of columns to search (None = all columns) | None |
| exact_match | No | True for exact match, False for substring search | True |
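As an illustration of how these inputs might travel over the wire, the sketch below builds a Model Context Protocol `tools/call` request that passes all three arguments explicitly. The request `id` and the argument values are arbitrary choices for the example, and how the request is actually sent depends on the client in use.

```python
import json

# JSON-RPC 2.0 envelope for an MCP tool call.
request = {
    "jsonrpc": "2.0",
    "id": 1,  # arbitrary request id chosen for this example
    "method": "tools/call",
    "params": {
        "name": "find_cells_with_value",
        "arguments": {
            "value": "john",               # value to search for
            "columns": ["name", "email"],  # hypothetical column names
            "exact_match": False,          # substring search
        },
    },
}

payload = json.dumps(request, indent=2)
print(payload)
```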
Implementation Reference
- The main handler function that implements the logic to find cells in the dataframe matching the specified value, supporting exact or substring matching in given columns, and returns CellLocation objects.

```python
async def find_cells_with_value(
    ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    value: Annotated[Any, Field(description="The value to search for (any data type)")],
    *,
    columns: Annotated[
        list[str] | None,
        Field(description="List of columns to search (None = all columns)"),
    ] = None,
    exact_match: Annotated[
        bool,
        Field(description="True for exact match, False for substring search"),
    ] = True,
) -> FindCellsResult:
    """Find all cells containing a specific value for data discovery.

    Searches through the dataset to locate all occurrences of a specific value,
    providing coordinates and context. Essential for data validation, quality
    checking, and understanding data patterns.

    Returns:
        Locations of all matching cells with coordinates and context

    Search Features:
        Exact Match: Precise value matching with type consideration
        Substring Search: Flexible text-based search for string columns
        Coordinates: Row and column positions for each match
        Summary Stats: Total matches, columns searched, search parameters

    Examples:
        # Find all cells with value "ERROR"
        results = await find_cells_with_value(ctx, "ERROR")

        # Substring search in specific columns
        results = await find_cells_with_value(ctx, "john",
                                            columns=["name", "email"],
                                            exact_match=False)

    AI Workflow Integration:
        1. Data quality assessment and error detection
        2. Pattern identification and data validation
        3. Reference data location and verification
        4. Data cleaning and preprocessing guidance

    """
    # Get session_id from FastMCP context
    session_id = ctx.session_id
    _session, df = get_session_data(session_id)
    matches = []

    # Determine columns to search
    if columns is not None:
        missing_cols = [col for col in columns if col not in df.columns]
        if missing_cols:
            raise ColumnNotFoundError(missing_cols[0], df.columns.tolist())
        columns_to_search = columns
    else:
        columns_to_search = df.columns.tolist()

    # Search for matches
    for col in columns_to_search:
        if exact_match:
            # Exact matching
            if pd.isna(value):
                # Search for NaN values
                mask = df[col].isna()
            else:
                mask = df[col] == value
        # Substring matching (for strings)
        elif isinstance(value, str):
            mask = df[col].astype(str).str.contains(str(value), na=False, case=False)
        else:
            # For non-strings, fall back to exact match
            mask = df[col] == value

        # Get matching row indices
        matching_rows = df.index[mask].tolist()

        for row_idx in matching_rows:
            cell_value = df.loc[row_idx, col]

            # Convert to CsvCellValue compatible type
            processed_value: CsvCellValue
            if pd.isna(cell_value):
                processed_value = None
            elif hasattr(cell_value, "item"):
                item_value = cell_value.item()
                if isinstance(item_value, str | int | float | bool):
                    processed_value = item_value
                else:
                    processed_value = str(item_value)
            elif isinstance(cell_value, str | int | float | bool):
                processed_value = cell_value
            else:
                # Fallback for complex types - convert to string
                processed_value = str(cell_value)

            matches.append(
                CellLocation(
                    row=int(row_idx),
                    column=col,
                    value=processed_value,
                ),
            )

    return FindCellsResult(
        search_value=value,
        matches_found=len(matches),
        coordinates=matches,
        search_column=columns[0] if columns and len(columns) == 1 else None,
        exact_match=exact_match,
    )
```
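One detail of the substring branch is worth illustrating. pandas `Series.str.contains` treats its pattern as a regular expression by default, so a search value containing characters such as `.` or `(` may match more than the literal text, or fail to compile. The sketch below reproduces that behavior with the standard `re` module instead of pandas. The helper names and sample cells are assumptions for illustration. Escaping the value with `re.escape` is one possible way to force literal matching, not something the handler above does.

```python
import re


def contains_regex(text: str, pattern: str) -> bool:
    """Case-insensitive regex search, analogous to str.contains(..., case=False)."""
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def contains_literal(text: str, pattern: str) -> bool:
    """Same search, but with the pattern escaped so it matches literally."""
    return re.search(re.escape(pattern), text, flags=re.IGNORECASE) is not None


# Made-up cell contents for the example.
cells = ["a.b", "axb", "A.B", "ab"]

# "." matches any single character in regex mode, so "axb" is also a hit.
regex_hits = [c for c in cells if contains_regex(c, "a.b")]

# After escaping, only cells containing the literal text "a.b" match.
literal_hits = [c for c in cells if contains_literal(c, "a.b")]

print(regex_hits)
print(literal_hits)
```

A search value such as `"("` is not a valid regular expression on its own and raises `re.error` when compiled, which is another reason a caller might prefer literal matching for free-form input.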
- Pydantic model defining the output structure for the find_cells_with_value tool response, including search value, match count, and list of cell locations.

```python
class FindCellsResult(BaseToolResponse):
    """Response model for cell value search operations."""

    search_value: CsvCellValue
    matches_found: int
    coordinates: list[CellLocation]
    search_column: str | None = None
    exact_match: bool
```
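To show what a populated response could look like, the following sketch mirrors the result shape with standard-library dataclasses in place of the Pydantic models. The `CellValue` alias is an assumed stand-in for `CsvCellValue`, whose definition is not shown here, and `build_result` is a hypothetical helper. The `search_column` rule is taken from the handler's return statement: it is only set when exactly one column was requested.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Assumed stand-in for CsvCellValue; the real alias is not shown in this document.
CellValue = Union[str, int, float, bool, None]


@dataclass
class CellLocation:
    row: int
    column: str
    value: CellValue


@dataclass
class FindCellsResult:
    search_value: CellValue
    matches_found: int
    coordinates: list[CellLocation]
    exact_match: bool
    search_column: Optional[str] = None


def build_result(
    value: CellValue,
    matches: list[CellLocation],
    columns: Optional[list[str]],
    exact_match: bool,
) -> FindCellsResult:
    """Hypothetical helper that assembles the response from collected matches."""
    return FindCellsResult(
        search_value=value,
        matches_found=len(matches),
        coordinates=matches,
        exact_match=exact_match,
        # Only reported when the search was restricted to a single column.
        search_column=columns[0] if columns and len(columns) == 1 else None,
    )


hits = [CellLocation(row=1, column="status", value="ERROR")]
single = build_result("ERROR", hits, ["status"], True)
multi = build_result("ERROR", hits, ["status", "name"], True)
everything = build_result("ERROR", hits, None, True)
print(single)
```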
- src/databeak/servers/discovery_server.py:854-854 (registration): FastMCP tool registration for the find_cells_with_value function on the discovery_server instance.

```python
discovery_server.tool(name="find_cells_with_value")(find_cells_with_value)
```