find_cells_with_value
Locates all cells containing a specific value in CSV datasets for data validation, quality checking, and pattern identification.
Instructions
Find all cells containing a specific value for data discovery.
Searches through the dataset to locate all occurrences of a specific value, providing coordinates and context. Essential for data validation, quality checking, and understanding data patterns.
Returns: Locations of all matching cells with coordinates and context
Search Features:
- Exact Match: Precise value matching with type consideration
- Substring Search: Flexible text-based search for string columns
- Coordinates: Row and column positions for each match
- Summary Stats: Total matches, columns searched, search parameters

Examples:

```python
# Find all cells with value "ERROR"
results = await find_cells_with_value(ctx, "ERROR")
```

AI Workflow Integration:
1. Data quality assessment and error detection
2. Pattern identification and data validation
3. Reference data location and verification
4. Data cleaning and preprocessing guidance

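The two search modes described above can be illustrated without the server. What follows is a minimal sketch in plain Python, assuming a table held as a list of dictionaries; `find_matches` and the sample rows are hypothetical and are not part of the tool's API. Substring mode is case-insensitive here because the handler passes `case=False`.

```python
from typing import Any

# Hypothetical sample data standing in for a loaded CSV.
ROWS = [
    {"name": "John Smith", "status": "OK"},
    {"name": "Ann Lee", "status": "ERROR"},
    {"name": "johnny@example.com", "status": "error: timeout"},
]


def find_matches(
    rows: list[dict[str, Any]], value: Any, exact_match: bool = True
) -> list[tuple[int, str, Any]]:
    """Return (row, column, cell) for every cell that matches value."""
    matches = []
    columns = list(rows[0]) if rows else []
    # Column-major order, mirroring a per-column scan.
    for column in columns:
        for row_idx, row in enumerate(rows):
            cell = row[column]
            if exact_match or not isinstance(value, str):
                # Exact mode (and non-string values): plain equality.
                hit = cell == value
            else:
                # Substring mode: case-insensitive containment on the text form.
                hit = value.lower() in str(cell).lower()
            if hit:
                matches.append((row_idx, column, cell))
    return matches


exact = find_matches(ROWS, "ERROR")
loose = find_matches(ROWS, "error", exact_match=False)
print(exact)
print(loose)
```

Exact mode finds only the cell whose whole value equals the search value, while substring mode also picks up cells that merely contain it.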
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| value | Yes | The value to search for (any data type) | |
| columns | No | List of columns to search (None = all columns) | None |
| exact_match | No | True for exact match, False for substring search | True |
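As a sketch of how the `columns` argument might be resolved: `None` expands to every available column, and an unknown column name is rejected before any search runs. `resolve_columns` is a hypothetical helper, and `KeyError` stands in for whatever error type the project actually raises.

```python
from __future__ import annotations


def resolve_columns(requested: list[str] | None, available: list[str]) -> list[str]:
    """Pick the columns to search, failing fast on an unknown name."""
    if requested is None:
        # None means "search everything".
        return list(available)
    missing = [col for col in requested if col not in available]
    if missing:
        # Report the first unknown column together with the valid choices.
        raise KeyError(f"Column {missing[0]!r} not found; available: {available}")
    return requested


available = ["name", "email", "status"]
all_cols = resolve_columns(None, available)
some_cols = resolve_columns(["email"], available)

try:
    resolve_columns(["phone"], available)
    error = None
except KeyError as exc:
    error = str(exc)

print(all_cols, some_cols, error)
```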
Implementation Reference
- The main handler function that implements the find_cells_with_value tool logic. It searches for the given value in specified or all columns of the dataframe, supporting exact matching or substring search, and returns cell locations.

```python
async def find_cells_with_value(
    ctx: Annotated[Context, Field(description="FastMCP context for session access")],
    value: Annotated[Any, Field(description="The value to search for (any data type)")],
    *,
    columns: Annotated[
        list[str] | None,
        Field(description="List of columns to search (None = all columns)"),
    ] = None,
    exact_match: Annotated[
        bool,
        Field(description="True for exact match, False for substring search"),
    ] = True,
) -> FindCellsResult:
    """Find all cells containing a specific value for data discovery.

    Searches through the dataset to locate all occurrences of a specific value,
    providing coordinates and context. Essential for data validation, quality
    checking, and understanding data patterns.

    Returns:
        Locations of all matching cells with coordinates and context

    Search Features:
        - Exact Match: Precise value matching with type consideration
        - Substring Search: Flexible text-based search for string columns
        - Coordinates: Row and column positions for each match
        - Summary Stats: Total matches, columns searched, search parameters

    Examples:
        # Find all cells with value "ERROR"
        results = await find_cells_with_value(ctx, "ERROR")

        # Substring search in specific columns
        results = await find_cells_with_value(ctx, "john",
                                              columns=["name", "email"],
                                              exact_match=False)

    AI Workflow Integration:
        1. Data quality assessment and error detection
        2. Pattern identification and data validation
        3. Reference data location and verification
        4. Data cleaning and preprocessing guidance
    """
    # Get session_id from FastMCP context
    session_id = ctx.session_id
    _session, df = get_session_data(session_id)
    matches = []

    # Determine columns to search
    if columns is not None:
        missing_cols = [col for col in columns if col not in df.columns]
        if missing_cols:
            raise ColumnNotFoundError(missing_cols[0], df.columns.tolist())
        columns_to_search = columns
    else:
        columns_to_search = df.columns.tolist()

    # Search for matches
    for col in columns_to_search:
        if exact_match:
            # Exact matching
            if pd.isna(value):
                # Search for NaN values
                mask = df[col].isna()
            else:
                mask = df[col] == value
        # Substring matching (for strings)
        elif isinstance(value, str):
            mask = df[col].astype(str).str.contains(str(value), na=False, case=False)
        else:
            # For non-strings, fall back to exact match
            mask = df[col] == value

        # Get matching row indices
        matching_rows = df.index[mask].tolist()

        for row_idx in matching_rows:
            cell_value = df.loc[row_idx, col]
            # Convert to CsvCellValue compatible type
            processed_value: CsvCellValue
            if pd.isna(cell_value):
                processed_value = None
            elif hasattr(cell_value, "item"):
                item_value = cell_value.item()
                if isinstance(item_value, str | int | float | bool):
                    processed_value = item_value
                else:
                    processed_value = str(item_value)
            elif isinstance(cell_value, str | int | float | bool):
                processed_value = cell_value
            else:
                # Fallback for complex types - convert to string
                processed_value = str(cell_value)

            matches.append(
                CellLocation(
                    row=int(row_idx),
                    column=col,
                    value=processed_value,
                ),
            )

    return FindCellsResult(
        search_value=value,
        matches_found=len(matches),
        coordinates=matches,
        search_column=columns[0] if columns and len(columns) == 1 else None,
        exact_match=exact_match,
    )
```
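The mask-building step can be tried in isolation. This sketch assumes only pandas and a small hypothetical DataFrame; it is not the handler itself. It also shows one property of the substring branch: `Series.str.contains` interprets its pattern as a regular expression by default, so a search value containing characters such as `.` may match more than the literal text unless `regex=False` is passed.

```python
import pandas as pd

# Hypothetical data: one text column with a missing cell, one numeric column with NaN.
df = pd.DataFrame(
    {
        "code": ["a.c", "abc", None],
        "score": [1.0, float("nan"), 3.0],
    }
)

# Exact match: plain equality against the search value.
exact_mask = df["code"] == "abc"

# NaN never compares equal to anything, so missing values need isna().
nan_mask = df["score"].isna()

# Substring match with the same arguments as the handler:
# case-insensitive, pattern treated as a regular expression.
regex_mask = df["code"].astype(str).str.contains("a.c", na=False, case=False)

# Literal substring match: the dot only matches a dot.
literal_mask = df["code"].astype(str).str.contains(
    "a.c", na=False, case=False, regex=False
)

exact_rows = df.index[exact_mask].tolist()
nan_rows = df.index[nan_mask].tolist()
regex_rows = df.index[regex_mask].tolist()
literal_rows = df.index[literal_mask].tolist()
print(exact_rows, nan_rows, regex_rows, literal_rows)
```

With the regex interpretation, the pattern `a.c` matches both `a.c` and `abc`; with `regex=False` it matches only the cell that literally contains `a.c`.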
- Pydantic model defining the output schema for the find_cells_with_value tool response, including search parameters and list of matching cell locations.

```python
class FindCellsResult(BaseToolResponse):
    """Response model for cell value search operations."""

    search_value: CsvCellValue
    matches_found: int
    coordinates: list[CellLocation]
    search_column: str | None = None
    exact_match: bool
```
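A caller will often want to regroup the flat `coordinates` list. The sketch below uses plain dataclasses as stand-ins for the Pydantic models. The field names follow the response model and the `CellLocation(row=..., column=..., value=...)` construction in the handler; the `rows_by_column` helper and the sample values are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CellLocation:
    row: int
    column: str
    value: str | int | float | bool | None


@dataclass
class FindCellsResult:
    search_value: str | int | float | bool | None
    matches_found: int
    coordinates: list[CellLocation]
    exact_match: bool
    # Moved last because dataclasses require defaulted fields at the end.
    search_column: str | None = None


def rows_by_column(result: FindCellsResult) -> dict[str, list[int]]:
    """Group matching row numbers under the column they were found in."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for cell in result.coordinates:
        grouped[cell.column].append(cell.row)
    return dict(grouped)


# Hypothetical result, shaped like the tool's response.
result = FindCellsResult(
    search_value="ERROR",
    matches_found=3,
    coordinates=[
        CellLocation(row=2, column="status", value="ERROR"),
        CellLocation(row=7, column="status", value="ERROR"),
        CellLocation(row=7, column="notes", value="ERROR"),
    ],
    exact_match=True,
)

grouped = rows_by_column(result)
print(grouped)
```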
- src/databeak/servers/discovery_server.py:854-854 (registration) FastMCP tool registration that binds the find_cells_with_value handler function to the tool name, making it available via the MCP protocol.

```python
discovery_server.tool(name="find_cells_with_value")(find_cells_with_value)
```
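The registration line uses the call form of a decorator: `server.tool(name=...)(func)` has the same effect as writing `@server.tool(name=...)` above the function definition. The stand-in registry below (a hypothetical `MiniServer`, not FastMCP) sketches that equivalence.

```python
from typing import Callable


class MiniServer:
    """Toy registry that mimics a decorator-factory style tool() method."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable] = {}

    def tool(self, name: str) -> Callable[[Callable], Callable]:
        def register(func: Callable) -> Callable:
            # Store the function under the given name and hand it back unchanged.
            self.tools[name] = func
            return func

        return register


server = MiniServer()


def find_cells(value: str) -> str:
    return f"searching for {value}"


# Call form: useful when the function is defined elsewhere.
server.tool(name="find_cells_call_form")(find_cells)


# Decorator form: registers at definition time.
@server.tool(name="find_cells_decorator_form")
def find_cells_again(value: str) -> str:
    return f"searching for {value}"


registered = sorted(server.tools)
print(registered)
```

The call form keeps the handler definition and its registration in separate places, which suits a module that registers many handlers in one block.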