clean_dataset
Apply cleaning operations such as removing duplicates, filling nulls, or renaming columns to a dataset and write the cleaned data to a new file.
Instructions
Apply cleaning operations to a dataset and write a new file.
NEVER modifies the original file. Always writes to output_path.
⚠️ Writing a dataset requires authorization. This tool refuses to write
unless authorized=True (confirm with the user first) or the env var
METIS_ALLOW_DATA_WRITE=1 is set. This mirrors the Claude Code write-gate so
a rebuild can't bypass it via MCP.
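The write-gate described above can be sketched as a small predicate. This is an illustrative sketch only, not the tool's actual code: the helper name `write_allowed` and its `env` parameter are hypothetical, and the default of `False` for `authorized` is an assumption. Only the `authorized=True` flag and the `METIS_ALLOW_DATA_WRITE=1` variable come from the description.

```python
import os
from typing import Mapping, Optional


def write_allowed(authorized: bool = False,
                  env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: decide whether a dataset write may proceed.

    Assumes the gate opens in exactly two cases, as the description states:
    the caller passed authorized=True, or METIS_ALLOW_DATA_WRITE is set to "1".
    """
    # Fall back to the real process environment when no mapping is injected.
    env = os.environ if env is None else env
    return authorized is True or env.get("METIS_ALLOW_DATA_WRITE") == "1"
```

Passing the environment as a parameter keeps the check easy to test without mutating `os.environ`.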
Supported operations:
- "drop_duplicates" — remove exact duplicate rows
- "drop_columns:[col1:col2:...]" — remove specified columns
- "fill_na:[col:value]" — fill nulls in col with value
- "rename_column:[old_name:new_name]" — rename a column
- "strip_whitespace" — strip leading/trailing spaces from all string columns
- "standardize_dates:[col:format]" — parse col as date (format: 'auto' or strftime)
- "drop_na_rows:[col]" — drop rows where col is null
- "drop_na_rows_any" — drop rows with ANY null value
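To make the operation-string syntax concrete, here is a minimal parsing sketch. It is not the tool's real parser: `parse_operation` is a hypothetical name, and it assumes the square brackets may be either literal or omitted, and that only `drop_columns` takes more than two colon-separated arguments. Under those assumptions a strftime pattern containing colons stays intact as the second argument.

```python
from typing import List, Tuple


def parse_operation(op: str) -> Tuple[str, List[str]]:
    """Hypothetical parser: split 'name:[arg1:arg2]' into (name, [args])."""
    name, sep, rest = op.partition(":")
    name = name.strip()
    if not sep:
        # Operations such as "drop_duplicates" carry no arguments.
        return name, []
    rest = rest.strip()
    # Assumption: brackets are optional wrapping around the argument list.
    if rest.startswith("[") and rest.endswith("]"):
        rest = rest[1:-1]
    if name == "drop_columns":
        # Any number of column names, all separated by colons.
        args = rest.split(":")
    else:
        # At most two arguments; later colons stay in the second argument.
        args = rest.split(":", 1)
    return name, [a.strip() for a in args]
```

For example, `parse_operation("fill_na:[age:0]")` yields the name `fill_na` with the arguments `age` and `0`, both as strings; converting the fill value to a number would be a separate step.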
Args:
path: Absolute local path to the source dataset.
operations: List of operation strings (see above).
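output_path: Where to write the cleaned file. If empty, appends '_cleaned'
before the extension (e.g. data.csv → data_cleaned.csv).
authorized: Set to True only after confirming with the user. Without it, the
tool refuses to write unless METIS_ALLOW_DATA_WRITE=1 is set.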
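The default `output_path` rule can be illustrated with `pathlib`. This is a sketch of the naming rule only, with a hypothetical helper name `default_output_path`; how the real tool treats multi-part extensions such as `.tar.gz` is not stated, so the behavior here (only the last suffix counts as the extension) is an assumption.

```python
from pathlib import PurePosixPath


def default_output_path(path: str, output_path: str = "") -> str:
    """Hypothetical helper: resolve where the cleaned file would be written."""
    if output_path:
        # An explicit destination always wins.
        return output_path
    p = PurePosixPath(path)
    # Insert '_cleaned' between the stem and the final extension.
    return str(p.with_name(f"{p.stem}_cleaned{p.suffix}"))
```

With this rule the source path and the output path always differ, which is consistent with the guarantee that the original file is never modified.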
Returns JSON with: output_path, original_shape, cleaned_shape, row_delta,
col_delta, operations_applied, operations_skipped.
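The shape of the returned JSON can be sketched as follows. The field names are the ones listed above; everything else is an assumption made for illustration: the helper `build_result` is hypothetical, shapes are assumed to be (rows, columns) pairs, and the deltas are assumed to be computed as cleaned minus original, so removed rows give a negative `row_delta`. The real tool may use the opposite sign.

```python
import json
from typing import List, Tuple


def build_result(output_path: str,
                 original_shape: Tuple[int, int],
                 cleaned_shape: Tuple[int, int],
                 applied: List[str],
                 skipped: List[str]) -> str:
    """Hypothetical helper: assemble the documented result fields as JSON."""
    return json.dumps({
        "output_path": output_path,
        "original_shape": list(original_shape),
        "cleaned_shape": list(cleaned_shape),
        # Assumed sign convention: cleaned minus original.
        "row_delta": cleaned_shape[0] - original_shape[0],
        "col_delta": cleaned_shape[1] - original_shape[1],
        "operations_applied": applied,
        "operations_skipped": skipped,
    })
```

A caller would typically check `operations_skipped` first, since a non-empty list means some requested cleaning steps did not run.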
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
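| path | Yes | Absolute local path to the source dataset. | |
| operations | Yes | List of operation strings (see Supported operations). | |
| output_path | No | Where to write the cleaned file. If empty, '_cleaned' is appended before the extension. | |
| authorized | No | Must be True (after confirming with the user) for the tool to write, unless METIS_ALLOW_DATA_WRITE=1 is set. | |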
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
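| result | Yes | JSON with output_path, original_shape, cleaned_shape, row_delta, col_delta, operations_applied, operations_skipped. | |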