
clean_dataset

Remove duplicates, fill nulls, and rename columns to clean a dataset, saving results to a new file without altering the original.

Instructions

Apply cleaning operations to a dataset and write a new file.

NEVER modifies the original file. Always writes to output_path.

Supported operations:
  - "drop_duplicates"                    — remove exact duplicate rows
  - "drop_columns:[col1:col2:...]"       — remove specified columns
  - "fill_na:[col:value]"                — fill nulls in col with value
  - "rename_column:[old_name:new_name]"  — rename a column
  - "strip_whitespace"                   — strip leading/trailing spaces from all string columns
  - "standardize_dates:[col:format]"     — parse col as date (format: 'auto' or strftime)
  - "drop_na_rows:[col]"                 — drop rows where col is null
  - "drop_na_rows_any"                   — drop rows with ANY null value
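The operation strings above share a "name:[args]" shape, so a client can validate them before calling the tool. The following is a minimal sketch, not the server's own parser: `parse_operation`, `NO_ARG_OPS`, and `ARG_COUNTS` are hypothetical names, and it assumes the square brackets are literal but optional and that only the first colon separates fields for two-field operations (so a strftime format such as %H:%M survives intact).

```python
from typing import List, Tuple

# Operations that take no argument, per the list above.
NO_ARG_OPS = {"drop_duplicates", "strip_whitespace", "drop_na_rows_any"}

# Number of colon-separated fields each parameterised operation expects.
# None means "one or more" (drop_columns accepts any number of columns).
ARG_COUNTS = {
    "drop_columns": None,
    "fill_na": 2,
    "rename_column": 2,
    "standardize_dates": 2,
    "drop_na_rows": 1,
}


def parse_operation(op: str) -> Tuple[str, List[str]]:
    """Split an operation string into (name, args); raise ValueError if malformed."""
    name, sep, raw = op.strip().partition(":")
    if name in NO_ARG_OPS:
        if sep:
            raise ValueError(f"{name!r} takes no arguments: {op!r}")
        return name, []
    if name not in ARG_COUNTS:
        raise ValueError(f"unsupported operation: {name!r}")

    # Assumption: brackets are literal in the string but may be omitted.
    if raw.startswith("[") and raw.endswith("]"):
        raw = raw[1:-1]

    count = ARG_COUNTS[name]
    if count is None:
        args = raw.split(":")
    else:
        # Split at most count - 1 times so the last field may contain colons.
        args = raw.split(":", count - 1)

    if any(a == "" for a in args) or (count is not None and len(args) != count):
        raise ValueError(f"malformed arguments for {name!r}: {op!r}")
    return name, args
```

Validating up front lets a caller catch typos locally; how the server itself treats malformed strings (for example, whether they end up in operations_skipped) is not stated in the description.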

Args:
    path:         Absolute local path to the source dataset.
    operations:   List of operation strings (see above).
    output_path:  Where to write the cleaned file. If empty, appends '_cleaned'
                  before the extension (e.g. data.csv → data_cleaned.csv).
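The default naming rule for output_path can be reproduced on the client side to predict where the cleaned file will land. This is a sketch under assumptions: `resolve_output_path` is a hypothetical helper, POSIX-style paths are assumed, and only the final extension is treated as "the extension" (the description does not say how multi-part extensions such as .tar.gz are handled).

```python
from pathlib import PurePosixPath


def resolve_output_path(path: str, output_path: str = "") -> str:
    """Return output_path if given, else insert '_cleaned' before the extension."""
    if output_path:
        return output_path
    src = PurePosixPath(path)
    # stem is the file name without its last suffix; suffix includes the dot.
    return str(src.with_name(f"{src.stem}_cleaned{src.suffix}"))


print(resolve_output_path("/home/user/data.csv"))
# /home/user/data_cleaned.csv
```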

Returns JSON with: output_path, original_shape, cleaned_shape, row_delta,
col_delta, operations_applied, operations_skipped.
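To make the return fields concrete, here is one way a result with those keys could be assembled. It is illustrative only: `build_result` is a hypothetical function, the shapes are assumed to be [rows, columns] pairs, and the deltas are assumed to be cleaned minus original (so dropped rows give a negative row_delta); the description does not confirm either convention.

```python
import json
from typing import List, Sequence


def build_result(
    output_path: str,
    original_shape: Sequence[int],
    cleaned_shape: Sequence[int],
    operations_applied: List[str],
    operations_skipped: List[str],
) -> str:
    """Serialise a summary with the field names listed in the description."""
    return json.dumps(
        {
            "output_path": output_path,
            "original_shape": list(original_shape),
            "cleaned_shape": list(cleaned_shape),
            # Assumption: delta = cleaned - original.
            "row_delta": cleaned_shape[0] - original_shape[0],
            "col_delta": cleaned_shape[1] - original_shape[1],
            "operations_applied": operations_applied,
            "operations_skipped": operations_skipped,
        },
        indent=2,
    )
```

A caller would typically inspect operations_skipped first, since a non-empty list means the cleaned file does not reflect every requested step.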

Input Schema

Name         Required  Description  Default
path         Yes       (none)       (none)
operations   Yes       (none)       (none)
output_path  No        (none)       (none)
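For illustration, the three parameters can be packed into a standard MCP tools/call request. This sketch assumes the server is reached over ordinary MCP JSON-RPC 2.0; `build_tool_call` is a hypothetical helper and the file path in the usage line is made up.

```python
import json
from typing import List


def build_tool_call(
    path: str,
    operations: List[str],
    output_path: str = "",
    request_id: int = 1,
) -> str:
    """Build a JSON-RPC 2.0 'tools/call' request for clean_dataset."""
    arguments = {"path": path, "operations": operations}
    if output_path:
        # Optional: omit it to let the tool derive the '_cleaned' name.
        arguments["output_path"] = output_path
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "clean_dataset", "arguments": arguments},
    }
    return json.dumps(request, indent=2)


# Hypothetical usage: de-duplicate, then fill missing ages with 0.
print(build_tool_call("/home/user/data.csv", ["drop_duplicates", "fill_na:[age:0]"]))
```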

Output Schema

Name    Required  Description  Default
result  Yes       (none)       (none)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses that the tool never modifies the original file and always writes a new file, and it lists all supported operations. It does not mention other potential side effects or required permissions, but for a non-destructive transformation tool the behavioral traits are well covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured. It uses a brief initial statement, a critical note, a bulleted list of operations with syntax, and clear parameter descriptions. Every sentence provides value, and no unnecessary content is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Because an output schema exists, the description only needs to name the fields of the returned JSON, which it does. It covers all the necessary information: purpose, behavior, operations, parameters, and return value. This makes the tool fully understandable without external documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description must explain parameters. It does so thoroughly: path (absolute local path), operations (list of operation strings with syntax), and output_path (default behavior if empty). This adds significant meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Apply cleaning operations to a dataset and write a new file.' It uses a specific verb ('apply') and resource ('dataset'), and the tool is distinct from all listed siblings, which focus on other tasks like profiling or searching.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states that the original file is never modified and that output is always written to output_path, and it lists the supported operations. It gives no explicit guidance on when to use this tool rather than an alternative (e.g., profile_dataset), but because the functionality is unique, the intended use is still clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/SVerITG/Metis'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.