
remove_duplicates

Remove duplicate rows from dataframes with flexible column selection and keep strategies. Choose which duplicates to retain while getting detailed deduplication statistics.

Instructions

Remove duplicate rows from the dataframe with comprehensive validation.

Provides flexible duplicate removal with options for column subset selection and different keep strategies. Handles edge cases and provides detailed statistics about the deduplication process.

Examples:

    # Remove exact duplicate rows
    remove_duplicates(ctx)

    # Remove duplicates based on specific columns
    remove_duplicates(ctx, subset=["email", "name"])

    # Keep last occurrence instead of first
    remove_duplicates(ctx, subset=["id"], keep="last")

    # Remove all duplicates (keep none)
    remove_duplicates(ctx, subset=["email"], keep="none")
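
To make the three keep strategies concrete, here is a minimal pure-Python sketch of the behavior described above. It is an illustration only, not DataBeak's implementation: the function name remove_duplicates_sketch, the list-of-dicts row format, and the statistics keys are all assumptions made for this example.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]


def remove_duplicates_sketch(
    rows: List[Row],
    subset: Optional[Sequence[str]] = None,
    keep: str = "first",
) -> Tuple[List[Row], Dict[str, int]]:
    """Hypothetical illustration of first/last/none deduplication."""
    if keep not in ("first", "last", "none"):
        raise ValueError(f"keep must be 'first', 'last', or 'none', got {keep!r}")

    def key_of(row: Row) -> tuple:
        # subset=None means every column takes part in the comparison.
        cols = subset if subset is not None else sorted(row)
        return tuple((c, row[c]) for c in cols)

    # Group row positions by their comparison key, preserving input order.
    positions: Dict[tuple, List[int]] = {}
    for i, row in enumerate(rows):
        positions.setdefault(key_of(row), []).append(i)

    kept = set()
    for idxs in positions.values():
        if keep == "first":
            kept.add(idxs[0])
        elif keep == "last":
            kept.add(idxs[-1])
        elif len(idxs) == 1:
            # keep="none": a row survives only if it was never duplicated.
            kept.add(idxs[0])

    result = [row for i, row in enumerate(rows) if i in kept]
    stats = {
        "rows_before": len(rows),
        "rows_after": len(result),
        "rows_removed": len(rows) - len(result),
    }
    return result, stats


if __name__ == "__main__":
    data = [
        {"id": 1, "email": "a@x.com"},
        {"id": 2, "email": "a@x.com"},
        {"id": 3, "email": "b@x.com"},
    ]
    print(remove_duplicates_sketch(data, subset=["email"], keep="none"))
```

Note that keep="none" discards every member of a duplicated group, so it can remove more rows than "first" or "last" for the same input.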

Input Schema

Name    Required  Description                                              Default
subset  No        Columns to consider for duplicates (None = all columns)
keep    No        Which duplicates to keep: first, last, or none           first

Input Schema (JSON Schema)

{
  "properties": {
    "keep": {
      "default": "first",
      "description": "Which duplicates to keep: first, last, or none",
      "enum": [
        "first",
        "last",
        "none"
      ],
      "type": "string"
    },
    "subset": {
      "anyOf": [
        {
          "items": {
            "type": "string"
          },
          "type": "array"
        },
        {
          "type": "null"
        }
      ],
      "default": null,
      "description": "Columns to consider for duplicates (None = all columns)"
    }
  },
  "type": "object"
}
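
As a rough illustration of how a client might supply arguments that satisfy this schema, the sketch below builds a request body in the standard MCP tools/call JSON-RPC shape. The helper build_tool_call is hypothetical, and the argument checks are hand-written to mirror the schema rather than taken from DataBeak itself.

```python
import json
from typing import Any, Dict, Optional, Sequence

ALLOWED_KEEP = ("first", "last", "none")  # mirrors the "enum" in the schema


def build_tool_call(
    subset: Optional[Sequence[str]] = None,
    keep: str = "first",
    request_id: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: build a tools/call request for remove_duplicates."""
    if keep not in ALLOWED_KEEP:
        raise ValueError(f"keep must be one of {ALLOWED_KEEP}, got {keep!r}")
    if subset is not None and not all(isinstance(c, str) for c in subset):
        raise TypeError("subset must be a list of column-name strings or None")

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "remove_duplicates",
            "arguments": {
                # None serializes to JSON null, the schema's default for subset.
                "subset": None if subset is None else list(subset),
                "keep": keep,
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_tool_call(subset=["email"], keep="none"), indent=2))
```

Both arguments are optional, so an empty arguments object would also be valid against the schema and would fall back to the defaults (all columns, keep "first").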

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonpspri/databeak'
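
The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state explicitly.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "https://glama.ai/api/mcp/v1/servers"


def build_request(server_id: str) -> urllib.request.Request:
    """Build the GET request for one server entry, e.g. 'jonpspri/databeak'."""
    return urllib.request.Request(
        f"{BASE_URL}/{server_id}",
        method="GET",
        headers={"Accept": "application/json"},
    )


def fetch_server_info(server_id: str, timeout: float = 10.0) -> Any:
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(server_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    info = fetch_server_info("jonpspri/databeak")
    print(json.dumps(info, indent=2))
```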

If you have feedback or need assistance with the MCP directory API, please join our Discord server.