Glama

remove_duplicates

Eliminate duplicate rows from CSV files to maintain data integrity and accuracy in datasets.

Instructions

Remove duplicate rows.

Input Schema

Name        Required  Description  Default
session_id  Yes
subset      No
keep        No                     first
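As a rough illustration of how a client might assemble arguments that match this schema, the sketch below builds the argument dictionary and rejects unexpected `keep` values before the call is made. The helper name `build_arguments`, the `ALLOWED_KEEP` set, and the session ID `"abc123"` are hypothetical; the allowed values are inferred from the handler code on this page, which maps the string "none" to pandas' `keep=False`.

```python
from typing import Any, Dict, List, Optional

# Assumption: inferred from the handler shown on this page, which passes
# "first"/"last" through to pandas and maps "none" to keep=False.
ALLOWED_KEEP = {"first", "last", "none"}


def build_arguments(
    session_id: str,
    subset: Optional[List[str]] = None,
    keep: str = "first",
) -> Dict[str, Any]:
    """Hypothetical client-side helper: assemble remove_duplicates arguments."""
    if not session_id:
        raise ValueError("session_id is required")
    if keep not in ALLOWED_KEEP:
        raise ValueError(f"keep must be one of {sorted(ALLOWED_KEEP)}, got {keep!r}")

    args: Dict[str, Any] = {"session_id": session_id, "keep": keep}
    # subset is optional; omitting it means all columns are compared.
    if subset is not None:
        args["subset"] = list(subset)
    return args


# "abc123" is a placeholder session ID, not a real one.
default_args = build_arguments("abc123")
subset_args = build_arguments("abc123", subset=["email"], keep="last")
```

Validating `keep` on the client side is optional; without it, an unsupported value would presumably reach pandas and come back as an error response from the handler.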

Implementation Reference

  • Core handler function that executes the duplicate removal using pandas.DataFrame.drop_duplicates, updates the session dataframe, records the operation, and returns statistics on rows removed.
    from typing import Any, Dict, List, Optional

    async def remove_duplicates(
        session_id: str,
        subset: Optional[List[str]] = None,
        keep: str = "first",
        ctx: Optional[Context] = None,
    ) -> Dict[str, Any]:
        """
        Remove duplicate rows.

        Args:
            session_id: Session identifier
            subset: Column names to consider for duplicates (None for all)
            keep: Which duplicates to keep ('first', 'last', or 'none' to drop all)
            ctx: FastMCP context

        Returns:
            Dict with success status and duplicate info
        """
        try:
            manager = get_session_manager()
            session = manager.get_session(session_id)

            if not session or session.df is None:
                return {"success": False, "error": "Invalid session or no data loaded"}

            df = session.df
            rows_before = len(df)

            # Reject column names that are not present in the dataframe
            if subset:
                missing_cols = [col for col in subset if col not in df.columns]
                if missing_cols:
                    return {"success": False, "error": f"Columns not found: {missing_cols}"}

            # Convert keep parameter: the string "none" maps to pandas' keep=False
            keep_param = keep if keep != "none" else False

            session.df = df.drop_duplicates(subset=subset, keep=keep_param).reset_index(drop=True)
            rows_after = len(session.df)

            session.record_operation(OperationType.REMOVE_DUPLICATES, {
                "subset": subset,
                "keep": keep,
                "rows_removed": rows_before - rows_after
            })

            return {
                "success": True,
                "rows_before": rows_before,
                "rows_after": rows_after,
                "duplicates_removed": rows_before - rows_after,
                "subset": subset,
                "keep": keep
            }
        except Exception as e:
            logger.error(f"Error removing duplicates: {str(e)}")
            return {"success": False, "error": str(e)}
  • MCP tool registration decorator and wrapper function that delegates to the core implementation in transformations.py, defining the tool schema via type annotations.
    @mcp.tool
    async def remove_duplicates(
        session_id: str,
        subset: Optional[List[str]] = None,
        keep: str = "first",
        ctx: Context = None
    ) -> Dict[str, Any]:
        """Remove duplicate rows."""
        return await _remove_duplicates(session_id, subset, keep, ctx)
  • OperationType enum value used to record the remove_duplicates operation in session history.
    REMOVE_DUPLICATES = "remove_duplicates"
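To make the effect of `subset` and `keep` concrete, here is a minimal standalone sketch of the pandas logic the core handler relies on, with the session manager, operation history, and logging left out. The function name `dedupe_with_stats` and the sample data are hypothetical and not part of the server; only `pandas.DataFrame.drop_duplicates` and `reset_index` are real library calls.

```python
from typing import Any, Dict, List, Optional

import pandas as pd


def dedupe_with_stats(
    df: pd.DataFrame,
    subset: Optional[List[str]] = None,
    keep: str = "first",
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the handler's pandas logic, minus sessions."""
    if subset:
        missing = [col for col in subset if col not in df.columns]
        if missing:
            return {"success": False, "error": f"Columns not found: {missing}"}

    # The string "none" is mapped to pandas' keep=False (drop every duplicated row).
    keep_param = keep if keep != "none" else False
    result = df.drop_duplicates(subset=subset, keep=keep_param).reset_index(drop=True)

    return {
        "success": True,
        "rows_before": len(df),
        "rows_after": len(result),
        "duplicates_removed": len(df) - len(result),
        "df": result,
    }


# Made-up sample data: two duplicate pairs and one unique row.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 3, 3],
        "city": ["Oslo", "Oslo", "Rome", "Lima", "Lima"],
    }
)

first = dedupe_with_stats(df)                # keeps one row per duplicate group
none = dedupe_with_stats(df, keep="none")    # drops all rows that have a duplicate
bad = dedupe_with_stats(df, subset=["zip"])  # unknown column is reported, not raised
```

With this sample, `keep="first"` leaves three rows (two removed), while `keep="none"` leaves only the row with `id` 2, since every other row has a duplicate.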

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/santoshray02/csv-editor'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.