remove_duplicates
Eliminate duplicate rows from CSV files. Specify which columns to compare and whether to keep the first occurrence, the last occurrence, or none of the duplicated rows.
Instructions
Remove duplicate rows.
Input Schema
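| Name | Required | Description | Default |
|---|---|---|---|
| keep | No | Which duplicates to keep: `first`, `last`, or `none` to drop all | first |
| session_id | Yes | Session identifier | |
| subset | No | Column names to consider for duplicates (all columns when omitted) | |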
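The effect of each `keep` setting is easiest to see on a small table. The sketch below calls pandas `drop_duplicates` directly rather than the tool itself; the DataFrame, its column names, and its values are made up for illustration.

```python
import pandas as pd

# Illustrative data only; the column names are hypothetical.
df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com", "c@x.com"],
    "visits": [1, 2, 3, 4],
})

# keep="first": the earliest row of each duplicate group survives.
first = df.drop_duplicates(subset=["email"], keep="first").reset_index(drop=True)

# keep="last": the latest row of each duplicate group survives.
last = df.drop_duplicates(subset=["email"], keep="last").reset_index(drop=True)

# keep=False: every row that has a duplicate is dropped.
none = df.drop_duplicates(subset=["email"], keep=False).reset_index(drop=True)

print(first["visits"].tolist())  # [1, 2, 4]
print(last["visits"].tolist())   # [2, 3, 4]
print(none["visits"].tolist())   # [2, 4]
```

Restricting `subset` to `email` means rows count as duplicates when their `email` values match, even though `visits` differs.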
Implementation Reference
- Core handler function that executes the remove_duplicates logic using pandas `drop_duplicates`, records the operation, and returns the results.

  ```python
  async def remove_duplicates(
      session_id: str,
      subset: Optional[List[str]] = None,
      keep: str = "first",
      ctx: Context = None
  ) -> Dict[str, Any]:
      """
      Remove duplicate rows.

      Args:
          session_id: Session identifier
          subset: Column names to consider for duplicates (None for all)
          keep: Which duplicates to keep ('first', 'last', 'none' to drop all)
          ctx: FastMCP context

      Returns:
          Dict with success status and duplicate info
      """
      try:
          manager = get_session_manager()
          session = manager.get_session(session_id)

          if not session or session.df is None:
              return {"success": False, "error": "Invalid session or no data loaded"}

          df = session.df
          rows_before = len(df)

          # Reject the request if any requested column is absent.
          if subset:
              missing_cols = [col for col in subset if col not in df.columns]
              if missing_cols:
                  return {"success": False, "error": f"Columns not found: {missing_cols}"}

          # Convert keep parameter: pandas expects False, not "none", to drop all duplicates
          keep_param = keep if keep != "none" else False

          session.df = df.drop_duplicates(subset=subset, keep=keep_param).reset_index(drop=True)
          rows_after = len(session.df)

          session.record_operation(OperationType.REMOVE_DUPLICATES, {
              "subset": subset,
              "keep": keep,
              "rows_removed": rows_before - rows_after
          })

          return {
              "success": True,
              "rows_before": rows_before,
              "rows_after": rows_after,
              "duplicates_removed": rows_before - rows_after,
              "subset": subset,
              "keep": keep
          }

      except Exception as e:
          logger.error(f"Error removing duplicates: {str(e)}")
          return {"success": False, "error": str(e)}
  ```
- src/csv_editor/server.py:280-289 (registration): MCP tool registration with the `@mcp.tool` decorator. Defines the tool interface and parameters (schema), and delegates to the core implementation.

  ```python
  @mcp.tool
  async def remove_duplicates(
      session_id: str,
      subset: Optional[List[str]] = None,
      keep: str = "first",
      ctx: Context = None
  ) -> Dict[str, Any]:
      """Remove duplicate rows."""
      return await _remove_duplicates(session_id, subset, keep, ctx)
  ```
- Enum value `OperationType.REMOVE_DUPLICATES`, used for operation logging in the handler.

  ```python
  REMOVE_DUPLICATES = "remove_duplicates"
  ```
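To experiment with the handler's behavior outside a session, its core steps (column validation, the `"none"` to `False` conversion, and the before/after row counts) can be reproduced against a plain DataFrame. This is a minimal sketch, not the project's code: the function name `dedupe_with_summary` and the sample data are hypothetical, and session handling, operation recording, and logging are left out.

```python
from typing import Any, Dict, List, Optional, Tuple

import pandas as pd


def dedupe_with_summary(
    df: pd.DataFrame,
    subset: Optional[List[str]] = None,
    keep: str = "first",
) -> Tuple[pd.DataFrame, Dict[str, Any]]:
    """Hypothetical standalone version of the handler's core logic."""
    rows_before = len(df)

    # Reject unknown columns before touching the data.
    if subset:
        missing_cols = [col for col in subset if col not in df.columns]
        if missing_cols:
            return df, {"success": False, "error": f"Columns not found: {missing_cols}"}

    # The tool takes the string "none"; pandas expects the boolean False.
    keep_param = keep if keep != "none" else False

    result = df.drop_duplicates(subset=subset, keep=keep_param).reset_index(drop=True)
    rows_after = len(result)

    return result, {
        "success": True,
        "rows_before": rows_before,
        "rows_after": rows_after,
        "duplicates_removed": rows_before - rows_after,
    }


# Illustrative data only.
sample = pd.DataFrame({"id": [1, 1, 2], "value": ["a", "b", "c"]})

deduped, info = dedupe_with_summary(sample, subset=["id"], keep="none")
print(info["duplicates_removed"])  # 2

_, bad = dedupe_with_summary(sample, subset=["missing"])
print(bad["success"])  # False
```

With `keep="none"`, both rows sharing `id == 1` are removed, so `duplicates_removed` counts every row in a duplicate group rather than only the extra copies.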