# remove_duplicates
Eliminate duplicate rows from CSV files to maintain data integrity and accuracy in datasets.
## Instructions
Remove duplicate rows.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session identifier for the loaded CSV data | |
| subset | No | Column names to consider when identifying duplicates (all columns if omitted) | |
| keep | No | Which duplicates to keep: `first`, `last`, or `none` to drop all occurrences | `first` |
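For reference, a call might pass arguments like the ones below. The field names in the response mirror the dictionary returned by the handler shown under Implementation Reference; the identifiers and numbers here are purely illustrative placeholders:

```python
# Illustrative arguments for the remove_duplicates tool
# ("sess-123" and the column names are placeholders, not real values).
arguments = {
    "session_id": "sess-123",        # required: identifies the session with data loaded
    "subset": ["email", "user_id"],  # optional: only these columns define a duplicate
    "keep": "first",                 # optional: "first" (default), "last", or "none"
}

# Shape of a successful response (values made up for illustration):
result = {
    "success": True,
    "rows_before": 1000,
    "rows_after": 950,
    "duplicates_removed": 50,
    "subset": ["email", "user_id"],
    "keep": "first",
}
```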
## Implementation Reference
- Core handler function (in `transformations.py`) that performs the duplicate removal with `pandas.DataFrame.drop_duplicates`, updates the session dataframe, records the operation in session history, and returns statistics on the rows removed:

  ```python
  async def remove_duplicates(
      session_id: str,
      subset: Optional[List[str]] = None,
      keep: str = "first",
      ctx: Context = None
  ) -> Dict[str, Any]:
      """
      Remove duplicate rows.

      Args:
          session_id: Session identifier
          subset: Column names to consider for duplicates (None for all)
          keep: Which duplicates to keep ('first', 'last', or 'none' to drop all)
          ctx: FastMCP context

      Returns:
          Dict with success status and duplicate info
      """
      try:
          manager = get_session_manager()
          session = manager.get_session(session_id)

          if not session or session.df is None:
              return {"success": False, "error": "Invalid session or no data loaded"}

          df = session.df
          rows_before = len(df)

          if subset:
              missing_cols = [col for col in subset if col not in df.columns]
              if missing_cols:
                  return {"success": False, "error": f"Columns not found: {missing_cols}"}

          # Convert keep parameter: the string "none" maps to pandas' keep=False (drop all occurrences)
          keep_param = keep if keep != "none" else False

          session.df = df.drop_duplicates(subset=subset, keep=keep_param).reset_index(drop=True)
          rows_after = len(session.df)

          session.record_operation(OperationType.REMOVE_DUPLICATES, {
              "subset": subset,
              "keep": keep,
              "rows_removed": rows_before - rows_after
          })

          return {
              "success": True,
              "rows_before": rows_before,
              "rows_after": rows_after,
              "duplicates_removed": rows_before - rows_after,
              "subset": subset,
              "keep": keep
          }

      except Exception as e:
          logger.error(f"Error removing duplicates: {str(e)}")
          return {"success": False, "error": str(e)}
  ```
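  The `keep` handling follows pandas semantics: `"first"` and `"last"` keep one row per duplicate group, while the tool's `"none"` maps to `keep=False`, which drops every occurrence. A minimal standalone sketch of that mapping (the sample data is made up):

  ```python
  import pandas as pd

  df = pd.DataFrame({
      "email": ["a@x.com", "a@x.com", "b@x.com"],
      "score": [1, 2, 3],
  })

  # keep="first": one row per duplicate group survives -> 2 rows remain
  print(len(df.drop_duplicates(subset=["email"], keep="first")))  # 2

  # keep=False (the tool's "none"): every duplicated row is dropped -> 1 row remains
  print(len(df.drop_duplicates(subset=["email"], keep=False)))    # 1
  ```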
- `src/csv_editor/server.py:280-288` (registration): MCP tool registration decorator and wrapper function that delegates to the core implementation in `transformations.py`, defining the tool schema via its type annotations:

  ```python
  @mcp.tool
  async def remove_duplicates(
      session_id: str,
      subset: Optional[List[str]] = None,
      keep: str = "first",
      ctx: Context = None
  ) -> Dict[str, Any]:
      """Remove duplicate rows."""
      return await _remove_duplicates(session_id, subset, keep, ctx)
  ```
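  For in-process testing, FastMCP's client can call the registered tool directly. This is a sketch under the assumption that the FastMCP 2.x `Client` API is available and that `mcp` is the server instance defined in `server.py`; the import path and session id are placeholders inferred from the file path above:

  ```python
  import asyncio
  from fastmcp import Client
  from csv_editor.server import mcp  # assumed import path, inferred from src/csv_editor/server.py

  async def main():
      # In-memory transport against the server object, no subprocess or network needed.
      async with Client(mcp) as client:
          result = await client.call_tool("remove_duplicates", {
              "session_id": "sess-123",  # placeholder; must refer to a session with data loaded
              "keep": "last",
          })
          print(result)

  asyncio.run(main())
  ```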
- `OperationType` enum value used to record the remove_duplicates operation in session history:

  ```python
  REMOVE_DUPLICATES = "remove_duplicates"
  ```
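  The surrounding enum is not reproduced here; a plausible shape, purely as a hypothetical illustration (the project's actual class may differ), is a string-valued `Enum` so the member serializes cleanly into the recorded operation history:

  ```python
  from enum import Enum

  class OperationType(str, Enum):  # hypothetical sketch; actual definition may differ
      REMOVE_DUPLICATES = "remove_duplicates"
      # ... other operation types defined by the project
  ```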