CSV Editor

by santoshray02

remove_duplicates

Eliminate duplicate rows from CSV files to maintain data integrity and accuracy in datasets.

Instructions

Remove duplicate rows.

Input Schema

Name        Required  Default  Description
session_id  Yes       —        Session identifier
subset      No        —        Column names to consider for duplicates (all columns if omitted)
keep        No        first    Which duplicates to keep: "first", "last", or "none" to drop all
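To make the schema concrete, here is a sketch of typical argument payloads for this tool; the `session_id` value is purely illustrative, not a real session.

```python
import json

# Hypothetical argument payloads for the remove_duplicates tool.
dedupe_all_columns = {"session_id": "abc123"}  # compare full rows
dedupe_by_email = {
    "session_id": "abc123",
    "subset": ["email"],  # only the "email" column decides duplicates
    "keep": "last",       # keep the last occurrence of each duplicate
}
print(json.dumps(dedupe_by_email))
```

Omitting `subset` deduplicates on every column; omitting `keep` defaults to `"first"`.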

Implementation Reference

  • Core handler function that executes the duplicate removal using pandas.DataFrame.drop_duplicates, updates the session dataframe, records the operation, and returns statistics on rows removed.
    from typing import Any, Dict, List, Optional  # needed by the signature below
    # get_session_manager, OperationType, logger, and FastMCP's Context are
    # defined elsewhere in the project.

    async def remove_duplicates(
        session_id: str,
        subset: Optional[List[str]] = None,
        keep: str = "first",
        ctx: Context = None
    ) -> Dict[str, Any]:
        """
        Remove duplicate rows.
        
        Args:
            session_id: Session identifier
            subset: Column names to consider for duplicates (None for all)
            keep: Which duplicates to keep ('first', 'last', or 'none' to drop all)
            ctx: FastMCP context
            
        Returns:
            Dict with success status and duplicate info
        """
        try:
            manager = get_session_manager()
            session = manager.get_session(session_id)
            
            if not session or session.df is None:
                return {"success": False, "error": "Invalid session or no data loaded"}
            
            df = session.df
            rows_before = len(df)
            
            if subset:
                missing_cols = [col for col in subset if col not in df.columns]
                if missing_cols:
                    return {"success": False, "error": f"Columns not found: {missing_cols}"}
            
            # Convert keep parameter
            keep_param = keep if keep != "none" else False
            
            session.df = df.drop_duplicates(subset=subset, keep=keep_param).reset_index(drop=True)
            rows_after = len(session.df)
            
            session.record_operation(OperationType.REMOVE_DUPLICATES, {
                "subset": subset,
                "keep": keep,
                "rows_removed": rows_before - rows_after
            })
            
            return {
                "success": True,
                "rows_before": rows_before,
                "rows_after": rows_after,
                "duplicates_removed": rows_before - rows_after,
                "subset": subset,
                "keep": keep
            }
            
        except Exception as e:
            logger.error(f"Error removing duplicates: {str(e)}")
            return {"success": False, "error": str(e)}
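The handler above is a thin wrapper around `pandas.DataFrame.drop_duplicates`; this minimal sketch (with made-up sample data) shows the semantics it relies on, including the `"none"` → `False` mapping:

```python
import pandas as pd

# Sample data: two rows share id=1.
df = pd.DataFrame({"id": [1, 1, 2], "val": ["a", "b", "c"]})

# keep="first": the first row of each duplicate group survives.
first = df.drop_duplicates(subset=["id"], keep="first").reset_index(drop=True)

# keep=False (the handler maps the string "none" to False): every member
# of a duplicate group is dropped, leaving only unique rows.
none_kept = df.drop_duplicates(subset=["id"], keep=False).reset_index(drop=True)

print(len(first), len(none_kept))  # 2 1
```

`reset_index(drop=True)` matches the handler's behavior of renumbering rows after the drop, so downstream row indices stay contiguous.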
  • MCP tool registration decorator and wrapper function that delegates to the core implementation in transformations.py, defining the tool schema via type annotations.
    @mcp.tool
    async def remove_duplicates(
        session_id: str,
        subset: Optional[List[str]] = None,
        keep: str = "first",
        ctx: Context = None
    ) -> Dict[str, Any]:
        """Remove duplicate rows."""
        return await _remove_duplicates(session_id, subset, keep, ctx)
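The decorator-plus-delegation pattern above can be sketched without FastMCP using a plain dict as the tool registry; the names here (`TOOLS`, `tool`) are illustrative, not part of the actual server.

```python
import asyncio

TOOLS = {}  # stand-in for FastMCP's tool registry

def tool(fn):
    """Register fn under its __name__, mimicking the @mcp.tool decorator."""
    TOOLS[fn.__name__] = fn
    return fn

async def _remove_duplicates(session_id, subset=None, keep="first"):
    # Stand-in for the core implementation in transformations.py.
    return {"success": True, "session_id": session_id, "keep": keep}

@tool
async def remove_duplicates(session_id, subset=None, keep="first"):
    """Thin wrapper that delegates to the core implementation."""
    return await _remove_duplicates(session_id, subset, keep)

result = asyncio.run(TOOLS["remove_duplicates"]("abc123"))
print(result["success"])  # True
```

Keeping the wrapper thin means the schema (derived from the type annotations) and the logic live in separate modules, so the core function stays testable without an MCP server.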
  • OperationType enum value used to record the remove_duplicates operation in session history.
    REMOVE_DUPLICATES = "remove_duplicates"
