find_duplicates

Identify and handle duplicate rows in Excel/CSV files using configurable strategies including highlighting, removal, or exporting duplicates.

Instructions

Find and manage duplicate rows in Excel/CSV files with multiple strategies

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| filePath | Yes | Path to the CSV or Excel file | |
| columns | No | Columns to check for duplicates (empty = all columns) | all columns |
| action | No | What to do with duplicates | report_only |
| keepFirst | No | Keep first occurrence when removing | true |
| sheet | No | Sheet name for Excel files (optional) | |
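
For illustration, an arguments object matching this schema might look like the following (the file path and column name are hypothetical):

```typescript
// Hypothetical find_duplicates arguments; only filePath is required.
const args = {
  filePath: "./contacts.csv", // required
  columns: ["email"],         // compare only the email column
  action: "report_only",      // the default action
  keepFirst: true,            // the default; only matters for action: 'remove'
};

console.log(JSON.stringify(args.columns)); // ["email"]
```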

Implementation Reference

  • The core handler function that implements the find_duplicates tool. Reads the file, identifies duplicate rows based on specified columns (or all), groups them, and either reports details or removes duplicates keeping first/last occurrence.
    async findDuplicates(args: ToolArgs): Promise<ToolResponse> {
      try {
        const { filePath, columns = [], action = 'report_only', keepFirst = true, sheet } = args;
        if (!filePath) {
          return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: 'Missing required parameter: filePath' }, null, 2) }] };
        }

        // Read the file
        const data = await readFileContent(filePath, sheet);
        if (data.length === 0) {
          return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: 'File is empty or could not be read' }, null, 2) }] };
        }

        const headers = data[0];
        const rows = data.slice(1);

        // Determine which columns to check for duplicates
        let checkColumns: number[] = [];
        if (columns.length === 0) {
          // Check all columns
          checkColumns = Array.from({ length: headers.length }, (_, i) => i);
        } else {
          // Convert column names/indices to indices
          checkColumns = columns.map((col: any) => {
            if (typeof col === 'number') return col;
            const index = headers.indexOf(col);
            if (index === -1) throw new Error(`Column "${col}" not found`);
            return index;
          });
        }

        // Group row indices by a composite key built from the checked columns
        const duplicateGroups = new Map<string, number[]>();
        rows.forEach((row: any[], index: number) => {
          const key = checkColumns.map(colIndex => String(row[colIndex] || '')).join('|||');
          if (!duplicateGroups.has(key)) {
            duplicateGroups.set(key, []);
          }
          duplicateGroups.get(key)!.push(index);
        });

        // Identify actual duplicates (groups with more than 1 row)
        const actualDuplicates = Array.from(duplicateGroups.entries())
          .filter(([_, indices]) => indices.length > 1);

        let resultData = data;
        let removedCount = 0;

        if (action === 'remove') {
          // Keep headers
          const cleanedData = [headers];
          for (const [_, indices] of duplicateGroups.entries()) {
            if (indices.length === 1) {
              // Not a duplicate, keep it
              cleanedData.push(rows[indices[0]]);
            } else {
              // Duplicate group - keep first or last based on keepFirst
              const keepIndex = keepFirst ? indices[0] : indices[indices.length - 1];
              cleanedData.push(rows[keepIndex]);
              removedCount += indices.length - 1;
            }
          }
          resultData = cleanedData;
          // Save the cleaned file back
          // This would need file writing logic similar to your existing handlers
        }

        const result = {
          success: true,
          operation: 'find_duplicates',
          summary: {
            totalRows: rows.length,
            duplicateGroups: actualDuplicates.length,
            totalDuplicates: actualDuplicates.reduce((sum, [_, indices]) => sum + indices.length - 1, 0),
            removedRows: removedCount,
            resultRows: resultData.length - 1 // excluding header
          },
          duplicates: action === 'report_only'
            ? actualDuplicates.map(([key, indices]) => ({
                key: key.split('|||'),
                rowIndices: indices.map(i => i + 2), // +2 for header row and 1-based indexing
                count: indices.length
              }))
            : undefined,
          action,
          keepFirst
        };

        return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
      } catch (error) {
        return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: error instanceof Error ? error.message : 'Unknown error', operation: 'find_duplicates' }, null, 2) }] };
      }
    }
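  • The grouping and keep-first/keep-last steps can be sketched in isolation. This is a minimal, self-contained sketch; `groupDuplicates`, `pickKept`, and the sample data are illustrative, not part of the handler:

```typescript
// Minimal sketch of the duplicate-grouping step: row indices are
// collected under a composite key joined with '|||', the same
// separator the handler uses.
function groupDuplicates(rows: string[][], checkColumns: number[]): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  rows.forEach((row, index) => {
    const key = checkColumns.map((c) => String(row[c] ?? "")).join("|||");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(index);
  });
  return groups;
}

// Keep-first vs keep-last choice when collapsing a duplicate group.
function pickKept(indices: number[], keepFirst: boolean): number {
  return keepFirst ? indices[0] : indices[indices.length - 1];
}

const rows = [
  ["alice", "a@x.com"],
  ["bob", "b@x.com"],
  ["alice", "a@x.com"],
];
const groups = groupDuplicates(rows, [0, 1]);
const dupGroups = Array.from(groups.entries()).filter(([, idx]) => idx.length > 1);

console.log(dupGroups[0][0]);                  // "alice|||a@x.com"
console.log(dupGroups[0][1]);                  // [0, 2]
console.log(pickKept(dupGroups[0][1], true));  // 0
console.log(pickKept(dupGroups[0][1], false)); // 2
```

Rows 0 and 2 share the same key, so they form one duplicate group; with `keepFirst: true` the handler would keep row 0 and drop row 2.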
  • The MCP tool schema definition for 'find_duplicates', including input parameters, types, descriptions, and required fields.
    name: 'find_duplicates',
    description: 'Find and manage duplicate rows in Excel/CSV files with multiple strategies',
    inputSchema: {
      type: 'object',
      properties: {
        filePath: { type: 'string', description: 'Path to the CSV or Excel file' },
        columns: {
          type: 'array',
          items: { type: 'string' },
          description: 'Columns to check for duplicates (empty = all columns)'
        },
        action: {
          type: 'string',
          enum: ['highlight', 'remove', 'export_duplicates', 'report_only'],
          description: 'What to do with duplicates (default: report_only)'
        },
        keepFirst: { type: 'boolean', description: 'Keep first occurrence when removing (default: true)' },
        sheet: { type: 'string', description: 'Sheet name for Excel files (optional)' }
      },
      required: ['filePath']
    }
  },
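  • A client-side check mirroring this schema's `required` list and `action` enum might look like the following. This is illustrative only; the names here are not the server's actual validation code:

```typescript
// Illustrative runtime check mirroring the schema: filePath is required,
// and action must be one of the four declared enum values.
const ACTIONS = ["highlight", "remove", "export_duplicates", "report_only"];

function validateArgs(args: Record<string, unknown>): string | null {
  if (typeof args.filePath !== "string" || args.filePath.length === 0) {
    return "Missing required parameter: filePath";
  }
  if (args.action !== undefined && !ACTIONS.includes(args.action as string)) {
    return `Invalid action: ${String(args.action)}`;
  }
  return null; // arguments are acceptable
}

console.log(validateArgs({}));                                    // "Missing required parameter: filePath"
console.log(validateArgs({ filePath: "data.csv", action: "x" })); // "Invalid action: x"
console.log(validateArgs({ filePath: "data.csv" }));              // null
```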
  • src/index.ts:1269-1271 (registration)
    Tool registration in the main switch dispatcher: maps tool name 'find_duplicates' to ExcelWorkflowHandler.findDuplicates method call.
    case 'find_duplicates':
      return await this.excelWorkflowHandler.findDuplicates(toolArgs);
    case 'data_cleaner':
  • src/index.ts:58-58 (registration)
    Instantiation of the ExcelWorkflowHandler class that contains the findDuplicates method.
    this.excelWorkflowHandler = new ExcelWorkflowHandler();
  • Fallback parser in NLP processor that recognizes 'duplicate' commands and maps to action 'find_duplicates'.
    } else if (lowerText.includes('duplicate')) {
      return { type: 'operation', action: 'find_duplicates', parameters: {}, confidence: 0.7 };
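  • The fallback above can be sketched as a standalone function (the function name is illustrative):

```typescript
// Keyword fallback: any text mentioning "duplicate" maps to the
// find_duplicates action at reduced confidence, as in the NLP processor.
function fallbackParse(text: string) {
  const lowerText = text.toLowerCase();
  if (lowerText.includes("duplicate")) {
    return { type: "operation", action: "find_duplicates", parameters: {}, confidence: 0.7 };
  }
  return null;
}

console.log(fallbackParse("Remove duplicate rows")?.action); // "find_duplicates"
console.log(fallbackParse("sum column B"));                  // null
```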
