find_duplicates

Identify and handle duplicate rows in Excel/CSV files using configurable strategies including highlighting, removal, or exporting duplicates.

Instructions

Find and manage duplicate rows in Excel/CSV files with multiple strategies

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| filePath | Yes | Path to the CSV or Excel file | |
| columns | No | Columns to check for duplicates (empty = all columns) | all columns |
| action | No | What to do with duplicates | report_only |
| keepFirst | No | Keep first occurrence when removing | true |
| sheet | No | Sheet name for Excel files (optional) | |
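
For illustration, an arguments object matching this schema might look like the following (the file path and column name are hypothetical):

```typescript
// Hypothetical find_duplicates arguments; only filePath is required.
const args = {
  filePath: "./contacts.csv", // required
  columns: ["email"],         // compare only the email column
  action: "report_only",      // the default action
  keepFirst: true,            // the default; only matters for action: 'remove'
};

console.log(JSON.stringify(args.columns)); // ["email"]
```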

Implementation Reference

  • The core handler function that implements the find_duplicates tool. Reads the file, identifies duplicate rows based on specified columns (or all), groups them, and either reports details or removes duplicates keeping first/last occurrence.
    async findDuplicates(args: ToolArgs): Promise<ToolResponse> {
      try {
        const { filePath, columns = [], action = 'report_only', keepFirst = true, sheet } = args;
        if (!filePath) {
          return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: 'Missing required parameter: filePath' }, null, 2) }] };
        }

        // Read the file
        const data = await readFileContent(filePath, sheet);
        if (data.length === 0) {
          return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: 'File is empty or could not be read' }, null, 2) }] };
        }

        const headers = data[0];
        const rows = data.slice(1);

        // Determine which columns to check for duplicates
        let checkColumns: number[] = [];
        if (columns.length === 0) {
          // Check all columns
          checkColumns = Array.from({ length: headers.length }, (_, i) => i);
        } else {
          // Convert column names/indices to indices
          checkColumns = columns.map((col: any) => {
            if (typeof col === 'number') return col;
            const index = headers.indexOf(col);
            if (index === -1) throw new Error(`Column "${col}" not found`);
            return index;
          });
        }

        // Group row indices by a composite key built from the checked columns
        const duplicateGroups = new Map<string, number[]>();
        rows.forEach((row: any[], index: number) => {
          const key = checkColumns.map(colIndex => String(row[colIndex] || '')).join('|||');
          if (!duplicateGroups.has(key)) {
            duplicateGroups.set(key, []);
          }
          duplicateGroups.get(key)!.push(index);
        });

        // Identify actual duplicates (groups with more than 1 row)
        const actualDuplicates = Array.from(duplicateGroups.entries())
          .filter(([_, indices]) => indices.length > 1);

        let resultData = data;
        let removedCount = 0;

        if (action === 'remove') {
          // Keep headers
          const cleanedData = [headers];
          for (const [_, indices] of duplicateGroups.entries()) {
            if (indices.length === 1) {
              // Not a duplicate, keep it
              cleanedData.push(rows[indices[0]]);
            } else {
              // Duplicate group - keep first or last based on keepFirst
              const keepIndex = keepFirst ? indices[0] : indices[indices.length - 1];
              cleanedData.push(rows[keepIndex]);
              removedCount += indices.length - 1;
            }
          }
          resultData = cleanedData;
          // Save the cleaned file back
          // This would need file writing logic similar to your existing handlers
        }

        const result = {
          success: true,
          operation: 'find_duplicates',
          summary: {
            totalRows: rows.length,
            duplicateGroups: actualDuplicates.length,
            totalDuplicates: actualDuplicates.reduce((sum, [_, indices]) => sum + indices.length - 1, 0),
            removedRows: removedCount,
            resultRows: resultData.length - 1 // excluding header
          },
          duplicates: action === 'report_only'
            ? actualDuplicates.map(([key, indices]) => ({
                key: key.split('|||'),
                rowIndices: indices.map(i => i + 2), // +2 for header row and 1-based indexing
                count: indices.length
              }))
            : undefined,
          action,
          keepFirst
        };

        return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] };
      } catch (error) {
        return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: error instanceof Error ? error.message : 'Unknown error', operation: 'find_duplicates' }, null, 2) }] };
      }
    }
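  • The grouping and keep-first/keep-last steps can be sketched in isolation. This is a minimal, self-contained sketch; `groupDuplicates`, `pickKept`, and the sample data are illustrative, not part of the handler:

```typescript
// Minimal sketch of the duplicate-grouping step: row indices are
// collected under a composite key joined with '|||', the same
// separator the handler uses.
function groupDuplicates(rows: string[][], checkColumns: number[]): Map<string, number[]> {
  const groups = new Map<string, number[]>();
  rows.forEach((row, index) => {
    const key = checkColumns.map((c) => String(row[c] ?? "")).join("|||");
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key)!.push(index);
  });
  return groups;
}

// Keep-first vs keep-last choice when collapsing a duplicate group.
function pickKept(indices: number[], keepFirst: boolean): number {
  return keepFirst ? indices[0] : indices[indices.length - 1];
}

const rows = [
  ["alice", "a@x.com"],
  ["bob", "b@x.com"],
  ["alice", "a@x.com"],
];
const groups = groupDuplicates(rows, [0, 1]);
const dupGroups = Array.from(groups.entries()).filter(([, idx]) => idx.length > 1);

console.log(dupGroups[0][0]);                  // "alice|||a@x.com"
console.log(dupGroups[0][1]);                  // [0, 2]
console.log(pickKept(dupGroups[0][1], true));  // 0
console.log(pickKept(dupGroups[0][1], false)); // 2
```

Rows 0 and 2 share the same key, so they form one duplicate group; with `keepFirst: true` the handler would keep row 0 and drop row 2.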
  • The MCP tool schema definition for 'find_duplicates', including input parameters, types, descriptions, and required fields.
    name: 'find_duplicates',
    description: 'Find and manage duplicate rows in Excel/CSV files with multiple strategies',
    inputSchema: {
      type: 'object',
      properties: {
        filePath: { type: 'string', description: 'Path to the CSV or Excel file' },
        columns: {
          type: 'array',
          items: { type: 'string' },
          description: 'Columns to check for duplicates (empty = all columns)'
        },
        action: {
          type: 'string',
          enum: ['highlight', 'remove', 'export_duplicates', 'report_only'],
          description: 'What to do with duplicates (default: report_only)'
        },
        keepFirst: { type: 'boolean', description: 'Keep first occurrence when removing (default: true)' },
        sheet: { type: 'string', description: 'Sheet name for Excel files (optional)' }
      },
      required: ['filePath']
    }
  },
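  • A client-side check mirroring this schema's `required` list and `action` enum might look like the following. This is illustrative only; the names here are not the server's actual validation code:

```typescript
// Illustrative runtime check mirroring the schema: filePath is required,
// and action must be one of the four declared enum values.
const ACTIONS = ["highlight", "remove", "export_duplicates", "report_only"];

function validateArgs(args: Record<string, unknown>): string | null {
  if (typeof args.filePath !== "string" || args.filePath.length === 0) {
    return "Missing required parameter: filePath";
  }
  if (args.action !== undefined && !ACTIONS.includes(args.action as string)) {
    return `Invalid action: ${String(args.action)}`;
  }
  return null; // arguments are acceptable
}

console.log(validateArgs({}));                                    // "Missing required parameter: filePath"
console.log(validateArgs({ filePath: "data.csv", action: "x" })); // "Invalid action: x"
console.log(validateArgs({ filePath: "data.csv" }));              // null
```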
  • src/index.ts:1269-1271 (registration)
    Tool registration in the main switch dispatcher: maps tool name 'find_duplicates' to ExcelWorkflowHandler.findDuplicates method call.
    case 'find_duplicates':
      return await this.excelWorkflowHandler.findDuplicates(toolArgs);
    case 'data_cleaner':
  • src/index.ts:58-58 (registration)
    Instantiation of the ExcelWorkflowHandler class that contains the findDuplicates method.
    this.excelWorkflowHandler = new ExcelWorkflowHandler();
  • Fallback parser in NLP processor that recognizes 'duplicate' commands and maps to action 'find_duplicates'.
    } else if (lowerText.includes('duplicate')) {
      return { type: 'operation', action: 'find_duplicates', parameters: {}, confidence: 0.7 };
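  • The fallback above can be sketched as a standalone function (the function name is illustrative):

```typescript
// Keyword fallback: any text mentioning "duplicate" maps to the
// find_duplicates action at reduced confidence, as in the NLP processor.
function fallbackParse(text: string) {
  const lowerText = text.toLowerCase();
  if (lowerText.includes("duplicate")) {
    return { type: "operation", action: "find_duplicates", parameters: {}, confidence: 0.7 };
  }
  return null;
}

console.log(fallbackParse("Remove duplicate rows")?.action); // "find_duplicates"
console.log(fallbackParse("sum column B"));                  // null
```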
