
Excel MCP Server

by ishayoyo

find_duplicates

Identify and handle duplicate rows in Excel/CSV files using configurable strategies: highlight, remove, or export the duplicates, or simply report them.

Instructions

Find and manage duplicate rows in Excel/CSV files with multiple strategies

Input Schema

| Name      | Required | Description                                            | Default     |
|-----------|----------|--------------------------------------------------------|-------------|
| filePath  | Yes      | Path to the CSV or Excel file                          |             |
| columns   | No       | Columns to check for duplicates (empty = all columns)  | all columns |
| action    | No       | What to do with duplicates                             | report_only |
| keepFirst | No       | Keep first occurrence when removing                    | true        |
| sheet     | No       | Sheet name for Excel files (optional)                  |             |
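
A typical arguments payload, with an illustrative file path, column name, and sheet name, might look like this:

    {
      "filePath": "./data/customers.xlsx",
      "columns": ["Email"],
      "action": "remove",
      "keepFirst": true,
      "sheet": "Sheet1"
    }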

Implementation Reference

  • The core handler function that implements the find_duplicates tool. It reads the file, identifies duplicate rows based on the specified columns (or all columns), groups them, and either reports the details or removes the duplicates, keeping the first or last occurrence.
    async findDuplicates(args: ToolArgs): Promise<ToolResponse> {
      try {
        const { filePath, columns = [], action = 'report_only', keepFirst = true, sheet } = args;
    
        if (!filePath) {
          return {
            content: [{
              type: 'text',
              text: JSON.stringify({
                success: false,
                error: 'Missing required parameter: filePath'
              }, null, 2)
            }]
          };
        }
    
        // Read the file
        const data = await readFileContent(filePath, sheet);
    
        if (data.length === 0) {
          return {
            content: [{
              type: 'text',
              text: JSON.stringify({
                success: false,
                error: 'File is empty or could not be read'
              }, null, 2)
            }]
          };
        }
    
        const headers = data[0];
        const rows = data.slice(1);
    
        // Determine which columns to check for duplicates
        let checkColumns: number[] = [];
        if (columns.length === 0) {
          // Check all columns
          checkColumns = Array.from({length: headers.length}, (_, i) => i);
        } else {
          // Convert column names/indices to indices
          checkColumns = columns.map((col: any) => {
            if (typeof col === 'number') return col;
            const index = headers.indexOf(col);
            if (index === -1) throw new Error(`Column "${col}" not found`);
            return index;
          });
        }
    
        // Find duplicates
        // Group row indices by a composite key built from the checked columns
        const duplicateGroups = new Map<string, number[]>();
    
        rows.forEach((row: any[], index: number) => {
          // Use ?? rather than || so meaningful falsy cells (0, false) are not collapsed to ''
          const key = checkColumns.map(colIndex => String(row[colIndex] ?? '')).join('|||');
    
          if (!duplicateGroups.has(key)) {
            duplicateGroups.set(key, []);
          }
          duplicateGroups.get(key)!.push(index);
        });
    
        // Identify actual duplicates (groups with more than 1 row)
        const actualDuplicates = Array.from(duplicateGroups.entries())
          .filter(([_, indices]) => indices.length > 1);
    
        let resultData = data;
        let removedCount = 0;
    
        if (action === 'remove') {
          // Keep headers
          const cleanedData = [headers];
    
          for (const [_, indices] of duplicateGroups.entries()) {
            if (indices.length === 1) {
              // Not a duplicate, keep it
              cleanedData.push(rows[indices[0]]);
            } else {
              // Duplicate group - keep first or last based on keepFirst
              const keepIndex = keepFirst ? indices[0] : indices[indices.length - 1];
              cleanedData.push(rows[keepIndex]);
              removedCount += indices.length - 1;
            }
          }
    
          resultData = cleanedData;
    
          // Save the cleaned file back; this handler does not implement that step
          // (see the write-back sketch after this code block)
        }
    
        const result = {
          success: true,
          operation: 'find_duplicates',
          summary: {
            totalRows: rows.length,
            duplicateGroups: actualDuplicates.length,
            totalDuplicates: actualDuplicates.reduce((sum, [_, indices]) => sum + indices.length - 1, 0),
            removedRows: removedCount,
            resultRows: resultData.length - 1 // excluding header
          },
          duplicates: action === 'report_only' ? actualDuplicates.map(([key, indices]) => ({
            key: key.split('|||'),
            rowIndices: indices.map(i => i + 2), // +2 for header and 1-based indexing
            count: indices.length
          })) : undefined,
          action,
          keepFirst
        };
    
        return {
          content: [{
            type: 'text',
            text: JSON.stringify(result, null, 2)
          }]
        };
    
      } catch (error) {
        return {
          content: [{
            type: 'text',
            text: JSON.stringify({
              success: false,
              error: error instanceof Error ? error.message : 'Unknown error',
              operation: 'find_duplicates'
            }, null, 2)
          }]
        };
      }
    }
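    Note that only report_only and remove are actually handled in this code path; highlight and export_duplicates appear in the schema's action enum but have no branch here. A minimal write-back sketch for the missing save step, assuming the server uses the xlsx (SheetJS) package (the helper name saveFileContent is hypothetical, mirroring readFileContent):
    import * as XLSX from 'xlsx';
    
    // Hypothetical helper: persist a 2-D array of rows back to disk.
    // XLSX.writeFile infers CSV vs. XLSX output from the file extension.
    async function saveFileContent(filePath: string, data: any[][], sheet?: string): Promise<void> {
      const worksheet = XLSX.utils.aoa_to_sheet(data);
      const workbook = XLSX.utils.book_new();
      XLSX.utils.book_append_sheet(workbook, worksheet, sheet ?? 'Sheet1');
      XLSX.writeFile(workbook, filePath);
    }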
  • The MCP tool schema definition for 'find_duplicates', including input parameters, types, descriptions, and required fields.
      name: 'find_duplicates',
      description: 'Find and manage duplicate rows in Excel/CSV files with multiple strategies',
      inputSchema: {
        type: 'object',
        properties: {
          filePath: {
            type: 'string',
            description: 'Path to the CSV or Excel file'
          },
          columns: {
            type: 'array',
            items: { type: 'string' },
            description: 'Columns to check for duplicates (empty = all columns)'
          },
          action: {
            type: 'string',
            enum: ['highlight', 'remove', 'export_duplicates', 'report_only'],
            description: 'What to do with duplicates (default: report_only)'
          },
          keepFirst: {
            type: 'boolean',
            description: 'Keep first occurrence when removing (default: true)'
          },
          sheet: {
            type: 'string',
            description: 'Sheet name for Excel files (optional)'
          }
        },
        required: ['filePath']
      }
    },
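    From the client side, the tool can be exercised with any MCP client. A short sketch using the official TypeScript SDK (the server launch command and file path are illustrative):
    import { Client } from '@modelcontextprotocol/sdk/client/index.js';
    import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';
    
    const transport = new StdioClientTransport({ command: 'node', args: ['dist/index.js'] });
    const client = new Client({ name: 'example-client', version: '1.0.0' });
    await client.connect(transport);
    
    // Report duplicates across all columns without modifying the file
    const result = await client.callTool({
      name: 'find_duplicates',
      arguments: { filePath: './data/sales.csv' }
    });
    console.log(result.content);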
  • src/index.ts:1269-1271 (registration)
    Tool registration in the main switch dispatcher, mapping the tool name 'find_duplicates' to the ExcelWorkflowHandler.findDuplicates method.
    case 'find_duplicates':
      return await this.excelWorkflowHandler.findDuplicates(toolArgs);
    case 'data_cleaner':
  • src/index.ts:58-58 (registration)
    Instantiation of the ExcelWorkflowHandler class that contains the findDuplicates method.
    this.excelWorkflowHandler = new ExcelWorkflowHandler();
  • Fallback parser in NLP processor that recognizes 'duplicate' commands and maps to action 'find_duplicates'.
    } else if (lowerText.includes('duplicate')) {
      return {
        type: 'operation',
        action: 'find_duplicates',
        parameters: {},
        confidence: 0.7
      };
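    Taken alone, the snippet omits its enclosing function. A self-contained sketch of the same keyword heuristic (the parseFallback name and ParsedCommand shape are assumptions, not the project's actual types):
    interface ParsedCommand {
      type: string;
      action: string;
      parameters: Record<string, unknown>;
      confidence: number;
    }
    
    // Hypothetical reconstruction: lower-case the input, then match on keywords
    function parseFallback(text: string): ParsedCommand | null {
      const lowerText = text.toLowerCase();
      if (lowerText.includes('duplicate')) {
        return { type: 'operation', action: 'find_duplicates', parameters: {}, confidence: 0.7 };
      }
      return null;
    }
    
    // parseFallback('remove duplicate rows') => { action: 'find_duplicates', confidence: 0.7, ... }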
