data_cleaner
Clean and standardize Excel/CSV data by removing whitespace, fixing dates, standardizing formats, and eliminating empty rows to improve data quality.
Instructions
Batch data cleaning operations with intelligent detection of common data quality issues
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filePath | Yes | Path to the CSV or Excel file | |
| operations | No | Array of cleaning operations to apply | |
| preview | No | Show preview before applying changes (default: false) | |
| sheet | No | Sheet name for Excel files (optional) |
Implementation Reference
- src/index.ts:1271-1272 (registration)Registration of the data_cleaner tool in the main CallToolRequestHandler switch statement, dispatching to ExcelWorkflowHandler.cleanData method.case 'data_cleaner': return await this.excelWorkflowHandler.cleanData(toolArgs);
- src/index.ts:1113-1145 (schema)Input schema for the data_cleaner tool, defining required filePath and optional operations array (with enum of cleaning types), preview flag, and sheet.name: 'data_cleaner', description: 'Batch data cleaning operations with intelligent detection of common data quality issues', inputSchema: { type: 'object', properties: { filePath: { type: 'string', description: 'Path to the CSV or Excel file' }, operations: { type: 'array', items: { type: 'string', enum: [ 'trim_whitespace', 'fix_dates', 'standardize_numbers', 'remove_empty_rows', 'standardize_phone_formats', 'standardize_names', 'remove_special_chars', 'fix_currency' ] }, description: 'Array of cleaning operations to apply' }, preview: { type: 'boolean', description: 'Show preview before applying changes (default: false)' }, sheet: { type: 'string', description: 'Sheet name for Excel files (optional)' } }, required: ['filePath'] } },
- src/handlers/excel-workflow.ts:147-242 (handler)Core handler function for data_cleaner tool. Parses args, reads file, applies specified cleaning operations using private helper methods, tracks changes, and returns JSON summary with preview option and change log.async cleanData(args: ToolArgs): Promise<ToolResponse> { try { const { filePath, operations = [], preview = false, sheet } = args; if (!filePath) { return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: 'Missing required parameter: filePath' }, null, 2) }] }; } // Read the file const data = await readFileContent(filePath, sheet); if (data.length === 0) { return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: 'File is empty or could not be read' }, null, 2) }] }; } const headers = data[0]; let workingData = data.map(row => [...row]); // Deep copy const changes: any[] = []; // Apply each cleaning operation for (const operation of operations) { switch (operation) { case 'trim_whitespace': workingData = this.trimWhitespace(workingData, changes); break; case 'remove_empty_rows': workingData = this.removeEmptyRows(workingData, changes); break; case 'standardize_phone_formats': workingData = this.standardizePhoneFormats(workingData, changes); break; case 'fix_dates': workingData = this.fixDates(workingData, changes); break; case 'standardize_names': workingData = this.standardizeNames(workingData, changes); break; case 'remove_special_chars': workingData = this.removeSpecialChars(workingData, changes); break; case 'fix_currency': workingData = this.fixCurrency(workingData, changes); break; } } const result = { success: true, operation: 'data_cleaner', preview, summary: { originalRows: data.length, cleanedRows: workingData.length, operationsApplied: operations, changesCount: changes.length }, changes: changes.slice(0, 100), // Limit to first 100 changes for readability data: preview ? workingData.slice(0, 10) : undefined // Show first 10 rows in preview }; return { content: [{ type: 'text', text: JSON.stringify(result, null, 2) }] }; } catch (error) { return { content: [{ type: 'text', text: JSON.stringify({ success: false, error: error instanceof Error ? error.message : 'Unknown error', operation: 'data_cleaner' }, null, 2) }] }; } }
- Helper method for 'trim_whitespace' operation: trims whitespace from string cells and logs changes.private trimWhitespace(data: any[][], changes: any[]): any[][] { return data.map((row, rowIndex) => row.map((cell, colIndex) => { if (typeof cell === 'string') { const trimmed = cell.trim(); if (trimmed !== cell) { changes.push({ operation: 'trim_whitespace', row: rowIndex, col: colIndex, before: cell, after: trimmed }); } return trimmed; } return cell; }) ); }
- Helper method for 'remove_empty_rows' operation: filters out completely empty rows (excluding header) and logs removals.private removeEmptyRows(data: any[][], changes: any[]): any[][] { const result = data.filter((row, index) => { const isEmpty = row.every(cell => !cell || String(cell).trim() === ''); if (isEmpty && index > 0) { // Don't remove header changes.push({ operation: 'remove_empty_rows', row: index, action: 'removed' }); return false; } return true; }); return result; }