xlsx_data_clean
Identify and fix common data quality issues in Excel files (NA variants, merged cells, type errors, trailing noise, etc.) using diagnose or execute modes.
Instructions
AI-native data cleaning for a LOCAL .xlsx file. Scans for the seven most common data-grime issues — NA variants (N/A, NA, null, -), merged-cell residue, type-coercion mistakes (numeric-as-text / date-as-serial / leading-zero stripped), trailing-row noise (footers / totals), header-row-not-first (preamble before headers), encoding glitches (UTF-8-as-CP1252 mojibake), and duplicate column headers — and either flags them (diagnose mode) or applies deterministic fixes (execute mode).
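As a rough illustration of what two of these detectors might check, here is a minimal Python sketch. It is not the tool's actual implementation: the function names `is_na_variant` and `repair_mojibake` are hypothetical, and the NA list is just the four spellings named above.

```python
# Hypothetical helpers, for illustration only; not the tool's real code.

# The NA spellings named in the description, compared case-insensitively.
NA_VARIANTS = {"n/a", "na", "null", "-"}


def is_na_variant(value):
    """Return True if a cell value is one of the listed NA spellings."""
    return isinstance(value, str) and value.strip().lower() in NA_VARIANTS


def repair_mojibake(text):
    """Undo UTF-8 text that was mistakenly decoded as CP1252.

    Re-encodes the string as CP1252 and decodes the bytes as UTF-8.
    If either step fails, the text was probably not mojibake, so it is
    returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

A real detector would likely need more care than this, for example distinguishing a lone `-` used as a placeholder from a `-` that is meaningful data in a particular column.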
Informer-not-enforcer: every fix surfaces as a Finding the caller can accept / reject / scope-override before the file is mutated.
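The accept / reject step can be pictured with the sketch below. It is written under stated assumptions: the `Finding` fields, the `select_fixes` helper, and the rule that an explicit accept list narrows the set while rejections always win are all hypothetical, since the source does not specify the actual semantics of `accept_findings` and `reject_findings`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of a proposed fix; field names are assumptions."""
    finding_id: str
    detector: str
    cell: str
    original: object
    proposed: object


def select_fixes(findings, accept=None, reject=None):
    """Return the findings that would be applied in execute mode.

    Assumed semantics: rejected ids are always dropped; if an accept
    list is given, only ids in it are kept; otherwise every finding
    that was not rejected is kept.
    """
    accepted = set(accept) if accept is not None else None
    rejected = set(reject or ())
    selected = []
    for finding in findings:
        if finding.finding_id in rejected:
            continue
        if accepted is not None and finding.finding_id not in accepted:
            continue
        selected.append(finding)
    return selected
```

The point of the design is that nothing is written to the file until the caller has had a chance to filter the list.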
USE WHEN: an upstream pipeline produced a messy xlsx that's about to feed an LLM or downstream analysis and you want a one-pass scrub.
DO NOT USE WHEN: domain-specific transforms are needed (use a dedicated pipeline), structural integrity checks are needed (use xlsx_doctor), or the input is an upload/attached file rather than a local one.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| accept_findings | No | ||
| detectors | No | ||
| file_b64 | Yes | ||
| mode | No | ||
| options | No | ||
| overrides | No | ||
| reject_findings | No | ||
| sheets | No | ||
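
Since `file_b64` is the only required parameter, a caller has to base64-encode the workbook bytes before invoking the tool. The sketch below shows one way to assemble the arguments. The helper name `build_arguments` is hypothetical, and passing the mode as the string `"diagnose"` or `"execute"` is an assumption based on the mode names in the description; the types of the other parameters are not documented here, so they are left out.

```python
import base64

# Mode names taken from the description; their exact wire format is assumed.
VALID_MODES = {"diagnose", "execute"}


def build_arguments(xlsx_bytes, mode="diagnose"):
    """Build a minimal argument dict for xlsx_data_clean (hypothetical helper).

    xlsx_bytes: raw bytes of the local .xlsx file.
    mode: assumed to be the string "diagnose" or "execute".
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        # base64-encode the bytes and decode to str so the dict is JSON-safe
        "file_b64": base64.b64encode(xlsx_bytes).decode("ascii"),
        "mode": mode,
    }
```

Typical use would read the file first, for example `build_arguments(open("report.xlsx", "rb").read())`, where `report.xlsx` is a placeholder path.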